Complete Workflow for a Single Element

Prelude

from assyst.crystals import Formulas, sample_space_groups
from assyst.filters import DistanceFilter, AspectFilter, VolumeFilter
from assyst.relax import Relax, VolumeRelax, FullRelax, relax
from assyst.perturbations import RandomChoice, Rattle, Stretch, apply_perturbations
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import pickle
import matplotlib.pyplot as plt

Options

# maximum number of atoms to generate for the training data
# 2 atoms is very low, chosen here only to keep the example fast
# 10 is a usual value, leading to ~10k structures in the final training data set
max_num = 2
!mkdir -p Unary/{max_num}

Reference Potential

Simple example potential to have a fast reference to relax structures and generate training data.

image.png

from ase.calculators.morse import MorsePotential

The values are chosen here so that they roughly reproduce Copper’s lattice parameter.

reference = MorsePotential(epsilon=.3, r0=2.55265548*1.10619396, rho0=4)

Another good choice to play around with are the universal potentials based on the graph ACE models.

For this to work you’ll need to install the tensorpotential PyPI package.

from assyst.calculators import Grace
Grace?
Init signature: Grace(model: str = 'GRACE-FS-OAM') -> None
Docstring:     
Universal Graph Atomic Cluster Expansion models.

.. attention::
    This class needs additional dependencies!
    Install `tensorpotential` from `PyPI <https://pypi.org/project/tensorpotential/>`__.
File:           ~/science/phd/dev/assyst/assyst/calculators.py
Type:           ABCMeta
Subclasses:     

1. Sampling Random Strucxtures

The first step in ASSYST: generation of random structures while prescribing all possible bulk symmetries.

fs = Formulas.range('Cu', max_num + 1)
fs
Formulas(atoms=({'Cu': 0}, {'Cu': 1}, {'Cu': 2}))
spg = list(filter(
    # without any constraints pyxtal sometimes generates strange unit cells, 
    # filter out any that have c/a > 6 as an empirical threshold
    AspectFilter(6), 
    sample_space_groups(
        fs, 
        # generate all possible space groups numbered from 1 to 230
        spacegroups=range(1,230+1), 
        max_atoms=max_num
    )
))
len(spg)
167

Save

with open(f"Unary/{max_num}/spg.pckl", 'bw') as f:
    pickle.dump(spg, f)

2. Relaxing Configurations

ASSYST seeks to find the important pockets in a materials’ PES by applying a series of relaxation steps. They can be specified by Relax and its subclasses.

ax.Relax?
Object `ax.Relax` not found.

In the ASSYST paper we use a volume relaxation and a full internal and cell shape relaxation step that I call here VolMin and AllMin.

volset = VolumeRelax(max_steps=100, force_tolerance=1e-3)
allset = FullRelax(max_steps=100, force_tolerance=1e-3)

Some more experimental relaxation settings are defined in assyst.relax, e.g. options to relax only the cell shape or placing constraints on the symmetry.

In practice, these relaxations are carried out using DFT with low convergence settings. To speed up this example, I use here a Morse Potential calculator from ASE, but interfaces to production quality DFT codes exist as well and could be dropped in here.

%%time
volmin = list(relax(volset, reference, tqdm(spg)))
CPU times: user 2min 27s, sys: 134 ms, total: 2min 27s
Wall time: 49.5 s
allmin = list(relax(allset, reference, tqdm(volmin)))

Save

with open(f'Unary/{max_num}/volmin.pckl', 'wb') as f:
    pickle.dump(volmin, f)
with open(f'Unary/{max_num}/allmin.pckl', 'wb') as f:
    pickle.dump(allmin, f)

3. Random Perturbations

Final step in ASSYST: generating random configurations around the minima identified previously.

Apply the three types of random perturbations described in the ASSYST paper:

  1. mostly positional noise with some cell changes

  2. mostly hydrostatic cell changes

  3. mostly shear cell changes

The later are combined to favour hydrostatic changes a bit more with RandomChoice. The final list mods contains function-like objects that are applied to each fully minimized structure resulting from the previous steps. DistanceFilter ensures that no structures contain atoms that are too close as a result of these modification.

rattle = Rattle(.25) + Stretch(hydro=.05, shear=0.005)
hydro = Stretch(hydro=.80, shear=.05)
shear = Stretch(hydro=.05, shear=.20)
stretch = RandomChoice(hydro, shear, .7)
mods = 4*[rattle] + 4*[stretch]

The settings above reproduce the so far published training sets, but may be optimized or played with.

%%time
random = list(apply_perturbations(allmin, mods, filters=[DistanceFilter({'Cu': 1})]))
CPU times: user 1.17 s, sys: 3.96 ms, total: 1.18 s
Wall time: 1.14 s
len(random)
1069

Save

with open(f'Unary/{max_num}/random.pckl', 'wb') as f:
    pickle.dump(random, f)

4. Combine and Save Results

Load data from previous notebooks and evaluate cheap reference potential to construct a simple training set.

with open(f'Unary/{max_num}/spg.pckl', 'rb') as f:
    spg = pickle.load(f)
with open(f'Unary/{max_num}/volmin.pckl', 'rb') as f:
    volmin = pickle.load(f)
with open(f'Unary/{max_num}/allmin.pckl', 'rb') as f:
    allmin = pickle.load(f)
with open(f'Unary/{max_num}/random.pckl', 'rb') as f:
    random = pickle.load(f)

Final filter step, removing structures with very close atoms or very large volumes. In production runs, structures could also be filtered by forces or energy ranges.

everything = list(filter(VolumeFilter(300), filter(DistanceFilter({'Cu': 1}), spg + volmin + allmin + random)))
with open(f'Unary/{max_num}/everything.pckl', 'wb') as f:
    pickle.dump(everything, f)

Fit a simple potential

Technically not part of the ASSYST workflow anymore, but now we have everything to train a (simple) potential.

df = []
for s in tqdm(everything):
    s.calc = reference
    df.append({
        'ase_atoms': s,
        'energy': s.get_potential_energy(),
        'forces': s.get_forces(),
        'number_of_atoms': len(s)
    })
df = pd.DataFrame(df)
df.to_pickle(f'Unary/{max_num}/everything_training_data.pckl.gz')
df.eval('energy/number_of_atoms').plot.hist(bins=100, log=True)
plt.xlabel('Energy per atom [eV/atom]')
Text(0.5, 0, 'Energy per atom [eV/atom]')
../_images/ea290b2733aa8579778b2c607bc3988fffaac65b81d91fd01238160c395f26c1.png
plt.scatter(df.ase_atoms.map(lambda s: s.cell.volume/len(s)), df.energy/df.number_of_atoms, marker='.')
plt.xlim(0, 300)
(0.0, 300.0)
../_images/655552bd330bd8dd38c68c46154cb67663020aad4bf0783cd3d4653e8b672156.png
df.forces.explode().explode().infer_objects().plot.hist(bins=100, log=True)
plt.xlabel(r'force components [eV/$\mathrm{\AA}$]')
Text(0.5, 0, 'force components [eV/$\\mathrm{\\AA}$]')
../_images/636957bab5005f3162fc4f586288c6f2f575421e7b3fee4ba6c4b65ba4b272f5.png

Final Filtering Step

Because of the diverse and random nature of the structure in the training set, they can contain very extreme environments. Those create undue difficulty for learning the potential without beeing very informative, because, e.g. they contain atoms at very close distances. They can be filtered out either with EnergyFilter and ForceFilter when dealing with a list of Atoms, or directly on the pandas dataframe with the training data.

from assyst.filters import EnergyFilter, ForceFilter
EnergyFilter?
Init signature:
EnergyFilter(
    min_energy: float = -inf,
    max_energy: float = inf,
    *,
    missing: Literal['error', 'ignore'] = 'error',
) -> None
Docstring:      Filters structures by energy per atom.
File:           ~/science/phd/dev/assyst/assyst/filters.py
Type:           ABCMeta
Subclasses:     
ForceFilter?
Init signature:
ForceFilter(
    max_force: float = inf,
    *,
    missing: Literal['error', 'ignore'] = 'error',
) -> None
Docstring:      Filters structures by maximum force magnitude.
File:           ~/science/phd/dev/assyst/assyst/filters.py
Type:           ABCMeta
Subclasses:     
# filter(ForceFilter(10), ...)

These require ase SinglePointCalculators to be attached to the atoms.

Or working directly on pandas.

df.query('-3 <= energy / number_of_atoms < 10')
ase_atoms energy forces number_of_atoms NUMBER_OF_ATOMS
0 (Atom('Cu', [0.4904552729198183, 1.63243643194... -2.557480 [[1.54027859260511e-16, 8.817072997413392e-17,... 1 1
1 (Atom('Cu', [1.1427851095528256, 0.04530252914... -1.956908 [[-3.2959746043559335e-17, 4.137315490204685e-... 1 1
2 (Atom('Cu', [0.0, 0.0, 0.0], index=0)) -1.980389 [[9.280770596475918e-17, -1.249000902703301e-1... 1 1
3 (Atom('Cu', [1.2374903121082077, 1.23749031210... -1.740938 [[-1.3742262536253769e-17, -3.309527131512002e... 1 1
4 (Atom('Cu', [1.2952387291958223, 4.44089209850... -2.092009 [[3.361026734705064e-17, -1.0668549377257364e-... 1 1
... ... ... ... ... ...
1455 (Atom('Cu', [0.12337953810948867, -0.035488792... -5.226908 [[2.274477532016276, -0.3575283070841724, -2.0... 2 2
1456 (Atom('Cu', [0.1253714680968149, -0.0336145807... -5.134857 [[2.782709407024391, -0.23253255007378126, -2.... 2 2
1457 (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... -5.142435 [[-2.624202938283915e-15, -1.366962099069724e-... 2 2
1458 (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... -5.502145 [[-2.3635607360183997e-15, -2.0929438737660178... 2 2
1459 (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... -4.048501 [[-6.816731424122424e-15, -6.053100728986571e-... 2 2

1460 rows × 5 columns

ACE

Fit a simplified linear, pair descriptor only ACE potential to the training data.

You’ll need to install python-ace from either PyPI or conda-forge.

import pyace.linearacefit as plf
import pyace
def make_ace(rmax, number_of_radial_functions, element='Cu'):
    '''Creates a simple basis configuration for a pair ACE.'''
    pot_conf = {
        'elements': [element],
        'embeddings': {
            'ALL': {
                'fs_parameters': [1, 1],
                'ndensity': 1,
                'npot': 'FinnisSinclair',
            },
        },
        'bonds': {
            'ALL': {
                'NameOfCutoffFunction': 'cos',
                'dcut': 0.01,
                'radbase': 'ChebPow',
                'radparameters': [2.0],
                'rcut': rmax
            },
        },
        'functions': {
            'UNARY': { 
                'nradmax_by_orders': [ number_of_radial_functions ],
                'lmax_by_orders':    [  0 ] 
            }
        }
    }
    return pyace.create_multispecies_basis_config(pot_conf)
/home/ponder/micromamba/envs/assyst/lib/python3.11/site-packages/pyace/multispecies_basisextension.py:4: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
from ase import Atoms
def dimer(calc, r):
    '''Helper function to evaluate a dimer curve for a given calculator.

    Args:
        calc (ase.calculator.Calculator): the potential to evaluate
        r (iterable of float): distances to evaluate the dimer curve at

    Returns:
        array of distances as given, array of corresponding energies.'''
    s = lambda r: Atoms(['Cu','Cu'], positions=[[0.]*3, [r, 0, 0]], cell=[40]*3)
    e = []
    r = np.array(r)
    for ri in r:
        si = s(ri)
        si.calc = calc
        e.append(si.get_potential_energy())
    return r, np.array(e)
bbasis = make_ace(6.5, number_of_radial_functions=20, element='Cu')
ds = plf.LinearACEDataset(bbasis, df)
%%time
ds.construct_design_matrix()
CPU times: user 6.69 ms, sys: 61.1 ms, total: 67.8 ms
Wall time: 898 ms
lf = plf.LinearACEFit(train_dataset=ds)
%%time
lf.fit()
CPU times: user 4.96 ms, sys: 1.98 ms, total: 6.94 ms
Wall time: 11.4 ms
Ridge(alpha=1e-05, copy_X=False, fit_intercept=False, random_state=42)
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
acef = pyace.PyACECalculator(lf.get_bbasis())

Basic error metrics.

lf.compute_errors()
{'epa_mae': 0.0031517361621489818,
 'epa_rmse': 0.00443638735345121,
 'f_comp_mae': 0.0015019558016016387,
 'f_comp_rmse': 0.0033168030199981578}

Compare the resulting pair potential.

r = np.linspace(2, 7)
plt.subplot(121)
plt.plot(*dimer(acef, r),  label='ACE')
plt.plot(*dimer(reference, r), label='REF')
plt.legend()
plt.xlabel(r'$r$ [$\mathrm{\AA}$]')
plt.ylabel(r'$E$ [eV/atom]')
plt.subplot(122)
plt.plot(r, dimer(acef, r)[1]-dimer(reference, r)[1])
plt.xlabel(r'$r$ [$\mathrm{\AA}$]')
plt.ylabel(r'$\Delta E$ [eV/atom]')
plt.tight_layout()
../_images/9fedfb91ee254e6c6ffd86dfedbfe8eb309b7b3cc642703fea7a59b06f5db735.png