A Complete Workflow for a Single Element¶
This notebook runs the full ASSYST pipeline end-to-end on a unary system (copper). The four canonical steps are:
Sampling — generate symmetric random structures for every space group.
Relaxing — minimise volume first, then full cell + positions, to find the relevant pockets of the potential energy surface.
Perturbing — apply random rattles and stretches around those minima to populate the surrounding PES.
Combining — collect everything, filter unphysical configurations, evaluate the reference and store the resulting training data.
In a real workflow the relaxations and final evaluations are carried out with DFT (or any other reference); here we use a Morse potential so the notebook runs quickly.
Imports¶
import pickle
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from assyst.crystals import Formulas, sample
from assyst.filters import DistanceFilter, AspectFilter, VolumeFilter
from assyst.relaxations import VolumeRelax, FullRelax, relax
from assyst.perturbations import RandomChoice, Rattle, Stretch, perturb
Configuration¶
max_num is the largest number of atoms per generated structure. 2 keeps
the example fast; 10 is the typical production value and yields ~10k
training structures.
max_num = 2
out_dir = Path(f"Unary/{max_num}")
out_dir.mkdir(parents=True, exist_ok=True)
Reference potential¶
Any ASE calculator works. We pick a Morse potential whose parameters were chosen so that its minimum sits near the experimental Cu-Cu bond length.
For more interesting reference data the universal Grace graph-ACE models
in assyst.calculators are a good drop-in replacement (they require the
optional tensorpotential dependency).
from ase.calculators.morse import MorsePotential
reference = MorsePotential(epsilon=0.3, r0=2.55265548 * 1.10619396, rho0=4)
1. Sampling random structures¶
Build the formulas to sample (Cu, Cu_2, …, up to max_num) and let
sample generate one structure for every compatible space group.
AspectFilter(6) filters out very elongated unit cells that pyxtal
occasionally produces.
formulas = Formulas.range('Cu', max_num + 1)
formulas
Formulas(atoms=({'Cu': 0}, {'Cu': 1}, {'Cu': 2}))
spg = list(filter(
AspectFilter(6),
sample(
formulas,
spacegroups=range(1, 230 + 1),
max_atoms=max_num,
)
))
print(f"Generated {len(spg)} symmetric seeds")
Generated 168 symmetric seeds
Just to create a small check point, dump generated structures into a pickle file. This is not required for assyst.
with open(out_dir / 'spg.pckl', 'wb') as f:
pickle.dump(spg, f)
2. Relaxation¶
ASSYST applies a two-step relaxation:
VolumeRelax— relax the volume only, keeping shape and internal positions frozen. Quickly puts each seed near a sensible energy scale.FullRelax— relax cell shape, volume and atomic positions together, landing in a (local) minimum of the PES.
In production both would be carried out with DFT; here the Morse potential
makes the cost negligible. For experimentation, assyst.relaxations also
provides CellRelax and SymmetryRelax.
volset = VolumeRelax(max_steps=100, force_tolerance=1e-3)
allset = FullRelax(max_steps=100, force_tolerance=1e-3)
%%time
volmin = list(relax(tqdm(spg, desc='volume'), volset, reference))
CPU times: user 2min 24s, sys: 139 ms, total: 2min 25s
Wall time: 48.7 s
%%time
allmin = list(relax(tqdm(volmin, desc='full'), allset, reference))
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.432556521112339e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.432556521112339e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 3.7157276183604643e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 3.7157276183604643e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 5.516864777792211e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 5.516864777792211e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 8.15181096918948e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 8.15181096918948e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.3749153830356407e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.3749153830356407e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.4866777525490544e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.4866777525490544e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.421109299871798e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.421109299871798e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.452048037342556e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.452048037342556e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.9627799244203634e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.9627799244203634e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.2935924938929445e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.2935924938929445e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.4872886565241954e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.4872886565241954e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 3.021149907848636e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 3.021149907848636e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.37529756320406e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.37529756320406e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.9385885786932446e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.9385885786932446e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 3.6379884134723173e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 3.6379884134723173e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.989922080203508e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.989922080203508e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.3682633429615987e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.3682633429615987e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.89860856904846e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.89860856904846e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 3.8198492634984746e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 2.535116799634508e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 2.535116799634508e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 3.7910927379480105e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 3.7910927379480105e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:624: RuntimeWarning: logm result may be inaccurate, approximate err = 5.716936716033083e-13
cur_deform_grad_log = self.logm(cur_deform_grad)
/home/ponder/micromamba/envs/assyst/lib/python3.12/site-packages/ase/filters.py:606: RuntimeWarning: logm result may be inaccurate, approximate err = 5.716936716033083e-13
pos[natoms:] = self.logm(pos[natoms:]) * self.exp_cell_factor
CPU times: user 9min 35s, sys: 234 ms, total: 9min 35s
Wall time: 3min 13s
with open(out_dir / 'volmin.pckl', 'wb') as f:
pickle.dump(volmin, f)
with open(out_dir / 'allmin.pckl', 'wb') as f:
pickle.dump(allmin, f)
3. Random perturbations¶
Around each minimum we sample the surrounding PES with three families of random perturbations described in the ASSYST paper:
mostly positional noise with a small cell change (
rattle),mostly hydrostatic cell change,
mostly shear cell change.
The hydrostatic and shear strains are combined with RandomChoice so that
hydrostatic changes are drawn slightly more often. The final mods
sequence is applied to every fully-minimised structure. DistanceFilter
discards perturbations that placed atoms unphysically close.
rattle = Rattle(0.25) + Stretch(hydro=0.05, shear=0.005)
hydro = Stretch(hydro=0.80, shear=0.05)
shear = Stretch(hydro=0.05, shear=0.20)
stretch = RandomChoice(hydro, shear, 0.7)
# four rattles + four stretches per minimum reproduces the published settings
mods = 4 * [rattle] + 4 * [stretch]
%%time
random = list(perturb(allmin, mods, filters=[DistanceFilter({'Cu': 1})]))
print(f"Produced {len(random)} perturbed structures")
Produced 1196 perturbed structures
CPU times: user 1.13 s, sys: 4.92 ms, total: 1.13 s
Wall time: 1.08 s
with open(out_dir / 'random.pckl', 'wb') as f:
pickle.dump(random, f)
4. Combine and evaluate¶
Concatenate the sampled, relaxed and perturbed structures, run the distance and volume filters once more to reject any leftover pathological configurations, and finally evaluate the reference to obtain a training dataset of energies and forces.
everything = list(filter(
VolumeFilter(300),
filter(DistanceFilter({'Cu': 1}), spg + volmin + allmin + random),
))
print(f"Final training set: {len(everything)} structures")
Final training set: 1573 structures
rows = []
for s in tqdm(everything, desc='evaluating'):
s.calc = reference
rows.append({
'ase_atoms': s,
'energy': s.get_potential_energy(),
'forces': s.get_forces(),
'number_of_atoms': len(s),
})
df = pd.DataFrame(rows)
df.to_pickle(out_dir / 'everything_training_data.pckl.gz')
df.head()
| ase_atoms | energy | forces | number_of_atoms | |
|---|---|---|---|---|
| 0 | (Atom('Cu', [-1.3265240321355765, -7.353189190... | -1.802887 | [[6.071532165918825e-18, -7.979727989493313e-1... | 1 |
| 1 | (Atom('Cu', [0.0, 0.0, 0.24375080364530244], i... | -2.192044 | [[7.556889142223966e-17, 2.886146183156413e-16... | 1 |
| 2 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -1.702372 | [[4.5102810375396984e-17, 1.708702623837155e-1... | 1 |
| 3 | (Atom('Cu', [-1.1459510315187336, -1.145951031... | -1.921211 | [[6.895525817007808e-17, -3.673276960380889e-1... | 1 |
| 4 | (Atom('Cu', [-0.7029469904758565, 1.2175399025... | -2.453490 | [[5.204170427930421e-18, 7.28583859910259e-17,... | 1 |
Inspect the training data¶
Energy per atom, energy vs volume and force component distributions are a quick sanity check that the dataset spans the expected ranges.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
(df['energy'] / df['number_of_atoms']).plot.hist(bins=100, log=True, ax=axes[0])
axes[0].set_xlabel('Energy per atom [eV/atom]')
axes[0].set_ylabel('# Structures')
axes[1].scatter(
df['ase_atoms'].map(lambda s: s.cell.volume / len(s)),
df['energy'] / df['number_of_atoms'],
marker='.', alpha=0.5,
)
axes[1].set_xlim(0, 300)
axes[1].set_xlabel(r'Volume [$\mathrm{\AA}^3$/atom]')
axes[1].set_ylabel('Energy per atom [eV/atom]')
df['forces'].explode().explode().infer_objects().plot.hist(bins=100, log=True, ax=axes[2])
axes[2].set_xlabel(r'Force component [eV/$\mathrm{\AA}$]')
axes[2].set_ylabel('# Components')
plt.tight_layout()
Optional: post-hoc filtering¶
If the resulting energies and forces still cover ranges that are too
extreme, EnergyFilter and ForceFilter (or pandas query) can prune
them. These act on ase.Atoms objects that carry a
SinglePointCalculator, which is what relax and the loop above
attach.
df.query('-3 <= energy / number_of_atoms < 10')
| ase_atoms | energy | forces | number_of_atoms | |
|---|---|---|---|---|
| 0 | (Atom('Cu', [-1.3265240321355765, -7.353189190... | -1.802887 | [[6.071532165918825e-18, -7.979727989493313e-1... | 1 |
| 1 | (Atom('Cu', [0.0, 0.0, 0.24375080364530244], i... | -2.192044 | [[7.556889142223966e-17, 2.886146183156413e-16... | 1 |
| 2 | (Atom('Cu', [0.0, 0.0, 0.0], index=0)) | -1.702372 | [[4.5102810375396984e-17, 1.708702623837155e-1... | 1 |
| 3 | (Atom('Cu', [-1.1459510315187336, -1.145951031... | -1.921211 | [[6.895525817007808e-17, -3.673276960380889e-1... | 1 |
| 4 | (Atom('Cu', [-0.7029469904758565, 1.2175399025... | -2.453490 | [[5.204170427930421e-18, 7.28583859910259e-17,... | 1 |
| ... | ... | ... | ... | ... |
| 1568 | (Atom('Cu', [0.07877639345419643, 0.3276080267... | -5.287408 | [[-2.5511153646601943, 0.8436956669030645, -1.... | 2 |
| 1569 | (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... | -4.389131 | [[-5.963111948670274e-17, -3.4876615484513707e... | 2 |
| 1570 | (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... | -4.417568 | [[-2.8027710360922775e-15, -1.7008963681952594... | 2 |
| 1571 | (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... | -5.346236 | [[-2.4824977143400595e-15, -1.9329156331071573... | 2 |
| 1572 | (Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('C... | -4.695105 | [[-2.8142381681322062e-15, 3.803390538441569e-... | 2 |
1573 rows × 4 columns
Next steps¶
This notebook stops at the training set. Fitting a potential is then the
job of a downstream MLIP package — e.g. ACE via python-ace, MTP via
mlip-3, or any other framework that reads a list of structures with
energies and forces. The dataframe saved in
Unary/<max_num>/everything_training_data.pckl.gz is in the format
consumed by pyace.linearacefit.LinearACEDataset.
The companion notebook ACEFitting.ipynb walks through training a small linear ACE potential on the dataset produced here.