Perturbations¶
After relaxation, ASSYST perturbs structures to populate the training set with off-equilibrium configurations. This notebook walks through the perturbations available in :mod:assyst.perturbations:
:class:
~assyst.perturbations.Rattle– gaussian displacements of atomic positions:class:
~assyst.perturbations.ElementScaledRattle– gaussian displacements with per-element scaling:class:
~assyst.perturbations.Stretch– random affine cell deformation:class:
~assyst.perturbations.Series– chain perturbations together:class:
~assyst.perturbations.RandomChoice– pick between two perturbations at random
Each can be applied directly to a single structure or via :func:~assyst.perturbations.perturb to a stream of structures, optionally filtered.
from assyst.perturbations import (
Rattle,
ElementScaledRattle,
Stretch,
Series,
RandomChoice,
perturb,
)
from assyst.filters import DistanceFilter
from ase.build import bulk
import numpy as np
Reference structure¶
We use a 2x2x2 Cu super cell as the starting point. All perturbations operate in place on a copy of the input, so we always pass a fresh copy when comparing results.
ref = bulk('Cu', 'fcc', a=3.6, cubic=True).repeat(2)
ref.info['source'] = 'reference'
print(f'natoms = {len(ref)}')
print(f'volume = {ref.get_volume():.3f} A^3')
natoms = 32
volume = 373.248 A^3
Rattle¶
Displace each atom by a gaussian-random vector with standard deviation sigma (in A). Use this to sample short-wavelength phonon-like distortions.
The rng argument accepts either an integer seed or a :class:numpy.random.Generator; passing a seed makes the perturbation reproducible.
Rattling a structure with a single atom is meaningless because of the periodic boundary, so the underlying :func:~assyst.perturbations.rattle raises ValueError. Set create_supercells=True to automatically replicate to 2x2x2 first.
for sigma in (0.02, 0.1, 0.2):
out = Rattle(sigma=sigma, rng=0)(ref.copy())
disp = np.linalg.norm(out.positions - ref.positions, axis=1)
print(f'sigma={sigma:.2f}: mean |dx|={disp.mean():.3f} A, max |dx|={disp.max():.3f} A')
print(f' info[perturbation] = {out.info["perturbation"]!r}')
sigma=0.02: mean |dx|=0.031 A, max |dx|=0.053 A
info[perturbation] = 'rattle(0.02)'
sigma=0.10: mean |dx|=0.157 A, max |dx|=0.265 A
info[perturbation] = 'rattle(0.1)'
sigma=0.20: mean |dx|=0.313 A, max |dx|=0.529 A
info[perturbation] = 'rattle(0.2)'
ElementScaledRattle¶
Like Rattle, but the displacement standard deviation is given as a relative sigma and multiplied by a per-element reference length. This is useful for multi-component systems where different elements have different natural bond lengths – you want the perturbations to scale accordingly rather than apply the same absolute noise everywhere.
All elements present in the structure must have an entry in reference.
ref_lengths = {'Cu': 2.55} # near-neighbor distance in fcc Cu
out = ElementScaledRattle(sigma=0.05, reference=ref_lengths, rng=0)(ref.copy())
disp = np.linalg.norm(out.positions - ref.positions, axis=1)
# effective per-axis stdev is sigma * reference, so mean |dx| ~ sigma * reference * sqrt(8/pi)
print(f'mean |dx| = {disp.mean():.3f} A (per-axis stdev = {0.05 * 2.55:.3f} A)')
mean |dx| = 0.200 A (per-axis stdev = 0.128 A)
Stretch¶
Apply a random affine deformation to the cell, scaling atomic positions with it.
hydro– maximum diagonal (hydrostatic + uniaxial) strain magnitudeshear– maximum off-diagonal (shear) strain magnitudeminimum_strain– a floor on the magnitude of each strain component, ensuring the result is meaningfully different from the input. This avoids near-identity strains that confuse symmetry analyzers (e.g. VASP’s).
for hydro, shear in [(0.02, 0.005), (0.005, 0.05), (0.05, 0.05)]:
out = Stretch(hydro=hydro, shear=shear, rng=0)(ref.copy())
print(f'hydro={hydro}, shear={shear}: dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')
print(f' cell =\n{out.cell.array}')
hydro=0.02, shear=0.005: dV/V = +1.35%
cell =
[[7.10981223 0.00838004 0.007676 ]
[0.00838004 7.30699513 0.03062218]
[0.007676 0.03062218 7.2815679 ]]
hydro=0.005, shear=0.05: dV/V = +0.20%
cell =
[[7.17532889 0.02165546 0.01303095]
[0.02165546 7.2282095 0.29412174]
[0.01303095 0.29412174 7.2228564 ]]
hydro=0.05, shear=0.05: dV/V = +3.10%
cell =
[[6.9787789 0.02165546 0.01303095]
[0.02165546 7.46456639 0.29412174]
[0.01303095 0.29412174 7.3989909 ]]
Composing perturbations¶
Perturbations support + to chain into a :class:~assyst.perturbations.Series that applies each step in turn. The perturbation tag in info["perturbation"] is built up accordingly so you can later see exactly what was done.
combined = Stretch(hydro=0.03, shear=0.02, rng=0) + Rattle(sigma=0.05, rng=0)
out = combined(ref.copy())
print(f'info[perturbation] = {out.info["perturbation"]!r}')
print(f'dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')
# total displacement combines the cell deformation and the rattle
disp = np.linalg.norm(out.positions - ref.positions, axis=1)
print(f'mean total atomic displacement = {disp.mean():.3f} A')
info[perturbation] = 'stretch(hydro=0.03, shear=0.02)+rattle(0.05)'
dV/V = +1.97%
mean total atomic displacement = 0.189 A
RandomChoice¶
Pick between two alternative perturbations with a given probability chance for the second. Use this to mix, e.g., position-only and combined position+cell perturbations into one stream.
rc = RandomChoice(
choice_a=Rattle(sigma=0.05, rng=0),
choice_b=Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.03, shear=0.02, rng=0),
chance=0.5,
rng=0,
)
tags = [rc(ref.copy()).info['perturbation'] for _ in range(8)]
for t in tags:
print(t)
rattle(0.05)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)
rattle(0.05)
rattle(0.05)
rattle(0.05)
Streaming with perturb¶
perturb is the high-level driver: feed it an iterable of structures and an iterable of perturbations and it yields one perturbed structure per (input, perturbation) pair. Filters are applied to each candidate; if a candidate is rejected, the perturbation is retried up to retries times.
structures = [bulk('Cu', 'fcc', a=a, cubic=True).repeat(2) for a in (3.5, 3.6, 3.7)]
perturbations = [
Rattle(sigma=0.05, rng=0),
Stretch(hydro=0.04, shear=0.03, rng=0),
Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.04, shear=0.03, rng=0),
]
# reject configurations with any Cu-Cu bond shorter than 2.0 A
filt = DistanceFilter(radii={'Cu': 1.0})
for s in perturb(structures, perturbations, filters=filt):
print(f'V={s.get_volume():.2f} A^3 {s.info["perturbation"]}')
V=343.00 A^3 rattle(0.05)
V=351.84 A^3 stretch(hydro=0.04, shear=0.03)
V=351.84 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)
V=373.25 A^3 rattle(0.05)
V=397.57 A^3 stretch(hydro=0.04, shear=0.03)
V=397.57 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)
V=405.22 A^3 rattle(0.05)
V=417.24 A^3 stretch(hydro=0.04, shear=0.03)
V=417.24 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)
Filters and retries¶
Random perturbations occasionally produce unphysical configurations (atoms too close, extreme cell aspect ratios, …). Pass any of the :mod:assyst.filters callables to drop or retry such candidates. With retries=10 (the default) perturb re-rolls the random perturbation up to ten times before giving up on that input.