Perturbations

After relaxation, ASSYST perturbs structures to populate the training set with off-equilibrium configurations. This notebook walks through the perturbations available in assyst.perturbations:

  • Rattle – gaussian displacements of atomic positions

  • ElementScaledRattle – gaussian displacements with per-element scaling

  • Stretch – random affine cell deformation

  • Series – chain perturbations together

  • RandomChoice – pick between two perturbations at random

Each can be applied directly to a single structure or via perturb() to a stream of structures, optionally filtered.

from assyst.perturbations import (
    Rattle,
    ElementScaledRattle,
    Stretch,
    Series,
    RandomChoice,
    perturb,
)
from assyst.filters import DistanceFilter

from ase.build import bulk
import numpy as np

Reference structure

We use a 2x2x2 Cu super cell as the starting point. All perturbations operate in place on a copy of the input, so we always pass a fresh copy when comparing results.

ref = bulk('Cu', 'fcc', a=3.6, cubic=True).repeat(2)
ref.info['source'] = 'reference'
print(f'natoms   = {len(ref)}')
print(f'volume   = {ref.get_volume():.3f} A^3')
natoms   = 32
volume   = 373.248 A^3

Rattle

Displace each atom by a gaussian-random vector with standard deviation sigma (in A). Use this to sample short-wavelength phonon-like distortions.

The rng argument accepts either an integer seed or a numpy.random.Generator; passing a seed makes the perturbation reproducible.

Rattling a structure with a single atom is meaningless because of the periodic boundary, so the underlying rattle() raises ValueError. Set create_supercells=True to automatically replicate to 2x2x2 first.

for sigma in (0.02, 0.1, 0.2):
    out = Rattle(sigma=sigma, rng=0)(ref.copy())
    disp = np.linalg.norm(out.positions - ref.positions, axis=1)
    print(f'sigma={sigma:.2f}: mean |dx|={disp.mean():.3f} A, max |dx|={disp.max():.3f} A')
    print(f'  info[perturbation] = {out.info["perturbation"]!r}')
sigma=0.02: mean |dx|=0.031 A, max |dx|=0.053 A
  info[perturbation] = 'rattle(0.02)'
sigma=0.10: mean |dx|=0.157 A, max |dx|=0.265 A
  info[perturbation] = 'rattle(0.1)'
sigma=0.20: mean |dx|=0.313 A, max |dx|=0.529 A
  info[perturbation] = 'rattle(0.2)'

ElementScaledRattle

Like Rattle, but the displacement standard deviation is given as a relative sigma and multiplied by a per-element reference length. This is useful for multi-component systems where different elements have different natural bond lengths – you want the perturbations to scale accordingly rather than apply the same absolute noise everywhere.

All elements present in the structure must have an entry in reference.

ref_lengths = {'Cu': 2.55}  # near-neighbor distance in fcc Cu
out = ElementScaledRattle(sigma=0.05, reference=ref_lengths, rng=0)(ref.copy())
disp = np.linalg.norm(out.positions - ref.positions, axis=1)
# effective per-axis stdev is sigma * reference, so mean |dx| ~ sigma * reference * sqrt(8/pi)
print(f'mean |dx| = {disp.mean():.3f} A   (per-axis stdev = {0.05 * 2.55:.3f} A)')
mean |dx| = 0.200 A   (per-axis stdev = 0.128 A)

Stretch

Apply a random affine deformation to the cell, scaling atomic positions with it.

  • hydro – maximum diagonal (hydrostatic + uniaxial) strain magnitude

  • shear – maximum off-diagonal (shear) strain magnitude

  • minimum_strain – a floor on the magnitude of each strain component, ensuring the result is meaningfully different from the input. This avoids near-identity strains that confuse symmetry analyzers (e.g. VASP’s).

for hydro, shear in [(0.02, 0.005), (0.005, 0.05), (0.05, 0.05)]:
    out = Stretch(hydro=hydro, shear=shear, rng=0)(ref.copy())
    print(f'hydro={hydro}, shear={shear}: dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')
    print(f'  cell =\n{out.cell.array}')
hydro=0.02, shear=0.005: dV/V = +1.35%
  cell =
[[7.10981223 0.00838004 0.007676  ]
 [0.00838004 7.30699513 0.03062218]
 [0.007676   0.03062218 7.2815679 ]]
hydro=0.005, shear=0.05: dV/V = +0.20%
  cell =
[[7.17532889 0.02165546 0.01303095]
 [0.02165546 7.2282095  0.29412174]
 [0.01303095 0.29412174 7.2228564 ]]
hydro=0.05, shear=0.05: dV/V = +3.10%
  cell =
[[6.9787789  0.02165546 0.01303095]
 [0.02165546 7.46456639 0.29412174]
 [0.01303095 0.29412174 7.3989909 ]]

Composing perturbations

Perturbations support + to chain into a Series that applies each step in turn. The perturbation tag in info["perturbation"] is built up accordingly so you can later see exactly what was done.

combined = Stretch(hydro=0.03, shear=0.02, rng=0) + Rattle(sigma=0.05, rng=0)
out = combined(ref.copy())
print(f'info[perturbation] = {out.info["perturbation"]!r}')
print(f'dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')
# total displacement combines the cell deformation and the rattle
disp = np.linalg.norm(out.positions - ref.positions, axis=1)
print(f'mean total atomic displacement = {disp.mean():.3f} A')
info[perturbation] = 'stretch(hydro=0.03, shear=0.02)+rattle(0.05)'
dV/V = +1.97%
mean total atomic displacement = 0.189 A

RandomChoice

Pick between two alternative perturbations with a given probability chance for the second. Use this to mix, e.g., position-only and combined position+cell perturbations into one stream.

rc = RandomChoice(
    choice_a=Rattle(sigma=0.05, rng=0),
    choice_b=Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.03, shear=0.02, rng=0),
    chance=0.5,
    rng=0,
)
tags = [rc(ref.copy()).info['perturbation'] for _ in range(8)]
for t in tags:
    print(t)
rattle(0.05)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)+stretch(hydro=0.03, shear=0.02)
rattle(0.05)
rattle(0.05)
rattle(0.05)
rattle(0.05)

Streaming with perturb

perturb is the high-level driver: feed it an iterable of structures and an iterable of perturbations and it yields one perturbed structure per (input, perturbation) pair. Filters are applied to each candidate; if a candidate is rejected, the perturbation is retried up to retries times.

structures = [bulk('Cu', 'fcc', a=a, cubic=True).repeat(2) for a in (3.5, 3.6, 3.7)]
perturbations = [
    Rattle(sigma=0.05, rng=0),
    Stretch(hydro=0.04, shear=0.03, rng=0),
    Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.04, shear=0.03, rng=0),
]
# reject configurations with any Cu-Cu bond shorter than 2.0 A
filt = DistanceFilter(radii={'Cu': 1.0})

for s in perturb(structures, perturbations, filters=filt):
    print(f'V={s.get_volume():.2f} A^3   {s.info["perturbation"]}')
V=343.00 A^3   rattle(0.05)
V=351.84 A^3   stretch(hydro=0.04, shear=0.03)
V=351.84 A^3   rattle(0.05)+stretch(hydro=0.04, shear=0.03)
V=373.25 A^3   rattle(0.05)
V=397.57 A^3   stretch(hydro=0.04, shear=0.03)
V=397.57 A^3   rattle(0.05)+stretch(hydro=0.04, shear=0.03)
V=405.22 A^3   rattle(0.05)
V=417.24 A^3   stretch(hydro=0.04, shear=0.03)
V=417.24 A^3   rattle(0.05)+stretch(hydro=0.04, shear=0.03)

Filters and retries

Random perturbations occasionally produce unphysical configurations (atoms too close, extreme cell aspect ratios, …). Pass any of the assyst.filters callables to drop or retry such candidates. With retries=10 (the default) perturb re-rolls the random perturbation up to ten times before giving up on that input.