{ "cells": [ { "cell_type": "markdown", "id": "intro-md", "metadata": {}, "source": [ "# Perturbations\n", "\n", "After relaxation, ASSYST perturbs structures to populate the training set with off-equilibrium configurations. This notebook walks through the perturbations available in :mod:`assyst.perturbations`:\n", "\n", "* :class:`~assyst.perturbations.Rattle` -- Gaussian displacements of atomic positions\n", "* :class:`~assyst.perturbations.ElementScaledRattle` -- Gaussian displacements with per-element scaling\n", "* :class:`~assyst.perturbations.Stretch` -- random affine cell deformation\n", "* :class:`~assyst.perturbations.Series` -- chain perturbations together\n", "* :class:`~assyst.perturbations.RandomChoice` -- pick one of two perturbations at random\n", "\n", "Each perturbation can be applied directly to a single structure, or to a stream of structures (optionally filtered) via :func:`~assyst.perturbations.perturb`." ] }, { "cell_type": "code", "execution_count": 1, "id": "imports", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:43.062547Z", "iopub.status.busy": "2026-05-04T23:36:43.062385Z", "iopub.status.idle": "2026-05-04T23:36:44.034829Z", "shell.execute_reply": "2026-05-04T23:36:44.034053Z" } }, "outputs": [], "source": [ "from assyst.perturbations import (\n", " Rattle,\n", " ElementScaledRattle,\n", " Stretch,\n", " Series,\n", " RandomChoice,\n", " perturb,\n", ")\n", "from assyst.filters import DistanceFilter\n", "\n", "from ase.build import bulk\n", "import numpy as np" ] }, { "cell_type": "markdown", "id": "structure-md", "metadata": {}, "source": [ "## Reference structure\n", "\n", "We use a 2x2x2 supercell of the conventional fcc Cu cell (32 atoms) as the starting point. Perturbations may modify the structure passed to them **in place**, so we always pass a fresh copy (`ref.copy()`) and keep `ref` unchanged for comparison."
] }, { "cell_type": "code", "execution_count": 2, "id": "structure-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.037033Z", "iopub.status.busy": "2026-05-04T23:36:44.036848Z", "iopub.status.idle": "2026-05-04T23:36:44.041765Z", "shell.execute_reply": "2026-05-04T23:36:44.041053Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "natoms = 32\n", "volume = 373.248 A^3\n" ] } ], "source": [ "ref = bulk('Cu', 'fcc', a=3.6, cubic=True).repeat(2)\n", "ref.info['source'] = 'reference'\n", "print(f'natoms = {len(ref)}')\n", "print(f'volume = {ref.get_volume():.3f} A^3')" ] }, { "cell_type": "markdown", "id": "rattle-md", "metadata": {}, "source": [ "## `Rattle`\n", "\n", "Displace each atom by a Gaussian-random vector whose Cartesian components have standard deviation `sigma` (in Å). Use this to sample short-wavelength, phonon-like distortions.\n", "\n", "The `rng` argument accepts either an integer seed or a :class:`numpy.random.Generator`; passing a seed makes the perturbation reproducible.\n", "\n", "Rattling a structure with a single atom is meaningless: under periodic boundary conditions every image of that atom moves by the same vector, so the displacement is only a rigid translation of the crystal. The underlying :func:`~assyst.perturbations.rattle` therefore raises ``ValueError``. Set ``create_supercells=True`` to automatically replicate the structure to 2x2x2 first."
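] }, { "cell_type": "markdown", "id": "rattle-sketch-md", "metadata": {}, "source": [
"The behavior described above can be illustrated with a plain NumPy sketch. It assumes that `sigma` is the standard deviation of each Cartesian component, which matches the mean |dx| ≈ 1.57 σ printed by the `Rattle` example. `gaussian_rattle` is a hypothetical helper, not the assyst implementation. In this sketch, reusing a seed makes the displacements exactly proportional to `sigma`, because `Generator.normal` rescales the same standard-normal draws. That would also account for the 1:5:10 ratio of the printed mean displacements."
] }, { "cell_type": "code", "execution_count": null, "id": "rattle-sketch", "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Minimal NumPy sketch of a Gaussian rattle, NOT the assyst implementation:\n",
"# every Cartesian component of every atom gets an independent N(0, sigma^2) kick.\n",
"def gaussian_rattle(positions, sigma, rng):\n",
"    rng = np.random.default_rng(rng)  # accepts an int seed or an existing Generator\n",
"    return positions + rng.normal(scale=sigma, size=positions.shape)\n",
"\n",
"\n",
"sigma = 0.1\n",
"positions = np.zeros((100_000, 3))\n",
"disp = np.linalg.norm(gaussian_rattle(positions, sigma, rng=0) - positions, axis=1)\n",
"# for isotropic 3D Gaussian noise, E|dx| = sigma * sqrt(8 / pi), about 1.596 * sigma\n",
"expected = sigma * np.sqrt(8 / np.pi)\n",
"print(f'mean |dx| = {disp.mean():.4f} A (expected {expected:.4f} A)')\n",
"\n",
"# same seed, different sigma: identical standard-normal draws, just rescaled\n",
"d1 = gaussian_rattle(np.zeros((32, 3)), 0.02, rng=0)\n",
"d2 = gaussian_rattle(np.zeros((32, 3)), 0.2, rng=0)\n",
"print(np.allclose(d2, 10 * d1))  # True"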
] }, { "cell_type": "code", "execution_count": 3, "id": "rattle-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.043314Z", "iopub.status.busy": "2026-05-04T23:36:44.043162Z", "iopub.status.idle": "2026-05-04T23:36:44.047803Z", "shell.execute_reply": "2026-05-04T23:36:44.047078Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sigma=0.02: mean |dx|=0.031 A, max |dx|=0.053 A\n", " info[perturbation] = 'rattle(0.02)'\n", "sigma=0.10: mean |dx|=0.157 A, max |dx|=0.265 A\n", " info[perturbation] = 'rattle(0.1)'\n", "sigma=0.20: mean |dx|=0.313 A, max |dx|=0.529 A\n", " info[perturbation] = 'rattle(0.2)'\n" ] } ], "source": [ "for sigma in (0.02, 0.1, 0.2):\n", " out = Rattle(sigma=sigma, rng=0)(ref.copy())\n", " disp = np.linalg.norm(out.positions - ref.positions, axis=1)\n", " print(f'sigma={sigma:.2f}: mean |dx|={disp.mean():.3f} A, max |dx|={disp.max():.3f} A')\n", " print(f' info[perturbation] = {out.info[\"perturbation\"]!r}')" ] }, { "cell_type": "markdown", "id": "scaled-md", "metadata": {}, "source": [ "## `ElementScaledRattle`\n", "\n", "Like `Rattle`, but the displacement standard deviation is given as a *relative* `sigma` and multiplied by a per-element reference length. This is useful for multi-component systems where different elements have different natural bond lengths -- you want the perturbations to scale accordingly rather than apply the same absolute noise everywhere.\n", "\n", "All elements present in the structure must have an entry in `reference`." 
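] }, { "cell_type": "markdown", "id": "scaled-sketch-md", "metadata": {}, "source": [
"To make the per-element scaling concrete, here is a hypothetical NumPy sketch for a two-element system (`element_scaled_rattle` is not part of assyst). It follows the comment in the `ElementScaledRattle` example and assumes that the effective per-axis standard deviation of an atom is `sigma * reference[element]`. It raises `KeyError` for elements without a reference length. The Al value of 2.86 Å is the nearest-neighbor distance a/√2 of fcc Al (a ≈ 4.05 Å) and is used purely for illustration."
] }, { "cell_type": "code", "execution_count": null, "id": "scaled-sketch", "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical sketch of element-scaled rattling, NOT the assyst implementation:\n",
"# per-atom stdev = relative sigma * reference length of that atom's element.\n",
"def element_scaled_rattle(positions, symbols, sigma, reference, rng):\n",
"    missing = set(symbols) - set(reference)\n",
"    if missing:  # every element present needs a reference length\n",
"        raise KeyError(f'no reference length for {sorted(missing)}')\n",
"    rng = np.random.default_rng(rng)\n",
"    scale = sigma * np.array([reference[s] for s in symbols])  # shape (natoms,)\n",
"    # broadcast the per-atom stdev over the x, y, z components\n",
"    return positions + rng.normal(size=positions.shape) * scale[:, None]\n",
"\n",
"\n",
"# illustrative binary; 2.55 and 2.86 A are nearest-neighbor distances in fcc Cu and Al\n",
"symbols = ['Cu'] * 50_000 + ['Al'] * 50_000\n",
"reference = {'Cu': 2.55, 'Al': 2.86}\n",
"pos = np.zeros((len(symbols), 3))\n",
"disp = np.linalg.norm(element_scaled_rattle(pos, symbols, 0.05, reference, rng=0) - pos, axis=1)\n",
"is_cu = np.array(symbols) == 'Cu'\n",
"for name, mask in (('Cu', is_cu), ('Al', ~is_cu)):\n",
"    print(f'{name}: mean |dx| = {disp[mask].mean():.3f} A')"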
] }, { "cell_type": "code", "execution_count": 4, "id": "scaled-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.049216Z", "iopub.status.busy": "2026-05-04T23:36:44.049093Z", "iopub.status.idle": "2026-05-04T23:36:44.052870Z", "shell.execute_reply": "2026-05-04T23:36:44.052129Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mean |dx| = 0.200 A (per-axis stdev = 0.128 A)\n" ] } ], "source": [ "sigma = 0.05 # relative displacement amplitude\n", "ref_lengths = {'Cu': 2.55} # nearest-neighbor distance a/sqrt(2) in fcc Cu with a = 3.6 A\n", "out = ElementScaledRattle(sigma=sigma, reference=ref_lengths, rng=0)(ref.copy())\n", "disp = np.linalg.norm(out.positions - ref.positions, axis=1)\n", "# effective per-axis stdev is sigma * reference, so mean |dx| ~ sigma * reference * sqrt(8/pi)\n", "stdev = sigma * ref_lengths['Cu']\n", "print(f'mean |dx| = {disp.mean():.3f} A (per-axis stdev = {stdev:.3f} A)')" ] }, { "cell_type": "markdown", "id": "stretch-md", "metadata": {}, "source": [ "## `Stretch`\n", "\n", "Apply a random affine deformation to the cell, scaling atomic positions with it.\n", "\n", "* `hydro` -- maximum diagonal (hydrostatic + uniaxial) strain magnitude\n", "* `shear` -- maximum off-diagonal (shear) strain magnitude\n", "* `minimum_strain` -- a floor on the magnitude of each strain component, ensuring the result is meaningfully different from the input. This avoids near-identity strains that confuse symmetry analyzers (e.g. VASP's)."
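] }, { "cell_type": "markdown", "id": "stretch-sketch-md", "metadata": {}, "source": [
"One way to realize such a deformation is sketched below in NumPy. The sampling scheme is an assumption, not taken from assyst. The strain tensor is taken to be symmetric, as the symmetric cells printed in the `Stretch` example suggest. Each component's magnitude is drawn uniformly between `minimum_strain` and its bound (`hydro` on the diagonal, `shear` off it), with a random sign. `random_strain` and `apply_strain` are hypothetical helpers."
] }, { "cell_type": "code", "execution_count": null, "id": "stretch-sketch", "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical sketch of a random symmetric strain, NOT the assyst implementation.\n",
"# Assumed: each component's magnitude is uniform in [minimum_strain, bound) with a\n",
"# random sign; bound is hydro on the diagonal and shear off the diagonal.\n",
"def random_strain(hydro, shear, minimum_strain, rng):\n",
"    if minimum_strain > min(hydro, shear):\n",
"        raise ValueError('minimum_strain must not exceed hydro or shear')\n",
"    rng = np.random.default_rng(rng)\n",
"    bound = np.full((3, 3), shear)\n",
"    np.fill_diagonal(bound, hydro)\n",
"    eps = rng.uniform(minimum_strain, bound) * rng.choice([-1.0, 1.0], size=(3, 3))\n",
"    return np.triu(eps) + np.triu(eps, k=1).T  # mirror the upper triangle: symmetric\n",
"\n",
"\n",
"def apply_strain(cell, positions, eps):\n",
"    # rows are lattice vectors / atom positions, so r' = r (I + eps) for symmetric eps;\n",
"    # this keeps fractional coordinates fixed, i.e. the atoms scale with the cell\n",
"    deformation = np.eye(3) + eps\n",
"    return cell @ deformation, positions @ deformation\n",
"\n",
"\n",
"cell = np.eye(3) * 7.2  # same box as the 2x2x2 Cu reference\n",
"positions = np.random.default_rng(1).random((32, 3)) @ cell  # arbitrary atoms in the box\n",
"eps = random_strain(hydro=0.02, shear=0.005, minimum_strain=0.001, rng=0)\n",
"new_cell, new_positions = apply_strain(cell, positions, eps)\n",
"dv = np.linalg.det(new_cell) / np.linalg.det(cell) - 1\n",
"print(f'dV/V = {dv * 100:+.2f}%')\n",
"print(np.round(eps, 4))"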
] }, { "cell_type": "code", "execution_count": 5, "id": "stretch-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.054307Z", "iopub.status.busy": "2026-05-04T23:36:44.054189Z", "iopub.status.idle": "2026-05-04T23:36:44.059837Z", "shell.execute_reply": "2026-05-04T23:36:44.059041Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hydro=0.02, shear=0.005: dV/V = +1.35%\n", " cell =\n", "[[7.10981223 0.00838004 0.007676 ]\n", " [0.00838004 7.30699513 0.03062218]\n", " [0.007676 0.03062218 7.2815679 ]]\n", "hydro=0.005, shear=0.05: dV/V = +0.20%\n", " cell =\n", "[[7.17532889 0.02165546 0.01303095]\n", " [0.02165546 7.2282095 0.29412174]\n", " [0.01303095 0.29412174 7.2228564 ]]\n", "hydro=0.05, shear=0.05: dV/V = +3.10%\n", " cell =\n", "[[6.9787789 0.02165546 0.01303095]\n", " [0.02165546 7.46456639 0.29412174]\n", " [0.01303095 0.29412174 7.3989909 ]]\n" ] } ], "source": [ "for hydro, shear in [(0.02, 0.005), (0.005, 0.05), (0.05, 0.05)]:\n", " out = Stretch(hydro=hydro, shear=shear, rng=0)(ref.copy())\n", " print(f'hydro={hydro}, shear={shear}: dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')\n", " print(f' cell =\\n{out.cell.array}')" ] }, { "cell_type": "markdown", "id": "series-md", "metadata": {}, "source": [ "## Composing perturbations\n", "\n", "Perturbations support `+` to chain into a :class:`~assyst.perturbations.Series` that applies each step in turn. The perturbation tag in `info[\"perturbation\"]` is built up accordingly so you can later see exactly what was done." 
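] }, { "cell_type": "markdown", "id": "series-sketch-md", "metadata": {}, "source": [
"The `+` composition can be sketched in plain Python by overloading `__add__`. The classes `PerturbationSketch`, `SeriesSketch` and `ShiftSketch` are hypothetical stand-ins that act on a positions array and a list of tags instead of an ASE `Atoms` object. They only illustrate how a chain applies each step in order and records every step in its tag."
] }, { "cell_type": "code", "execution_count": null, "id": "series-sketch", "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical sketch of `+` composition, NOT the assyst classes: `a + b` builds\n",
"# a series that applies a, then b, and the tags are later joined with '+'.\n",
"class PerturbationSketch:\n",
"    def __add__(self, other):\n",
"        return SeriesSketch(self, other)\n",
"\n",
"\n",
"class SeriesSketch(PerturbationSketch):\n",
"    def __init__(self, *steps):\n",
"        self.steps = steps\n",
"\n",
"    def __call__(self, positions, tags):\n",
"        for step in self.steps:  # apply each step in turn\n",
"            positions, tags = step(positions, tags)\n",
"        return positions, tags\n",
"\n",
"\n",
"class ShiftSketch(PerturbationSketch):\n",
"    def __init__(self, dx):\n",
"        self.dx = dx\n",
"\n",
"    def __call__(self, positions, tags):\n",
"        return positions + self.dx, tags + [f'shift({self.dx})']\n",
"\n",
"\n",
"chain = ShiftSketch(0.1) + ShiftSketch(0.2) + ShiftSketch(0.3)\n",
"positions, tags = chain(np.zeros((2, 3)), [])\n",
"print('+'.join(tags))  # shift(0.1)+shift(0.2)+shift(0.3)\n",
"print(positions[0])"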
] }, { "cell_type": "code", "execution_count": 6, "id": "series-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.061216Z", "iopub.status.busy": "2026-05-04T23:36:44.061095Z", "iopub.status.idle": "2026-05-04T23:36:44.065383Z", "shell.execute_reply": "2026-05-04T23:36:44.064627Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "info[perturbation] = 'stretch(hydro=0.03, shear=0.02)+rattle(0.05)'\n", "dV/V = +1.97%\n", "mean total atomic displacement = 0.189 A\n" ] } ], "source": [ "combined = Stretch(hydro=0.03, shear=0.02, rng=0) + Rattle(sigma=0.05, rng=0)\n", "out = combined(ref.copy())\n", "print(f'info[perturbation] = {out.info[\"perturbation\"]!r}')\n", "print(f'dV/V = {(out.get_volume() / ref.get_volume() - 1) * 100:+.2f}%')\n", "# total displacement combines the cell deformation and the rattle\n", "disp = np.linalg.norm(out.positions - ref.positions, axis=1)\n", "print(f'mean total atomic displacement = {disp.mean():.3f} A')" ] }, { "cell_type": "markdown", "id": "choice-md", "metadata": {}, "source": [ "## `RandomChoice`\n", "\n", "Pick one of two alternative perturbations at random: `choice_b` is applied with probability `chance`, otherwise `choice_a`. Use this to mix, e.g., position-only and combined position+cell perturbations into one stream."
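] }, { "cell_type": "markdown", "id": "choice-sketch-md", "metadata": {}, "source": [
"The selection logic amounts to one uniform random draw per call, as in this hypothetical sketch (`random_choice` is not the assyst API). The generator is created once and shared by all calls. Successive calls therefore draw fresh numbers, while the whole sequence stays reproducible for a fixed seed."
] }, { "cell_type": "code", "execution_count": null, "id": "choice-sketch", "metadata": {}, "outputs": [], "source": [
"import numpy as np\n",
"\n",
"# Hypothetical sketch, NOT the assyst class: apply choice_b with probability\n",
"# `chance`, otherwise choice_a.\n",
"def random_choice(choice_a, choice_b, chance, rng):\n",
"    rng = np.random.default_rng(rng)  # created once, shared by all calls\n",
"\n",
"    def pick(structure):\n",
"        return choice_b(structure) if rng.random() < chance else choice_a(structure)\n",
"\n",
"    return pick\n",
"\n",
"\n",
"pick = random_choice(lambda s: 'a', lambda s: 'b', chance=0.25, rng=0)\n",
"draws = [pick(None) for _ in range(10_000)]\n",
"frac_b = draws.count('b') / len(draws)\n",
"print(f'fraction of choice_b: {frac_b:.3f}')  # close to chance = 0.25"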
] }, { "cell_type": "code", "execution_count": 7, "id": "choice-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.066782Z", "iopub.status.busy": "2026-05-04T23:36:44.066645Z", "iopub.status.idle": "2026-05-04T23:36:44.072235Z", "shell.execute_reply": "2026-05-04T23:36:44.071483Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "rattle(0.05)\n", "rattle(0.05)+stretch(hydro=0.03, shear=0.02)\n", "rattle(0.05)+stretch(hydro=0.03, shear=0.02)\n", "rattle(0.05)+stretch(hydro=0.03, shear=0.02)\n", "rattle(0.05)\n", "rattle(0.05)\n", "rattle(0.05)\n", "rattle(0.05)\n" ] } ], "source": [ "rc = RandomChoice(\n", " choice_a=Rattle(sigma=0.05, rng=0),\n", " choice_b=Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.03, shear=0.02, rng=0),\n", " chance=0.5,\n", " rng=0,\n", ")\n", "tags = [rc(ref.copy()).info['perturbation'] for _ in range(8)]\n", "for t in tags:\n", " print(t)" ] }, { "cell_type": "markdown", "id": "perturb-md", "metadata": {}, "source": [ "## Streaming with `perturb`\n", "\n", "`perturb` is the high-level driver: feed it an iterable of structures and an iterable of perturbations and it yields one perturbed structure per (input, perturbation) pair. Filters are applied to each candidate; if a candidate is rejected, the perturbation is retried up to `retries` times." 
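] }, { "cell_type": "markdown", "id": "perturb-sketch-md", "metadata": {}, "source": [
"The control flow described above can be sketched as a Python generator. `perturb_sketch`, the toy `toy_rattle` perturbation and the toy `far_enough` filter are hypothetical and act on plain NumPy position arrays. Two details are assumptions. The sketch counts `retries` as one initial attempt plus up to `retries` re-rolls. When every attempt is rejected, it silently skips the pair."
] }, { "cell_type": "code", "execution_count": null, "id": "perturb-sketch", "metadata": {}, "outputs": [], "source": [
"import itertools\n",
"import numpy as np\n",
"\n",
"# Hypothetical sketch of a perturb-style driver, NOT the assyst implementation:\n",
"# one output per (structure, perturbation) pair, looping over structures first;\n",
"# a rejected candidate is re-rolled up to `retries` times, then the pair is skipped.\n",
"def perturb_sketch(structures, perturbations, filters=(), retries=10):\n",
"    for structure, perturbation in itertools.product(structures, perturbations):\n",
"        for _ in range(1 + retries):  # first attempt plus up to `retries` re-rolls\n",
"            candidate = perturbation(structure.copy())\n",
"            if all(accept(candidate) for accept in filters):\n",
"                yield candidate\n",
"                break\n",
"\n",
"\n",
"rng = np.random.default_rng(0)\n",
"\n",
"\n",
"def toy_rattle(positions):  # toy perturbation acting on a positions array\n",
"    return positions + rng.normal(scale=0.3, size=positions.shape)\n",
"\n",
"\n",
"def far_enough(positions):  # toy filter: keep the two atoms at least 1.8 A apart\n",
"    return np.linalg.norm(positions[0] - positions[1]) >= 1.8\n",
"\n",
"\n",
"dimers = [np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0]]) for d in (2.0, 2.5)]\n",
"kept = list(perturb_sketch(dimers, [toy_rattle], filters=[far_enough]))\n",
"print(len(kept), [round(float(np.linalg.norm(p[0] - p[1])), 2) for p in kept])"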
] }, { "cell_type": "code", "execution_count": 8, "id": "perturb-cell", "metadata": { "execution": { "iopub.execute_input": "2026-05-04T23:36:44.073645Z", "iopub.status.busy": "2026-05-04T23:36:44.073499Z", "iopub.status.idle": "2026-05-04T23:36:44.275746Z", "shell.execute_reply": "2026-05-04T23:36:44.274762Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "V=343.00 A^3 rattle(0.05)\n", "V=351.84 A^3 stretch(hydro=0.04, shear=0.03)\n", "V=351.84 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)\n", "V=373.25 A^3 rattle(0.05)\n", "V=397.57 A^3 stretch(hydro=0.04, shear=0.03)\n", "V=397.57 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)\n", "V=405.22 A^3 rattle(0.05)\n", "V=417.24 A^3 stretch(hydro=0.04, shear=0.03)\n", "V=417.24 A^3 rattle(0.05)+stretch(hydro=0.04, shear=0.03)\n" ] } ], "source": [ "structures = [bulk('Cu', 'fcc', a=a, cubic=True).repeat(2) for a in (3.5, 3.6, 3.7)]\n", "perturbations = [\n", " Rattle(sigma=0.05, rng=0),\n", " Stretch(hydro=0.04, shear=0.03, rng=0),\n", " Rattle(sigma=0.05, rng=0) + Stretch(hydro=0.04, shear=0.03, rng=0),\n", "]\n", "# reject configurations with any Cu-Cu bond shorter than 2.0 A\n", "filt = DistanceFilter(radii={'Cu': 1.0})\n", "\n", "for s in perturb(structures, perturbations, filters=filt):\n", " print(f'V={s.get_volume():.2f} A^3 {s.info[\"perturbation\"]}')" ] }, { "cell_type": "markdown", "id": "filters-md", "metadata": {}, "source": [ "## Filters and retries\n", "\n", "Random perturbations occasionally produce unphysical configurations (atoms too close, extreme cell aspect ratios, ...). Pass any of the :mod:`assyst.filters` callables via `filters` to drop or retry such candidates. With `retries=10` (the default), `perturb` re-rolls the random perturbation up to ten times before giving up on that input."
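] }, { "cell_type": "markdown", "id": "distance-sketch-md", "metadata": {}, "source": [
"A distance-based filter can be sketched as below. It assumes that a pair of atoms is rejected when it is closer than the sum of their radii. The comment in the `perturb` example implies this: `radii={'Cu': 1.0}` rejects Cu-Cu bonds shorter than 2.0 Å. `passes_distance_check` and `fcc_cu` are hypothetical helpers. The real :class:`~assyst.filters.DistanceFilter` may work differently, for instance by using a neighbor list."
] }, { "cell_type": "code", "execution_count": null, "id": "distance-sketch", "metadata": {}, "outputs": [], "source": [
"import itertools\n",
"import numpy as np\n",
"\n",
"# Hypothetical periodic minimum-distance check, NOT the assyst DistanceFilter.\n",
"# A pair (i, j) is rejected if closer than radii[i] + radii[j]. Assumes every such\n",
"# cutoff is smaller than the cell's perpendicular widths, so checking the 27\n",
"# neighboring images of the minimum-image fractional vector is sufficient.\n",
"def passes_distance_check(cell, positions, symbols, radii):\n",
"    frac = positions @ np.linalg.inv(cell)  # rows of cell are lattice vectors\n",
"    r = np.array([radii[s] for s in symbols])\n",
"    shifts = np.array(list(itertools.product((-1, 0, 1), repeat=3)))\n",
"    for i, j in itertools.combinations(range(len(positions)), 2):\n",
"        df = frac[j] - frac[i]\n",
"        df -= np.round(df)  # wrap into [-0.5, 0.5]\n",
"        d = np.linalg.norm((df + shifts) @ cell, axis=1).min()\n",
"        if d < r[i] + r[j]:\n",
"            return False\n",
"    return True\n",
"\n",
"\n",
"def fcc_cu(a):  # conventional 4-atom fcc cell, nearest-neighbor distance a / sqrt(2)\n",
"    basis = np.array([[0, 0, 0], [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])\n",
"    return np.eye(3) * a, basis * a, ['Cu'] * 4\n",
"\n",
"\n",
"for a in (3.6, 2.6):  # nearest neighbors at 2.55 A and 1.84 A\n",
"    cell, positions, symbols = fcc_cu(a)\n",
"    print(a, passes_distance_check(cell, positions, symbols, {'Cu': 1.0}))  # True, then False"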
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 5 }