Crystal Generation

The first step in the ASSYST workflow is the generation of random, periodic crystal structures. These will form the seeds from which the rest of the training set is grown by relaxation and random perturbation.

This notebook shows how to:

  1. specify the chemical compositions to sample with Formulas,

  2. generate symmetric crystal structures with sample, and

  3. inspect the result by plotting the resulting distribution of compositions and bond distances.

Imports

from assyst.crystals import Formulas, sample
from assyst.plot import concentration_histogram, distance_histogram

Formulas

The smallest unit ASSYST samples is a formula unit, i.e. a dictionary that specifies how many atoms of each species are in one structure. Examples are

  • Mg$_2$Ca

  • Mg$_4$Ca$_2$

  • and so on.

For each formula, ASSYST tries to generate structures in as many different space groups as possible, giving a structurally diverse set of seeds for the same composition.

Formulas is a thin wrapper around a tuple of these dictionaries that adds a few convenient operators for building such collections.

Manual construction

Formulas can be constructed by hand from a tuple of dictionaries:

mg = Formulas(
    ({'Mg': 1}, {'Mg': 2}, {'Mg': 3}, {'Mg': 4})
)
mg
Formulas(atoms=({'Mg': 1}, {'Mg': 2}, {'Mg': 3}, {'Mg': 4}))
mgca = Formulas(
    ({'Mg': 2, 'Ca': 1}, {'Mg': 4, 'Ca': 2})
)
mgca
Formulas(atoms=({'Mg': 2, 'Ca': 1}, {'Mg': 4, 'Ca': 2}))
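
The two entries of mgca are integer multiples of the same formula unit, Mg$_2$Ca. The sketch below builds such multiples as plain dictionaries. The helper name multiples is hypothetical and not part of assyst; the result merely has the tuple-of-dictionaries shape that Formulas is shown to accept above.

```python
def multiples(unit, max_multiple):
    """Return the formula unit scaled by 1, 2, ..., max_multiple.

    Hypothetical helper for illustration only; not part of assyst.
    """
    return tuple(
        {element: count * k for element, count in unit.items()}
        for k in range(1, max_multiple + 1)
    )


# Same content as the hand-written mgca tuple above.
mg2ca_multiples = multiples({'Mg': 2, 'Ca': 1}, 2)
print(mg2ca_multiples)
# ({'Mg': 2, 'Ca': 1}, {'Mg': 4, 'Ca': 2})
```

Passing mg2ca_multiples to Formulas should then be equivalent to the manual construction of mgca.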

Helpers

Alternatively, the range helper mimics the builtin range for a single element:

Formulas.range('Mg', 1, 5)
Formulas(atoms=({'Mg': 1}, {'Mg': 2}, {'Mg': 3}, {'Mg': 4}))

Algebra

Formulas overloads +, | and * so that larger sets can be built from smaller ones.

+ simply concatenates the formula tuples,

Formulas.range('Mg', 1, 5) + Formulas.range('Ca', 1, 3)
Formulas(atoms=({'Mg': 1}, {'Mg': 2}, {'Mg': 3}, {'Mg': 4}, {'Ca': 1}, {'Ca': 2}))

| combines the two sequences element by element, like the builtin zip, and truncates to the shorter one,

Formulas.range('Mg', 1, 5) | Formulas.range('Ca', 1, 3)
Formulas(atoms=({'Mg': 1, 'Ca': 1}, {'Mg': 2, 'Ca': 2}))

and * produces the outer product, i.e. every combination of one formula from each operand.

Formulas.range('Mg', 1, 5) * Formulas.range('Ca', 1, 3)
Formulas(atoms=({'Mg': 1, 'Ca': 1}, {'Mg': 1, 'Ca': 2}, {'Mg': 2, 'Ca': 1}, {'Mg': 2, 'Ca': 2}, {'Mg': 3, 'Ca': 1}, {'Mg': 3, 'Ca': 2}, {'Mg': 4, 'Ca': 1}, {'Mg': 4, 'Ca': 2}))
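
To make the semantics of the three operators concrete, here is a sketch that reproduces the outputs above with plain tuples of dictionaries. It is not the implementation used by Formulas; the function names are hypothetical stand-ins. It only covers operands with disjoint elements, since this notebook does not show how Formulas treats an element that appears on both sides.

```python
from itertools import product


def formula_range(element, *args):
    """Stand-in for Formulas.range: one single-element dict per count."""
    return tuple({element: n} for n in range(*args))


def concat(a, b):
    """Stand-in for +: append the formulas of b to those of a."""
    return a + b


def pairwise(a, b):
    """Stand-in for |: merge formulas position by position (zip truncates)."""
    return tuple({**x, **y} for x, y in zip(a, b))


def outer(a, b):
    """Stand-in for *: merge every formula of a with every formula of b."""
    return tuple({**x, **y} for x, y in product(a, b))


mg = formula_range('Mg', 1, 5)
ca = formula_range('Ca', 1, 3)

print(len(concat(mg, ca)))  # 6 formulas: four Mg, then two Ca
print(pairwise(mg, ca))     # ({'Mg': 1, 'Ca': 1}, {'Mg': 2, 'Ca': 2})
print(len(outer(mg, ca)))   # 8 formulas: 4 x 2 combinations
```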

Sampling

Once we have the formulas, we can ask ASSYST to generate symmetric structures for them. sample is a generator: it yields one ase.Atoms at a time, so we materialize the result with list.

structures = list(sample(
    formulas=mgca,
))
print(f"Generated {len(structures)} structures")
Generated 154 structures

Each entry is a regular ase.Atoms object enriched with provenance information in info (see the Lineage notebook for details).

structures[0]
Atoms(symbols='Mg2Ca', pbc=True, cell=[[5.479671805958892, -1.7139494480728563, -0.10455948903220216], [0.0, 2.6195168936402604, -0.19173510821008755], [0.0, 0.0, 4.396734373534593]])

Visualization

assyst.plot provides a number of helpers to inspect a set of structures at a glance. Two useful ones for the very first sampling step are the bond distance histogram and the concentration histogram.

distance_histogram(structures, rmax=10, reduce=lambda x: x);
[Figure: bond distance histogram of the sampled structures]
concentration_histogram(structures);
[Figure: concentration histogram of the sampled structures]
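
Which concentrations can appear at all is fixed by the formulas that were sampled. The sketch below works on formula dictionaries only and is not how concentration_histogram is implemented; the helper concentration is hypothetical. It shows that both entries of mgca share a single Mg fraction, whereas the outer product of the Mg and Ca ranges from the Algebra section covers several.

```python
from collections import Counter
from fractions import Fraction


def concentration(formula, element):
    """Atomic fraction of `element` in a formula dictionary (hypothetical helper)."""
    return Fraction(formula.get(element, 0), sum(formula.values()))


# The formulas sampled above, written as plain dictionaries.
mgca = ({'Mg': 2, 'Ca': 1}, {'Mg': 4, 'Ca': 2})
# Plain-dict equivalent of Formulas.range('Mg', 1, 5) * Formulas.range('Ca', 1, 3).
grid = tuple({'Mg': m, 'Ca': c} for m in range(1, 5) for c in range(1, 3))

mgca_counts = Counter(concentration(f, 'Mg') for f in mgca)
grid_counts = Counter(concentration(f, 'Mg') for f in grid)

print(mgca_counts)        # a single Mg fraction, 2/3, shared by both formulas
print(len(grid_counts))   # 6 distinct Mg fractions
```

Sampling a set like grid instead of mgca would therefore be one way to spread the seeds over more compositions.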

Advanced sampling

sample accepts several options for restricting which space groups are generated, putting bounds on the structure size, capping the total number of structures, and so on. The full signature is:

help(sample)
Help on function sample in module assyst.crystals:

sample(formulas: Union[assyst.crystals.Formulas, Iterable[dict[str, int]]], spacegroups: Union[list[int], tuple[int, ...], Iterable[int], NoneType] = None, min_atoms: int = 1, max_atoms: int = 10, max_structures: int | None = None, dim: Literal[0, 1, 2, 3] = 3, tolerance: Union[Literal['metallic', 'atomic', 'molecular', 'vdW'], assyst.filters.DistanceFilter, dict] = 'metallic', rng: Union[int, numpy.random._generator.Generator, NoneType] = None) -> Iterator[ase.atoms.Atoms]
    Create symmetric random structures.

    Args:
        formulas (:class:`.Formulas` or :class:`collections.abc.Iterable` of :class:`dict` from :class:`str` to :class:`int`): :class:`list` of chemical formulas
        spacegroups (list of int): which space groups to generate
        min_atoms (int): do not generate structures smaller than this
        max_atoms (int): do not generate structures larger than this
        max_structures (int): generate at most this many structures
        dim (one of 0, 1, 2, or 3): the dimensionality of the structures to generate; if lower than 3 the code generates
            samples no longer from space groups, but from the subperiodic layer, rod, or point groups.
        tolerance (str, dict of elements to radii):
            specifies minimum allowed distances between atoms in generated structures;
            if str then it should be one values understood by :class:`pyxtal.tolerance.Tol_matrix`;
            if dict each value gives the minimum *radius* allowed for an atom, whether a given distance is allowed then
            depends on the sum of the radii of the respective elements
        rng (:class:`int`, :class:`numpy.random.Generator`): seed or random number generator

    Yields:
        :class:`ase.Atoms`: random symmetric crystal structures
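
The docstring says that, for a dictionary tolerance, whether a distance is allowed depends on the sum of the radii of the two elements involved. A minimal sketch of that rule follows, assuming "allowed" means the distance is at least that sum. The function names are hypothetical, and the actual check inside assyst may differ in detail.

```python
def min_distance(radii, a, b):
    """Smallest allowed distance between elements a and b: the sum of their radii."""
    return radii[a] + radii[b]


def is_allowed(radii, a, b, distance):
    """Assumed rule: a pair passes if it is at least min_distance apart."""
    return distance >= min_distance(radii, a, b)


# Example radii; the values are illustrative, not recommended settings.
radii = {'Mg': 1.0, 'Ca': 1.2}

print(is_allowed(radii, 'Mg', 'Mg', 1.9))  # False: below 1.0 + 1.0
print(is_allowed(radii, 'Mg', 'Ca', 2.5))  # True: above 1.0 + 1.2
print(is_allowed(radii, 'Ca', 'Ca', 2.5))  # True: above 1.2 + 1.2
```

Because the threshold is pair-specific, the same distance can pass for one pair of elements and fail for another.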

Some useful non-default options:

  • spacegroups — restrict to a subset of space groups, e.g. cubic ones (195–230):

    sample(formulas, spacegroups=range(195, 231))
    
  • max_structures — cap the total yield; handy for a quick sanity check:

    sample(formulas, max_structures=20)
    
  • min_atoms / max_atoms — enforce a window on the number of atoms per structure so the cells stay tractable:

    sample(formulas, min_atoms=2, max_atoms=6)
    
  • dim=2 — generate 2-D layer structures instead of bulk crystals (layer groups instead of space groups):

    sample(formulas, dim=2)
    
  • tolerance — replace the default 'metallic' setting with custom per-element minimum radii (in Å); a pair of atoms is then checked against the sum of their two radii:

    sample(formulas, tolerance={'Mg': 1.0, 'Ca': 1.2})
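
For the spacegroups option listed above, it can be handy to select space groups by crystal system rather than by number. The sketch below uses the standard International Tables numbering of the 230 space groups. The names CRYSTAL_SYSTEMS and spacegroups_for are hypothetical helpers, not part of assyst.

```python
# Space group number ranges per crystal system (International Tables numbering).
CRYSTAL_SYSTEMS = {
    'triclinic': range(1, 3),
    'monoclinic': range(3, 16),
    'orthorhombic': range(16, 75),
    'tetragonal': range(75, 143),
    'trigonal': range(143, 168),
    'hexagonal': range(168, 195),
    'cubic': range(195, 231),
}


def spacegroups_for(*systems):
    """Return the space group numbers of the named crystal systems as a list."""
    return [number for system in systems for number in CRYSTAL_SYSTEMS[system]]


cubic = spacegroups_for('cubic')
print(cubic[0], cubic[-1], len(cubic))
# 195 230 36
```

The resulting list of integers has the form the docstring describes for spacegroups, so it could be passed as sample(formulas, spacegroups=spacegroups_for('hexagonal', 'cubic')).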