GASP-python/docs/usage.md at master · henk789/GASP-python · GitHub

Contents

Input
- Main input file
Output
Energy Code Interfaces
Non-Bulk Geometries
Resuming Calculations
Strategies

Input

Several scripts are added to your path when GASP is installed. The one used to run the genetic algorithm is called 'run.py'. To run the algorithm, type:

run.py input_file.yaml

where 'input_file.yaml' is the main input file read by GASP that contains the parameters for the structure search. It must be in YAML format and end with '.yaml'.

Note: YAML does not allow tabs, and if the input file contains tabs, an error will arise when the algorithm tries to read it. GASP comes with a script that takes a file as an argument and replaces each tab with four spaces. To use it on an input file, type:

replace_tabs.py input_file.yaml

Comments may be placed in the input file on lines that start with '#'.

Several example input files can be found here.

In addition to the main input file, other files may be necessary for the genetic algorithm to run, depending on the input options. For example, the interfaces to external energy codes require files that specify potentials, and the initial population may be read from disk.

Go back to Contents

Main input file

Below are all the keywords and parameters that may be given in the input file. Most of them are optional, and the non-optional ones are noted as such. For the optional parameters, default values are used if they are not specified in the input file. Placeholders for parameter values are given in angle brackets.

CompositionSpace (not optional)

CompositionSpace:
    - <string>
    - <string>
    - ...

The CompositionSpace keyword specifies the composition or composition range over which to search for structures. It is not optional. The strings (preceded by dashes) on the subsequent lines are the chemical formulas corresponding to the endpoint compositions. For example, to search of structures of Al₂O₃, the command would be

CompositionSpace:
    - Al2O3

If only one endpoint is given, then the algorithm will only search for structures with that composition. In that case, the quantity the algorithm seeks to minimize is the energy per atom.

If more than one endpoint is given, then the algorithm will search for structures with compositions between the endpoints. For example, to search for structures in the Al-Cu system, the command would be

CompositionSpace:
    - Al
    - Cu

In this case, the algorithm searches for the phase diagram of the Al-Cu system, and the quantity the algorithm seeks to minimize is the distance of structures from the best known convex hull of the system. The endpoints define the range of allowed compositions, and they need not be elemental.

If multiple endpoints are given, reference structures with compositions at the endpoints must be provided in the initial population. See the InitialPopulation keyword.

Go back to Contents

EnergyCode (not optional)

EnergyCode:
    gulp | lammps | vasp:
        <paths to input files>

The EnergyCode keyword specifies the external code used to compute the energies of structures generated by the genetic algorithm. It is not optional. The algorithm has interfaces to GULP, LAMMPS and VASP. Details for each case are given below.

gulp

EnergyCode:
    gulp:
        header_file: <string>
        potential_file: <string>

The gulp keyword specifies that GULP will be used for the energy calculations, and the header_file and potential_file keywords must appear on the subsequent lines.

The header_file keyword is followed by the path to the header file, which contains the first couple lines of the GULP input file that specify how the energy minimization is to be done.

The potential_file keyword is followed by the path to the potential file, which contains the lines of the GULP input file that specify the empirical potential to use during the energy calculation.

See here for descriptions and examples of the GULP header and potential files.

lammps

EnergyCode:
    lammps:
        input_script: <string>

The lammps keyword specifies that LAMMPS will be used for the energy calculations, and the input_script keyword must appear on the next line.

The input_script keyword is followed by the path to a script containing the input to be read by LAMMPS.

See here for a description and example of a LAMMPS input script.

vasp

    EnergyCode:
        vasp:
            num_submits_to_converge: <int>
            num_rerelax: <int>
            incar: <string>
            kpoints: <string>
            potcars:
               <string>: <string>
               <string>: <string>
               ...

The vasp keyword specifies that VASP will be used for the energy calculations, and the incar, kpoints and potcars keywords must appear on the subsequent lines.

The incar keyword is followed by the path to the INCAR file to use for all the VASP calculations submitted by the algorithm.

The kpoints keyword is followed by the path to the KPOINTS file to use for all the VASP calculations submitted by the algorithm.

The potcars keyword is followed by lines specifying the paths to the POTCAR files to use for each element. On each line, the first string is the element symbol, and the second string is the path to the corresponding POTCAR file. POTCAR files must be provided for all elements in the composition space.

The num_submits_to_converge keyword specifies the number of VASP calculations to run on an organism after checking that it is not converged. That is, if a VASP calculation is not converged, the organism is resubmitted (num_submits_to_converge - 1) times before discarding the organism. Optional, default is 2. The VASP output files are indexed from 0 to num_submits_to_converge in job directory.

The num_rerelax keyword specifies the number of times to submit a VASP calculation to re-relax an organism. Note that this parameter submits for re-relaxation whether or not a previous calculation is converged. This is optional, default is 0. The VASP output files are indexed from num_submits_to_converge till num_rerelax in job directory.

To simply relax all structures n times irrespective of convergence, use

num_submits_to_converge: 0 # does nothing
num_rerelax: n # submits n relaxations

See here for a more detailed description of how to use VASP with GASP.

Go back to Contents

InitialPopulation (not optional for phase diagram searches)

InitialPopulation:
    from_files:
        path_to_folder: <string>
    random:
        number: <integer>
        max_num_atoms: <integer>
        allow_endpoints: <boolean>
        volumes_per_atom:
            <string>: <float>
            <string>: <float>
            ...

The InitialPopulation keyword specifies how the algorithm is to generate the initial population of structures. It is optional for fixed composition searches (when there is only one endpoint in CompositionSpace), but it is not optional for phase diagram searches, and structures at each endpoint composition must be provided using the from_files keyword.

from_files

Specifies that the algorithm is to read structures from provided files and add them to the initial population. It is not optional for phase diagram searches. If used, the path_to_folder keyword must appear on the next line, followed by the path to the directory containing the structure files. The structure files must be in either POSCAR or CIF format, and their names must begin with 'POSCAR.' or end with '.cif', respectively.

random

Specifies that the algorithm is to generate random structures and add them to the initial population. It is optional. If used, the keyword number must appear on the next line, followed by the number of random structures to make for the initial population.

The optional keyword max_num_atoms within the random block specifies the maximum number of atoms allowed in the randomly generated structures. Defaults to 6 plus the value of min_num_atoms in Constraints or the value of max_num_atoms in Constraints, whichever is smaller. The value given here must lie in the range defined by the values of min_num_atoms and max_num_atoms in Constraints (including the endpoints).

The optional keyword allow_endpoints within the random block specifies whether the randomly generated structures are allowed to have compositions equivalent to the endpoints of the composition space. Defaults to True, and is only used for phase diagram searches.

The optional keyword volumes_per_atom within the random block specifies how to scale the volumes (per atom) of the randomly generated structures. In particular, the volume of a random structure is scaled to the sum of the volumes of the atoms within the structure, where the volume (in cubic Angstroms) of each atom type is given after its chemical symbol. If the volume for an atom type is not given, the value computed from the elemental ground state structure listed on materials project is used. This is the default behavior.

For fixed composition searches, the entire InitialPopulation block is optional. If not specified, it defaults to this:

InitialPopulation:
    random:
        number: 28
        max_num_atoms: 8
        allow_endpoints: True
        volumes_per_atom:
            Al: 16.47
            Cu: 11.82

where the default max_num_atoms was computed from the default min_num_atoms in Constraints (note that the default value of max_num_atoms could be different if the min_num_atoms keyword in Constraints is specified). The default volumes per atom were computed from the ground state structures of Al and Cu.

To provide a sufficient initial sampling of the solution space, we recommend that the number of structures in the initial population exceed the number of structures in the pool (see the Pool keyword). This is achieved by the default values for fixed composition searches: the pool contains 20 structures and the initial population contains 28 structures - a 40% increase. For phase diagram searches, the initial population must be specified in the input file. The default pool size for a binary phase diagram search is 25 structures, so to achieve a 40% increase, the initial population should contain 35 structures. This is easily done by specifying that the initial population contain 33 random structures in addition to the two reference structures at the endpoint compositions. For this case, the InitialPopulation block would look like this:

InitialPopulation:
    from_files:
        path_to_folder: <path to folder containing two structures at the endpoint compositions>
    random:
        number: 33

where the max_num_atoms and volumes_per_atom keywords have been omitted, resulting in default values being used. Similarly, for ternary phase diagram searches, the default pool size is 75 structures, so 105 structures are needed in the initial population to achieve a 40% increase; here is an example:

InitialPopulation:
    from_files:
        path_to_folder: <path to folder containing three structures at the endpoint compositions>
    random:
        number: 102

And for quaternary phase diagram searches, the default pool size is 150 structures, so we need 210 structures in the initial population:

InitialPopulation:
    from_files:
        path_to_folder: <path to folder containing four structures at the endpoint compositions>
    random:
        number: 206

Go back to Contents

NumCalcsAtOnce

NumCalcsAtOnce: <integer>

Specifies how many energy calculations the algorithm should run at a time. Optional, and defaults to 1 (serial).

Go back to Contents

RunTitle

RunTitle: <string>

Specifies the name of the search. Optional, and if not specified, the algorithm will write its output to a directory called 'garun'. If specified, the algorithm will write its output to a directory called 'garun_run_title', where run_title is the name given after the RunTitle keyword.

If a directory already exists with the same name as the output directory (e.g., from a previous search), the algorithm will write its output to a directory with the date and time appended to the name.

Go back to Contents

Pool

Pool:
    size: <integer>
    num_promoted: <integer>

The Pool keyword specifies the pool of structures that represent the current population. The entire block is optional.

size

Specifies the number of structures in the pool. Defaults to 20, 25, 75 and 150 for fixed composition, binary phase diagram, ternary phase diagram and quaternary phase diagram searches, respectively.

num_promoted

Specifies how many of the best structures are promoted. Defaults to 3 for fixed composition searches. For phase diagram searches, all the structures on the current convex hull are promoted (num_promoted is ignored). If this keyword is used, it must be passed an integer greater than zero (at least one organism must be promoted).

Go back to Contents

Selection

Selection:
    num_parents: <integer>
    power: <float>

The Selection keyword specifies how the selection probabilities of structures in the current population (i.e., the pool) are computed. The entire block is optional.

num_parents

Specifies how many structures from the pool will have a chance to become parents. In particular, the best num_promoted structures (as determined from their objective function values) in the pool will be assigned non-zero selection probabilities. Optional, and defaults to the size of the pool (see the Pool keyword). If the value given to num_parents is greater than the size of the pool, then num_parents is set to the size of the pool.

power

Specifies the exponent of the power law that defines how structures' selection probabilities are computed from their fitnesses. Optional, and defaults to 1.0 (linear).

See TODO for a description of how selection probabilities are computed.

Go back to Contents

CompositionFitnessWeight

CompositionFitnessWeight:
    max_weight: <float>
    power: <float>

The CompositionFitnessWeight keyword specifies how the weight assigned to an organism's composition fitness is computed. The composition fitness of one organism relative to another is defined as 1 - d, where d is the normalized distance between the two organisms in composition space (see TODO for details on how this distance is computed).

When selecting a second parent organism (e.g., in the Mating variation), the fitnesses of the remaining organisms in the population are computed relative to the first parent organism. These relative fitnesses are taken to be the weighted average of the regular fitness (based on objective function value) and the composition fitness (based on distance from the first parent organism in composition space). The weight assigned to the composition fitness ranges from 0 (if the first parent organism is located at the center of the composition space) to max_weight (if the first parent organism is located at an endpoint of the composition space).

The motivation for using the composition fitness is to prevent the population from drifting toward intermediate compositions as the search progresses. This drift is likely to happen because the offspring produced by the mating variation tend to have compositions between those of the parents. To combat this, we use relative fitnesses to make it more likely for two parents to have similar compositions, especially near the endpoints of the composition space.

The entire block is optional, and it is only used for phase diagram searches.

max_weight

Specifies the maximum weight to give to the composition fitness (when the first parent is located at an enpoint of the composition space). Must lie between 0.0 and 1.0. Optional, and defaults to 0.5.

power

Specifies the exponent of the power law used to compute the weight of the composition fitness. Optional, and defaults to 1 (linear).

Go back to Contents

Variations

Variations:
    Mating:
        <mating parameters>
    StructureMut:
        <structure mutation parameters>
    NumAtomsMut:
        <number of atoms mutation parameters>
    Permutation:
        <permutation parameters>

The Variations keyword specifies the variations the algorithm uses to create offspring structures from parents. The entire block is optional.

Mating

Variations:
    Mating:
        fraction: <float>
        mu_cut_loc: <float>
        sigma_cut_loc: <float>
        shift_prob: <float>
        rotate_prob: <float>
        doubling_prob: <float>
        grow_parents: <boolean>
        merge_cutoff: <float>
        halve_offspring_prob: <float>

The Mating keyword specifies the parameters associated with the cut-and-splice mating variation.

fraction

Specifies the fraction of offspring to create with the mating variation. If the entire Variations block is not specified, then fraction defaults to 0.7 or 0.8, depending on the default fractions of the NumAtomsMut and Permutation variations. Otherwise, it is not optional. Furthermore, the values of the fraction keywords of all specified variations (could include Mating, StructureMut, NumAtomsMut and Permutation) must sum to 1.0.

mu_cut_loc

Specifies the mean of the Gaussian distribution from which the (fractional) cut location is drawn. Optional, and defaults to 0.5.

sigma_cut_loc

Specifies the standard deviation of the Gaussian distribution from which the (fractional) cut location is drawn. Optional, and defaults to 0.5

shift_prob

Specifies the probability that the atoms in the parent structures are shifted along the lattice vector that is to be cut before the cut is made. The (fractional) magnitudes of the shifts are drawn from a uniform distribution over the unit interval. Optional, and defaults to 1.0.

For non-bulk structures, the atoms are only shifted if the lattice vector that is to be sliced corresponds to a periodic direction, regardless of the value passed to shift_prob. See here for more details on searching for non-bulk structures.

rotate_prob

Specifies the probability that the lattices of the parent structures are rotated (relative to the atoms) before the cut is made. The angle of the rotation is drawn from a uniform distribution over [0, 360). For bulk and 2D structures, no rotation is done, regardless of the value passed to rotate_prob. For wire structures, the lattice is rotated about the c lattice vector. For clusters, the lattice is rotated about all three lattice vectors. Optional, and defaults to 1.0.

doubling_prob

Specifies the probability that one of the parent structures is doubled before mating. Optional, and defaults to 0.1.

grow_parents

Specifies whether to grow the smaller parent to the approximate size of the larger parent before mating. Growing is done by taking supercells of the smaller parent along its shortest lattice vector. Optional, and defaults to True.

merge_cutoff

Specifies the distance (as fraction of atomic radius) below which to merge atoms of the same type in the offspring structure. Optional, and defaults to 1.0.

halve_offspring_prob

Only allies to the interface geometry) Specifies the probability to halve the offspring cell made by mating. This is introduced to explore small area structures as the search often progresses towards large area structures. (large area results in more surface relaxation and more fit organisms). Optional, defaults to 0.25.

Go back to Contents

StructureMut

Variations:
    StructureMut:
        fraction: <float>
        frac_atoms_perturbed: <float>
        sigma_atomic_coord_perturbation: <float>
        max_atomic_coord_perturbation: <float>
        sigma_strain_matrix_element: <float>

The StructureMut keyword specifies the parameters associated with the structure mutation variation.

fraction

Specifies the fraction of offspring to create with the structure mutation variation. If the entire Variations block is not specified, then fraction defaults to 0.1 or 0.2, depending on the fractions of the NumAtomsMut and Permutation variations. Otherwise, it is not optional. Furthermore, the values of the fraction keywords of all specified variations (could include Mating, StructureMut, NumAtomsMut and Permutation) must sum to 1.0.

frac_atoms_perturbed

Specifies the (average) fraction of atoms in the parent structure that are perturbed. Alternatively, specifies the probability of each atom in the parent structure being perturbed. Optional, and defaults to 1.0

sigma_atomic_coord_perturbation

Specifies the distribution from which the random atomic perturbations are drawn. In particular, atoms are shifted along each Cartesian coordinate by amounts (in Angstroms) drawn from a Gaussian distribution with mean zero and standard deviation equal to the value passed to sigma_atomic_coord_perturbation. Optional, and defaults to 1.0.

max_atomic_coord_perturbation

Specifies the maximum allowed perturbation (in Angstroms) of an atom's Cartesian coordinates. Optional, and defaults to 5.0.

sigma_strain_matrix_element

Specifies the distribution from which the non-identity components of the elements of the strain matrix are drawn. In particular, the strain matrix is constructed by starting with an identity matrix and adding a random perturbation to each element that is drawn from a Gaussian distribution with mean zero and standard deviation equal to the value passed to sigma_strain_matrix_element. Optional, and defaults to 0.2.

Go back to Contents

NumAtomsMut

Variations:
    NumAtomsMut:
        fraction: <float>
        mu_num_adds: <float>
        sigma_num_adds: <float>
        scale_volume: <boolean>

The NumAtomsMut keyword specifies the parameters associated with the number of atoms mutation variation.

fraction

Specifies the fraction of offspring to create with the number of atoms mutation variation. If the entire Variations block is not specified, then fraction defaults to 0.1 if mutating the number of atoms would not always violate the minimum and maximum number of atoms constraints (see Constraints), and it defaults to 0.0 if it would. Otherwise, it is not optional. Furthermore, the values of the fraction keywords of all specified variations (could include Mating, StructureMut, NumAtomsMut and Permutation) must sum to 1.0.

mu_num_adds

Specifies the mean of the Gaussian distribution from which the number of atoms (or number of stoichiometries' worth of atoms, for fixed-composition searches) to add to (or remove from) a parent structure is drawn. Optional, and defaults to 0.0.

sigma_num_adds

Specifies the standard deviation of the Gaussian distribution from which the number of atoms (or number of stoichiometries' worth of atoms, for fixed-composition searches) to add to (or remove from) a parent structure is drawn. Optional, and defaults to 1.0.

scale_volume

Specifies whether to scale the volume per atom of an offspring structure to equal that of its parent. Only applied if the offspring was generated by adding atoms to the parent. Optional, and defaults to True.

Go back to Contents

Permutation

Variations:
    Permutation:
        fraction: <float>
        mu_num_swaps: <float>
        sigma_num_swaps: <float>
        pairs_to_swap:
            - <string> <string>
            - <string> <string>
            - ...

The Permutation keyword specifies the parameters associated with the permutation variation.

fraction

Specifies the fraction of offspring to create with the permutation variation. If the entire Variations block is not specified, then fraction defaults to 0.1 if the composition space contains pairs of atoms whose electronegativities differ by 1.1 or less, and it defaults to 0.0 if the composition space contains no such pairs of atoms. Otherwise, it is not optional. Furthermore, the values of the fraction keywords of all specified variations (could include Mating, StructureMut, NumAtomsMut and Permutation) must sum to 1.0.

mu_num_swaps

Specifies the mean of the Gaussian distribution from which the number of swaps is drawn. Optional, and defaults to 2.0.

sigma_num_swaps

Specifies the standard deviation of the Gaussian distribution from which the number of swaps is drawn. Optional, and defaults to 1.0.

pairs_to_swap

Specifies the pairs of distinct atom types that can be swapped. The symbols of atom types to swap are given (separated by a space) on the subsequent lines, where each pair of symbols is preceded by a dash. Optional, and defaults to all the possible pairs of distinct atom types.

Below is an example Variations block containing all the default parameter values.

Variations:
    Mating:
        fraction: 0.7
        mu_cut_loc: 0.5
        sigma_cut_loc: 0.5
        shift_prob: 1.0
        rotate_prob: 1.0
        doubling_prob: 0.1
        grow_parents: True
        merge_sites: 1.0
    StructureMut:
        fraction: 0.1
        frac_atoms_perturbed: 1.0
        sigma_atomic_coord_perturbation: 0.5
        max_atomic_coord_perturbation: 5.0
        sigma_strain_matrix_element: 0.2
    NumAtomsMut:
        fraction: 0.1
        mu_num_adds: 0.0
        sigma_num_adds: 1.0
        scale_volume: True
    Permutation:
        fraction: 0.1
        mu_num_swaps: 2.0
        sigma_num_swaps: 1.0
        pairs_to_swap:
            - Al Cu

Go back to contents

Development

Development:
    niggli: <boolean>
    scale_density: <boolean>

The Development keyword indicates whether the algorithm applies certain operations to structures during development, which occurs directly before and after a structure's energy is calculated. The entire block is optional.

niggli

Specifies whether Niggli cell reduction is applied to structures during development, both before and after an energy calculation. Optional, and defaults to True.

When searching for non-bulk structures (see the Geometry keyword), Niggli cell reduction is done in such a way as to preserve the geometry of the structures, or else omitted altogether. See here for details on searching for non-bulk structures, including how Niggli cell reduction is done in these cases.

scale_density

Specifies whether the volume of a structure is scaled during development before its energy is calculated. The volume scaling is done in such a way that the lattice vector angles and length ratios are preserved. Structures' volumes are not scaled after their energies are calculated, regardless of the value passed to scale_density. Optional, and defaults to True.

For fixed-composition searches, a structure's volume is scaled to match the average volume per atom of the structures in the promotion set of the pool (i.e., the best few structures seen by the algorithm so far. See the Pool keyword).

For phase diagram searches, a structure's volume is scaled to match the weighted average volume per atom of the structures on the current convex hull into which the structure would decompose, based on its composition.

When searching for non-bulk structures (see the Geometry keyword), no volume scaling is performed, regardless of the value passed to the scale_density. See here for details on searching for non-bulk structures.

Go back to Contents

Constraints

Constraints:
    min_num_atoms: <integer>
    max_num_atoms: <integer>
    max_interface_atoms: <integer>
    min_lattice_length: <float>
    max_lattice_length: <float>
    max_scell_lattice_length: <float>
    min_lattice_angle: <float>
    max_lattice_angle: <float>
    allow_endpoints: <boolean>
    per_species_mids:
        <string> <string>: <float>
        <string> <string>: <float>
        ...

The Constraints keyword specifies the constraints placed on structures considered by the algorithm, and the entire block is optional. Lengths are in Angstroms and angles are in degrees.

min_num_atoms

Specifies the minimum number of atoms allowed in the cell. Optional, and defaults to 2. If specified, must be greater than or equal to 2.

max_num_atoms

Specifies the maximum number of atoms allowed in the cell. Optional, and defaults to 30.

For fixed-composition searches, the range defined by the min_num_atoms and max_num_atoms constraints must contain at least one integer multiple of the number of atoms in the formula unit of the composition.

max_interface_atoms

Specifies the maximum number of atoms for an interface geometry organism. Matching an organism's lattice with that of a substrate often requires taking supercells of the organism, increasing the number of atoms. This is a limit on those interface structures after lattice matching. Optional, defaults to 50.

min_lattice_length

Specifies the minimum allowed lattice vector length. Optional, and defaults to 0.5.

max_lattice_length

Specifies the maximum allowed lattice vector length. Optional, and defaults to 20.0.

max_scell_lattice_length

Only applies to the interface geometry. SSpecifies the maximum allowed lattice vector length of the interface structure made by lattice matching. Optional, and defaults to 30.0.

min_lattice_angle

Specifies the minimum allowed lattice angle. Optional, and defaults to 40.0.

max_lattice_angle

Specifies the maximum allowed lattice angle. Optional, and defaults to 140.0.

allow_endpoints

Specifies whether or not to allow structures with compositions equivalent to the endpoints of the composition space. Optional, and defaults to True. Only used for phase diagram searches. Note that this constraint is not applied to structures in the initial population.

per_species_mids

Specifies the minimum interatomic distance constraints between different pairs of atom types. If not given, the mid for a pair of atom types is set to 60% of the sum of the radii of the two atoms.

Below is an example Constraints block containing the default values of the constraints, where the default mid's were computed by the algorithm from the atomic radii.

Constraints:
    min_num_atoms: 2
    max_num_atoms: 50
    min_lattice_length: 0.5
    max_lattice_length: 20.0
    max_scell_lattice_length: 30.0
    min_lattice_angle: 40.0
    max_lattice_angle: 140.0
    allow_endpoints: True
    per_species_mids:
        Al Al: 1.716
        Al Cu: 1.626
        Cu Cu: 1.536

Go back to Contents

RedundancyGuard

RedundancyGuard:
    lattice_length_tol: <float>
    lattice_angle_tol: <float>
    site_tol: <float>
    use_primitive_cell: <boolean>
    attempt_supercell: <boolean>
    rmsd_tol: <float>
    epa_diff: <float>

The RedundancyGuard keyword specifies the parameters used by the algorithm to determine whether two structures are equivalent to each other. The entire block is optional.

lattice_length_tol

Specifies the fractional lattice length tolerance. This value is passed to the pymatgen.analysis.structure_matcher.StructureMatcher class as the ltol parameter. Optional, and defaults to 0.05.

lattice_angle_tol

Specifies the lattice angle tolerance, in degrees. This value is passed to the pymatgen.analysis.structure_matcher.StructureMatcher class as the angle_tol parameter. Optional, and defaults to 2.0.

site_tol

Specifies the tolerance for the locations of the atoms in the structure. It is defined as the fraction of the average free length per atom in the structure, where the average free length per atom is the cube root of the average volume per atom of the structure. This value is passed to the pymatgen.analysis.structure_matcher.StructureMatcher class as the stol parameter. Optional, and defaults to 0.1.

use_primitive_cell

Specifies whether structures should be reduced to their primitive cells before they are compared. This value is passed to the pymatgen.analysis.structure_matcher.StructureMatcher class as the primitive_cell parameter. Optional, and defaults to True.

attempt_supercell

Specifies whether the matching algorithm should try to match structures with differing numbers of atoms by taking a supercell of the smaller one. This value is passed to the pymatgen.analysis.structure_matcher.StructureMatcher class as the attempt_supercell parameter. Optional, and defaults to True.

rmsd_tol

Specifies the RMSD difference threshold for whether two clusters are considered different. This value is passed to the pymatgen.analysis.molecule_matcher.MoleculeMatcher class as the tolerance parameter. Only used when searching for clusters or wires (see the Geometry keyword). Optional, and defaults to 0.1.

We have observed that pymatgen's molecule matcher doesn't always identify duplicate clusters, and it also sometimes gives false positives. We have chosen a fairly conservative tolerance for the default to help prevent false positives. To not miss duplicates that aren't identified by the molecule matcher, we recommend using the epa_diff keyword (see below) and setting it to a small nonzero value when searching for clusters or wires.

epa_diff

Specifies the tolerance for comparing structures based on their energies per atom, in addition to their structures. Structures are considered equivalent if the absolute value of the difference between their energies per atom (eV/atom) is less than the value of epa_diff. Optional, and defaults to 0.0 (never equivalent based on energies per atom). For cluster searches, we recommend using a small nonzero value (e.g., 0.00001).

Below is an example RedundancyGuard block containing the default values of the parameters.

RedundancyGuard:
    lattice_length_tol: 0.05
    lattice_angle_tol: 2.0
    site_tol: 0.1
    use_primitive_cell: True
    attempt_supercell: True
    rmsd_tol: 2.0
    epa_diff: 0.0

Go back to Contents

Geometry

Geometry:
    shape: <'bulk'| 'sheet' | 'wire' | 'cluster' | 'interface'>
    min_size: <float>
    max_size: <float>
    padding: <float>

The Geometry keyword specifies the dimensionality of the structures considered by the algorithm, as well as constraints and vacuum padding for non-bulk structures. The entire Geometry block is optional, and if not specified, the algorithm searches for bulk (3D) structures.

shape

Specifies the dimensionality of the structures. Five different values may be given:

The keyword bulk, which specifies that the algorithm should search for bulk (3D) structures. This is the default.
The keyword sheet, which specifies that the algorithm should search for two-dimensional (2D) sheet-like structures.
The keyword wire, which specifies that the algorithm should search for one-dimensional (1D) wire-like structures.
The keyword cluster, which specifies that the algorithm should search for zero-dimensional (0D) clusters.
The keyword interface, which specifies that the algorithm should search for thin (2D) films on a given substrate.

The remaining three keywords in the Geometry block apply only to non-bulk structures and are ignored by the algorithm if searching for bulk structures.

min_size

Specifies the minimum allowed size (in Angstroms) of non-bulk structures. Defaults to -infinity (no constraint).

max_size

Specifies the maximum allowed size (in Angstroms) of non-bulk structures. Defaults to +infinity (no constraint)

padding

Specifies the amount of vacuum padding (in Angstroms) the algorithm adds around non-bulk structures before submitting them for an energy evaluation. Defaults to 10.

interface geometry

The interface geometry utilizes the sheet geometry to make the 2D films which are lattice matched and placed on top of a substrate. Hence, the min_size and max_size parameters correspond to the thickness of the 2D film only, not the entire interface structure. Conversely, parameters in constraints keyword such as max_interface_atoms and max_scell_lattice_length correspond to the entire interface cell.

See here for details on searching for non-bulk structures, including the definition of the size of a non-bulk structure and how the algorithm adds vacuum padding around non-bulk structures.

Go back to Contents

LatticeMatch

LatticeMatch:
    max_area: <integer>
    max_mismatch: <float>
    max_angle_diff: <float>
    r1r2_tol: <float>
    separation: <float>
    align_random: True
    nlayers_substrate: <integer>
    nlayers_2d: <integer>
    sd_layers: <integer>

(Only for interface geometry) The LatticeMatch keyword specifies the constraints to use for the lattice matching algorithm. These parameters control the possibility of finding a lattice matched substrate, the size of interface structures, alignment of 2D film on interface and relaxation of interface structures. All parameters are optional, however, it is recommended to note the parameters used to understand and analyze search results.

max_area

Specifies the maximum allowed surface area for a lattice match. Default is 100 (Å^2)

max_mismatch

Specifies the maximum allowed lattice mismatch between both the lattice vectors in 2D lattices of film and substrate. Default is 0.05 (5 %)

max_angle_diff

Specifies the maximum angle difference between the angle between lattice vectors of both the 2D lattices of film and substrate. Default is 2 (degrees).

r1r2_tol

Specifies the tolerance between the ratio of areas of the primitive slabs of film and substrate. Default is 0.06.

separation

Specifies the vertical distance in Å above the substrate to place the lattice matched 2D film. Default is 2 (Å)

align_random

Aligns the 2D film randomly on the substrate if specified True and nlayers_substrate and nlayers_2d parameters are ignored. Default is True.

nlayers_substrate

Specifies the number of unique layers of substrate slab to consider when aligning the 2D film on top of substrate. Default is 1.

nlayers_2d

Specifies the number of unique layers of 2D film to consider when aligning the 2D film on top of substrate. Default is 1.
The algorithm returns interface with one of the all possible translations of 2D film on substrate by aligning the unique sites in the nlayers_substrate and nlayers_2d

sd_layers

Specifies the number of substrate layers to relax in the interface structure (Specifies the selective dynamics flags for VASP). Default is 1.

Below is an example LatticeMatch block containing the default values of the parameters.

LatticeMatch:
    max_area: 100
    max_mismatch: 0.05
    max_angle_diff: 2
    r1r2_tol: 0.06
    separation: 2
    align_random: True
    nlayers_substrate: 1
    nlayers_2d: 1
    sd_layers: 1

Go back to Contents

Substrate

Substrate:
    E_sub_prim: <float>
    n_sub_prim: <int>
    mu_A: <float>
    mu_B: <float>
    mu_C: <float>

(Only for interface geometry) The Substrate keyword specifies the required values to calculate objective function. It includes the energetics of primitive substrate calculation, chemical potentials of each species in the 2D film.

E_sub_prim

Specifies the total energy of the primitive substrate slab calculation. Use the same sd_layers while relaxing the primitive substrate. This parameter is mandatory for interface geometry.

n_sub_prim

Specifies the number of atoms in primitive substrate slab. This parameter is mandatory for interface geometry.

mu_A

Specifies the reference chemical potential of species A. This parameter is mandatory for interface geometry.

Note: Choose species A, B and C in the increasing order of their electronegativities. That is χ_A < χ_B < χ_C

mu_B

Specifies the reference chemical potential of species A. If there are two species in CompositionSpace, this parameter is mandatory for interface geometry.

mu_C

Specifies the reference chemical potential of species A. If there are three species in CompositionSpace, this parameter is mandatory for interface geometry.

Go back to Contents

job_specs

job_specs:
    cores: <int>
    memory: <string>
    project: <string>
    queue: <string>
    walltime: <string>
    interface: <string>
    job_extra:
        - <string>
        - <string>
        -

The job_specs keyword specifies the details of the cluster (SLURM/PBS) job that runs each organism calculation (VASP/LAMMPS). NumCalcsAtOnce worker jobs, each with the job specifications mentioned here will run in parallel.

cores

Specifies the number of cores. Analogous to #SBATCH --cpus-per-task option.. Optional, default is 1

memory

Specifies the total memory for the worker job. Analogous to #SBATCH --mem option. Optional, default is '8GB'

project

Specifies the project name/id on the cluster to submit job on scheduler. Analogous to #SBATCH -A option. This is mandatory.

queue

Specifies the queue name for the job on scheduler. Analogous to #SBATCH -p option. This is mandatory.

walltime

Specifies the total walltime for the job. This should be large enough that spans over the entire GASP runtime. Analogous to #SBATCH -t option. Default is '24:00:00'

interface

Specifies the type of node (infiniband, haswell etc). Default is 'ib0'

job_extra

Specify any extra options for the job script each mentioned as a separate item in string format. (Ex: '--ntasks=4')

Below is an example job_specs block containing the default values of the parameters.

job_specs:
    cores: 1
    memory: '8GB'
    project: <mandatory>
    queue: <mandatory>
    walltime: '24:00:00'

Refer to documentation of dask_jobqueue for more information.

Go back to Contents

StoppingCriteria

StoppingCriteria:
    num_energy_calcs: <integer>
    epa_achieved: <float>
    found_structure: <string>

The StoppingCriteria keyword specifies when the genetic algorithm should terminate the search. The entire block is optional, and the default stopping criteria for fixed composition, binary phase diagram, ternary phase diagram and quaternary phase diagram searches are 800, 1000, 3000 and 6000 energy calculations, respectively.

num_energy_calcs

Specifies that the algorithm should terminate when the given number of energy calculations have been done. This is the default behavior. Optional.

epa_achieved

Specifies that the algorithm should terminate when a structure has been found with an energy per atom less than or equal to the given amount (in eV/atom). Optional, and only valid for fixed-composition searches. For phase diagram searches, this keyword is ignored.

found_structure

Specifies that the algorithm should terminate when a structure has been found that matches the given structure. The value passed to this keyword is the path to the file containing the given structure. The structure file must be in POSCAR or CIF format and begin with 'POSCAR.' or end with '.cif', respectively. Optional.

Go back to Contents

Output

Output to disk

The primary output of the algorithm is placed into a directory structure. The root of this structure can be specified with the RunTitle keyword in the main input file. If the RunTitle keyword is not specified, then the root directory is named 'garun'. If a directory called 'garun' already exists, then the root directory is named 'garun_date_time', where date_time is the current date and time.

Inside this directory are a number of POSCAR files, each of which holds a single structure. They are named after their organism ID. These are written after each organism has been added to the population, so the structures are relaxed and satisfy the constraints.

Also in this directory are files named 'ga_parameters' and 'run_data'. The 'ga_parameters' file contains all the parameters used by the algorithm during the search, including default values for optional parameters that weren't specified in the main input file. It is written in the same format as the main input file, and itself can be used as an input file for the algorithm.

The 'run_data' file contains information about the organisms seen by the algorithm during the search. After each organism is added to the population, the 'run_data' file is appended with a line of the form:

org_id    composition    total_energy    epa    num_calcs    best_value

where 'org_id' is the organism ID, 'composition' is the organism's chemical formula, 'total_energy' is the organism's energy, 'epa' is the organism's energy per atom, 'num_calcs' is the number of energy calculations that had been completed when this organism's energy calculation finished, and 'best_value' is the best value that the algorithm had seen when this organism's energy calculation finished. For fixed composition searches, 'best_value' corresponds to the lowest energy per atom (eV/atom). For phase diagram searches, 'best_value' corresponds the largest area or volume of the convex hull.

There is a subdirectory inside the output directory called 'temp'. It contains files used by the algorithm to communicate with external energy codes. In particular, it contains a subdirectory for each organism, named after its ID, containing the input and output files of the external code used to compute the energy of the organism.

Go back to Contents

Output to screen

Much important information about the progress of the algorithm is written to screen. This text is not saved elsewhere, but it may be captured with, for example, the tee utility on the UNIX command line:

run.py input_file.yaml 2>&1 | tee ga_output

This saves the screen output to a file called 'ga_output'. For important runs, it is recommended to save the screen output to disk for possible future analysis.

Go back to Contents

Visualizing output

The POSCAR files of individual structures can be viewed with software such as Vesta (they may need a '.vasp' suffix).

Several Python scripts that take the 'run_data' file (see Output to disk) as an argument and make plots are added to your path when GASP is installed.

The progress of the algorithm can be visualized by plotting the best value seen by the algorithm so far (either the lowest energy per atom or the largest area/volume of the convex hull) against the number of completed energy calculations. A Python script called plot_progress.py creates such a plot using the matplotlib plotting library. To make the plot, type:

plot_progress.py run_data

It is sometimes useful to visualize the numbers of atoms in the structures generated by the algorithm. A Python script called plot_system_size.py makes a plot of the number of atoms in each structure versus the number of completed energy calculations using pymatgen and the matplotlib plotting library. To make the plot, type:

plot_system_size.py run_data

For phase diagram searches, we often want to view the phase diagram predicted by the algorithm. A Python script called plot_phase_diagram.py makes a phase diagram plot using pymatgen and the matplotlib plotting library. To make the plot, type:

plot_phase_diagram.py run_data

Note: for Mac OS X, maplotlib requires a framework build of Python. Instructions for how to install a framework Python build into a conda environment are given in step 3 of the 'Getting GASP' section of the README file. To call the framework Python, use pythonw instead of python. So to make a progress plot, for example, the command would be:

pythonw /path_to_repo/gasp/scripts/plot_progress.py run_data

where 'path_to_repo' is the path to the GASP-python repository on your computer. All the plotting scripts are located in GASP-python/gasp/scripts.

Go back to Contents

Energy Code Interfaces

GASP is interfaced to GULP, LAMMPS and VASP.

GULP

In order to do a GULP calculation, the genetic algorithm runs the command

callgulp <path to gulp input file>

The callgulp command is a small script that should be on the user's path. Its purpose is to avoid the need to hardcode details of the GULP installation into the genetic algorithm. On most machines, the following callgulp bash script should work:

#!/bin/bash
# script to run gulp
# usage: callgulp /path/to/gulp/input/file

# get the name of the input file, without the path
input_file=$(basename $1)

# get the path to the job directory containing the input file
dir_path=$(dirname $1)

# move into the job directory
cd $dir_path

# run gulp on the input file
gulp < $input_file

This script assumes that GULP is installed, and that the GULP executable (gulp command) is on the user's path.

For each GULP calculation, the genetic algorithm writes the GULP input and output to files called id.gin and id.gout, respectively, where id is the ID number of the organism.

The genetic algorithm needs two files in order to perform GULP calculations: a header file and a potential file. The input files for GULP (id.gin files) are generated in the format:

<contents of gulp header file>
cell
<a> <b> <c> <alpha> <beta> <gamma>
frac
<atomic symbol 1> core <three fractional coordinates>
<atomic symbol 2> core <three fractional coordinates>
...
<contents of gulp potential file>

The contents of the header and potential files passed to the algorithm are inserted where indicated. Lines specifying the locations of species' shells are included if needed.

One should refer to the GULP manual for all the options available, but a GULP header file might look like the following:

opti conp conj
switch_minimiser bfgs gnorm 0.02

This tells GULP to optimize both the lattice parameters and atomic positions of the structure under constant pressure. It will initially use a conjugate-gradient minimization routine but will switch to the Newton-Raphson BFGS method after the gradient norm falls below 0.2. This strategy of switching minimizers is useful for many systems.

Similarly, there are many ways to give GULP an empirical potential. As an example, a GULP potential file might look like the following:

lennard
C core C core 100470 64.464 0.000 8.5125

This is the Lennard-Jones potential used by Abraham and Probert to model carbon (https://doi.org/10.1103/PhysRevB.73.224104).

Non-Bulk Structures

As discussed here, one or more lattice vectors must be fixed when searching for non-bulk structures. To achieve this with GULP, the genetic algorithm writes the appropriate flags to the GULP input file, which specify which lattice vector lengths and angles to fix. In this case, the GULP input file written by the algorithm has the form:

<contents of gulp header file>
cell
<a> <b> <c> <alpha> <beta> <gamma> <1|0> <1|0> <1|0> <1|0> <1|0> <1|0>
frac
<atomic symbol 1> core <three fractional coordinates> 1 1 1
<atomic symbol 2> core <three fractional coordinates> 1 1 1
...
<contents of gulp potential file>

For bulk searches, the keyword conp is usually used in the GULP header file. However, this keyword, along with cellonly and conv, must not appear in the GULP header file for non-bulk searches, since these will cause GULP to ignore the individual flags written by the algorithm.

See the Geometry keyword for details on how to specify a non-bulk structure search in the input file of the genetic algorithm.

See here for details on how the algorithm treats non-bulk structures.

Go back to Contents

LAMMPS

In order to do a LAMMPS calculation, the genetic algorithm runs the command

calllammps <path to lammps input file>

The calllammps command is a small script that should be on the user's path. Its purpose is to avoid the need to hardcode details of the LAMMPS installation into the genetic algorithm. On most machines, the following calllammps bash script should work:

#!/bin/bash
# script to run lammps
# usage: calllammps /path/to/lammps/input/file

# get the name of the input file, without the path
input_file=$(basename $1)

# get the path to the job directory containing the input file
dir_path=$(dirname $1)

# move into the job directory
cd $dir_path

# run lammps on the input file
lammps < $input_file

This script assumes that LAMMPS is installed, and that the LAMMPS executable (lammps command) is on the user's path.

The genetic algorithm needs an input script in order to perform LAMMPS calculations. Here is an example input script that specifies an EAM potential for Al-Cu alloys:

units		metal
dimension	3
atom_style	charge
boundary	p p p
read_data	in.data

pair_style eam/alloy
pair_coeff * * <path to AlCu.eam.alloy file> Al Cu

neigh_modify one 5000
minimize 0.0 1.0e-8 1 1
fix 1 all box/relax tri 1e4 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000
dump myDump all atom 100000000000000 dump.atom
dump_modify myDump sort 1 scale no
fix 1 all box/relax tri 0 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000

The input script given to the algorithm must contain the following lines:

atom_style charge

and

read_data in.data

In addition, the input script must use the dump command to print the relaxed structure to a file called dump.atom, from which the algorithm parses the relaxed structure.

The pair_style and pair_coeff commands specify the empirical potential to use in the energy calculations. Note that the element symbols must appear at the end of line containing the pair_coeff command, as shown above. In most cases, the user must only replace these two lines to specify the desired potential, and this example input script should work.

We found that using an initial relaxation step with the system under small pressure (as shown here) is helpful for ensuring relaxation with many empirical potentials.

The energy landscapes defined by empirical potentials sometimes contain severely unphysical minima. To prevent such minima from derailing a structure search, the algorithm checks that the energy per atom computed by LAMMPS for each structure is not less than -50 eV/atom. If it is, the algorithm discards the structure and prints a message: Discarding organism n due to unphysically large energy: x eV/atom, where n is the organism number and x is the energy per atom. If GASP is being used to evaluate an empirical potential, problematic structures are likely to be important and can be found by checking for these messages in the output.

Non-Bulk Structures

As discussed here, one or more lattice vectors must be fixed when searching for non-bulk structures. To achieve this with LAMMPS, the user must slightly modify the LAMMPS input script. In particular, the fix command must be changed so that only the specified lattice vectors are allowed to relax, while the others are fixed. The boundary command may also be changed for non-bulk calculations. Here is an example LAMMPS input script for two-dimensional sheets:

units	 	metal
dimension	3
atom_style	charge
boundary	p p f
read_data	in.data

pair_style eam/alloy
pair_coeff * * <path to AlCu.eam.alloy file> Al Cu

neigh_modify one 5000
minimize 0.0 1.0e-8 1 1
fix 1 all box/relax x 0 y 0 xy 0 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000
dump myDump all atom 100000000000000 dump.atom
dump_modify myDump sort 1 scale no
fix 1 all box/relax x 0 y 0 xy 0 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000

The options following the boundary command tell LAMMPS to use periodic boundary conditions in the x and y directions, but not in the z direction. The options following the fix command specify that only the a and b lattice vectors, and the angle between them, are to be relaxed.

Here is an example LAMMPS input script for one-dimensional wires:

units		metal
dimension	3
atom_style	charge
boundary	f f p
read_data	in.data

pair_style eam/alloy
pair_coeff * * <path to AlCu.eam.alloy file> Al Cu

neigh_modify one 5000
minimize 0.0 1.0e-8 1 1
fix 1 all box/relax z 0 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000
dump myDump all atom 100000000000000 dump.atom
dump_modify myDump sort 1 scale no
fix 1 all box/relax z 0 vmax 0.001
minimize 0.0 1.0e-8 1000000 10000000

The options following the boundary command tell LAMMPS to use periodic boundary conditions in the z direction, but not in the x and y directions. The options following the fix command specify that only the c lattice vector is to be relaxed.

For zero-dimensional clusters, no periodic boundary conditions are needed, and the fix command can simply be omitted to prevent all three lattice vectors from being relaxed:

units		metal
dimension	3
atom_style	charge
boundary	f f f
read_data	in.data

pair_style eam/alloy
pair_coeff * * <path to AlCu.eam.alloy file> Al Cu

neigh_modify one 5000
minimize 0.0 1.0e-8 1 1
minimize 0.0 1.0e-8 1000000 10000000
dump myDump all atom 100000000000000 dump.atom
dump_modify myDump sort 1 scale no
minimize 0.0 1.0e-8 1000000 10000000

See the Geometry keyword for details on how to specify a non-bulk structure search in the input file of the genetic algorithm.

See here for details on how the algorithm treats non-bulk structures.

Go back to Contents

VASP

In order to do a VASP calculation, the genetic algorithm runs the command

callvasp <path to vasp job directory>

The callvasp command is a small script that should be on the user's path. Its purpose is to avoid the need to hardcode details of the VASP installation into the genetic algorithm. Here is an example callvasp bash script:

#!/bin/bash
# script to run vasp
# usage: callvasp /path/to/vasp/job/directory/

# move into the vasp job directory
cd $1

# run vasp
vasp

Where the last line vasp is replaced by whatever command is needed to run VASP on the user's machine.

The genetic algorithm needs an INCAR file, a KPOINTS file, and a POTCAR file for each element in the composition space in order to do a VASP calculation. See the EnergyCode keyword for how to provide these files to the algorithm in the input file.

TODO: talk about a more useful script that resubmits the VASP job up to some number of times

TODO: talk about how to make the callvasp script if each vasp calculation cannot be run directly, but has to be submitted to a queue as a job (like SLURM).

Non-Bulk Structures

As discussed here, one or more lattice vectors must be fixed when searching for non-bulk structures. However, VASP does not support selectively fixing certain lattice vectors as an option in the INCAR file. In order to fix specific lattice vectors during a VASP relaxation, the constr_cell_relax.F file must be modified to set some of the components of the stress tensor to zero, and VASP must be recompiled with the modified file.

To prevent the c lattice vector from being relaxed (for two-dimensional/sheet searches), the constr_cell_relax.F file should be modified to look like this:

SUBROUTINE CONSTR_CELL_RELAX(FCELL)
USE prec
REAL(q) FCELL(3,3)

DO I=1,3
   FCELL(3,I)=0
   FCELL(I,3)=0
ENDDO

RETURN
END SUBROUTINE

To prevent the a and b lattice vectors from being relaxed (for one-dimensional/wire searches), the constr_cell_relax.F file should be modified to look like this:

SUBROUTINE CONSTR_CELL_RELAX(FCELL)
USE prec
REAL(q) FCELL(3,3)

SAVE=FCELL(3,3)
FCELL=0
FCELL(3,3)=SAVE

RETURN
END SUBROUTINE

To prevent all the lattice vectors from being relaxed (for zero-dimensional/cluster searches), set ISIF = 2 in the INCAR file. No change to the constr_cell_relax.F file is needed.

See the Geometry keyword for details on how to specify a non-bulk structure search in the input file of the genetic algorithm.

See here for details on how the algorithm treats non-bulk structures.

Go back to Contents

Non-Bulk Geometries

In addition to searching for bulk structures, GASP supports searching for structures with non-bulk geometries, including two-dimentional sheets, one-dimensional wires, and zero-dimensional clusters. Below are the details of how GASP deals with each case. The basic idea is to add vacuum in the non-periodic direction(s) and to fix the corresponding lattice vectors during relaxation to prevent the non-bulk structures from interacting with their periodic images.

See the Geometry keyword for the parameters needed in the input file to search for structures with non-bulk geometries.

Layers/2D Materials

GASP represents a 2D structure as follows. First, the structure is rotated into the principal directions (lattice vector a is parallel to the Cartesian x-axis, lattice vector b lies in the Cartesian x-y plane, and the z-component of lattice vector c is positive). The c lattice vector is then replaced with its z-component. At this point, the 2D structure lies is in the x-y plane, and c is perpendicular to the 2D sheet.

Before passing a 2D structure to an external code for an energy calculation, GASP adds vertical vacuum padding by increasing the magnitude of the c lattice vector. The amount of vacuum added is determined by the value given after the padding keyword in the Geometry block: padding is added until the minimum vertical distance between an atom in the structure and another atom in its vertical periodic image is the value following the padding keyword.

After an energy calculation, the algorithm removes the vertical vacuum padding, leaving only enough to satisfy the per-species minimum interatomic distance constraints.

For 2D structures, the size keyword in the Geometry block refers to the layer thickness of the 2D structure, which is the maximum vertical distance between atoms in the cell (assuming the 2D structure lies in the x-y plane, as described above).

During structural relaxations with an external energy code, the c lattice vector should not be permitted to change when doing 2D structure searches. See the Energy Code Interfaces section for details on how to accomplish this with the various energy codes that can be used with GASP.

Go back to Contents

Wires

GASP represents a 1D structure as follows. First, the structure is rotated so that the c lattice vector lies parallel to the Cartesian z-axis. The a and b lattice vectors are then replaced with their components along the Cartesian x and y axes, respectively. At this point, each of the three lattice vectors lie parallel to one of the Cartesian directions, and the c lattice vector corresponds to the direction of periodicity of the 1D wire.

Before passing a 1D structure to an external code for and energy calculation, GASP adds vacuum padding around the wire by increasing the magnitude of the a and b lattice vectors (which are parallel to the Cartesian x and y axes, respectivley.) The amount of vacuum added is determined by the value given after the padding keyword in the Geometry block: padding is added until the minimum distance between an atom in the structure and another atom in its periodic image (in either the x or y direction) is the value following the padding keyword.

After an energy calculation, the algorithm removes the vacuum padding in the x and y directions, leaving only enough to satisfy the per-species minimum interatomic distance constraints.

For 1D structures, the size keyword in the Geometry block refers to the thickness of the 1D wire. The thickness is defined as the maximum distance between the atoms' projections onto the x-y plane (assuming the 1D structure lies along the z axis, as described above).

During structural relaxations with an external energy code, the a and b lattice vectors should not be permitted to change when doing 1D structure searches. See the Energy Code Interfaces section for details on how to accomplish this with the various energy codes that can be used with GASP.

Go back to Contents

Clusters/Molecules

GASP represenets a 0D cluster as follows. First, the maximum distance between atoms in each Cartesian direction is computed. The lattice vectors are then made parallel to each of the three Cartesian directions, and their magnitudes are set to the maximum distances between atoms in each direction. At this point, the cluster lies inside an orthorhombic cell that is just large enough to contain all the atoms.

Before passing a 0D structure to an external code for an energy calculation, GASP adds vacuum padding around the cluster by increasing the magnitude of all three lattice vectors. The amount of vacuum that is added is determined by the value given after the padding keyword in the Geometry block: padding is added until the minimum distance between an atom in the structure and another atom in its periodic image (in the x or y or z direction) is the value following the padding keyword.

After an energy calculation, the algorithm removes the vacuum padding in all three Cartesian directions, leaving only enough to satisfy the per-species minimum interatomic distance constraints.

For 0D structures, the size keyword in the Geometry block refers to the diameter of the 0D cluster, which is the maximum distance between atoms in the cluster.

During structural relaxations with an external energy code, none of the lattice vectors should be permitted to change when doing 0D structure searches. See the Energy Code Interfaces section for details on how to accomplish this with the various energy codes that can be used with GASP.

Go back to Contents

Resuming Calculations

TODO

Go back to Contents

Strategies

TODO

Go back to Contents