# A practical guide

This guide was assembled from information provided by Steven Brown, Paul Hodgkinson, the CASTEP manual and the CASTEP mailing list.

Two main aspects of CASTEP are typically used in the context of NMR calculations: geometry optimisation and the calculation of magnetic resonance parameters. Geometry optimisation produces a model structure which is then used to compute magnetic shieldings of individual nuclei.

CASTEP calculations are controlled by two files of the same <seedname>. The .cell file contains the information about the unit cell (lattice parameters, atomic coordinates, symmetries, constraints) and parameters controlling the sampling of k­-space (the reciprocal space of the crystal lattice), while the .param file specifies what kind of calculation will be carried out (geometry optimisation, NMR calculation) and the various parameters controlling the DFT side of things.

## Geometry optimisation

The geometry optimisation is a procedure where the quantum mechanical interaction energy between the atoms and electrons is minimised as a function of the positions of the atoms and the lattice parameters. The forces, and optionally, the stresses are computed in each step, and the atomic coordinates are moved such that the total energy of the system decreases. This is repeated until a set of predefined convergence criteria are satisfied.

It is important to note that the resulting structure may only represent a local minimum on the potential energy surface, and it is determined by the starting structure as well as the parameters used in the calculation. Also, the structure corresponds to the classical temperature of 0 K. Finite temperatures and quantum mechanical effects might mean that a whole distribution of structures can be relevant.

As a rough guide, each electronic structure calculation step scales as the cube of the number of valence electrons (i.e. treated explicitly) in the system. The number of geometry optimisation steps needed to reach convergence depend on the quality of the initial structure and the shape of the potential energy surface, some soft systems might be difficult to converge.

## Isolated molecules

CASTEP treats every system as periodic by applying periodic boundary conditions. To estimate the electronic structure and the NMR parameters of an isolated molecule in vacuum, it is necessary to pad the molecule with empty space in all directions. This can be accomplished by using the same absolute coordinates and increasing the dimensions of the unit cell, see here. If optimising the geometry of the isolated molecule, it is important to keep the cell fixed.

When enlarging the cell, it is important to remember that the molecule is still treated as a crystal, but each molecule is separated by vacuum. The implication is that the old periodicity of the atomic coordinates is not correct anymore - it is important to make sure the molecule has not been chopped! Also, check how the results converge with the cell size i.e. amount of vacuum. An orthorhombic or even cubic cell might be used.1

There is an optimal k-point sampling for single isolated molecules, described here.

## NMR calculations

With the geometry optimisation completed, the optimised structure can be employed for a subsequent calculation, in this case the calculation of solid state NMR parameters. The easiest way to obtain the relaxed structure is to specify WRITE_CELL_STRUCTURE : true in the .param file, so an -out.cell file is written at the end of the geometry optimisation run, see also here.

CASTEP performs an electronic structure calculation, and uses the electronic wavefunction and the density to compute the NMR parameters.

Sometimes it is interesting to estimate the effect of the crystalline arrangement as compared to isolated molecules. To compare the chemical shielding for a molecule in a crystal with the single molecule, the lattice parameters must be edited. The atomic coordinates are kept, but the dimensions of the unit cells increased, see here. If there are multiple, not equivalent molecules in the crystal, they need to be computed separately.

## Preparing input structures

The basis for the geometry optimisation is a model structure as starting point for the optimisation. This is usually .CIF-file obtained by X-ray or neutron diffraction experiments. It contains information about the structure such as the unit cell dimensions, symmetry operations, atom positions etc. For a previously measured crystal the structure can most likely be found online in the CSD (Cambridge Structural Database) or ICSD (Inorganic Crystal Structure Database), where you can search for compounds and export the corresponding .CIF file.

### Working with .CIF files

Load the CIF file in Mercury or Jmol or GDIS and inspect the structure carefully. Watch out for any chemically odd connectivities, bonds or missing atoms2 and check if there is any disorder (fractional occupancy) in the molecule. Co-crystallised solvent molecules should not be removed as these influence the structure as well as the magnetic shieldings. If the structure is reasonable, save the complete packed unit cell3.

It might be possible to convert to primitive unit cell. Possible tools include Materials Studio and Avogadro. The advantage of converting is that the primitive unit cell may include significantly fewer atoms and thus reduce the necessary computational effort. It is important to check that this has not altered the general molecular or crystal structure. CASTEP can detect if a supercell of a primitive unit cell is used - it is useful to perform a "dry run" and check this.

cif2cell converts .cif structures to .cell files and by default it creates the primitive cell. To obtain the conventional unit cell, use the --no-reduce option when running cif2cell. Note that cif2cell assumes that values such as 0.333333 mean ⅓, and silently corrects the decimal so it corresponds to the machine-precision representation of the fraction, e.g. 0.33333333333, which may not be correct.

Finally, it is also possible to create a .cell file manually.

### .CELL files

An example .cell file:

%block LATTICE_ABC
22.03307 22.03307 22.03307
109.47122 109.47122 109.47122
%endblock LATTICE_ABC

%block POSITIONS_ABS
Zn        10.79944       -0.30702        0.53178
P         -8.48949        4.97939       11.94046
H          8.80735       13.05686       12.54438
H          9.37434       -2.92380       14.44052
%endblock POSITIONS_ABS

The LATTICE_ABC block contains the cell dimensions (a, b, c) in the first line, and the cell angles (α,β,γ) in the second.

The POSITIONS_ABS block contains the species types and coordinates. Alternatively, the POSITIONS_FRAC block may be used for specifying fractional coordinates.

##### Further keywords
• k-point sampling parameters. CASTEP approximates the wavefunction of an infinitely large, periodic crystal by using a discrete grid of points, called k-points, in reciprocal space. As more points are used4, this approximation becomes better, therefore convergence of the results should be checked with respect to k-point sampling.

• KPOINTS_MP_SPACING: The k-point density of a Monkhorst-Pack grid. Units of inverse length should be specified. The defaulf value of 0.1 1/ang works well in general for molecular crystals. Note that the actual value specifies a minimum density only, as the actual value changes discretely. Best practice is to check the resulting grid in the .castep file because it is possible that different values correspond to the same grid.
• KPOINTS_MP_OFFSET: The offset of the origin of the Monkhorst-Pack set in fractional coordinates.
• KPOINTS_LIST: Explicit list of k-points (coordinates in reciprocal space) in the Brillouin zone (with associated weights as the fourth element) used for Brillouin zone integration. The k-point weights must sum to 1. If computing a single, isolated molecule (that is, a molecule in a large cell, padded by vacuum), it is found that using a single k-point situated at (¼,¼,¼) leads to faster convergence of the results with respect to cell size. To accomplish this, the following can be added to the .cell file:

%BLOCK KPOINTS_LIST
0.25    0.25    0.25    1.00
%ENDBLOCK KPOINTS_LIST
• SNAP_TO_SYMMETRY: If SNAP_TO_SYMMETRY keyword is present then, given the symmetry operations supplied or generated in cell, the ionic positions and lattice parameters are forced to obey the symmetries (to machine precision). If using symmetries, this option should always be used.
• SYMMETRY_GENERATE: If present, then the crystal symmetry rotations and translations will be generated automatically, reducing the computational complexity as CASTEP can utilise the these. Do not use this keyword if the symmetry operations are already present (for example if the .cell file is the output of CASTEP or cif2cell. In some cases, it is desirable not to enforce symmetries, such as molecular dyanmics runs or in geometry optimisations when the structure might relax into a different symmetry.
• FIX_ALL_CELL: If set to TRUE, all lattice parameters will remain fixed during a geometry optimisation or MD run.
• FIX_COM: If set to TRUE, the centre of mass of the ions will remain fixed during a geometry optimisation or molecular dynamics simulation.
• IONIC_CONSTRAINTS : A list of the constraints on the motion of the ions during relaxation or molecular dynamics. Each line in this block will apply a linear constraint on the Cartesian coordinates of an atom.

• The first element in the line identifies the constraint.
• The second element is the species identifier, either the atomic symbol or atomic number
• The third element is the number of the atom within the species - using the same ordering as in the POSITIONS_ABS or POSITIONS_FRAC block
• The final three elements specify the coefficients of the linear constraint.
! Example to fix the Zn and the P atoms in a cell
%BLOCK IONIC_CONSTRAINTS
1  Zn  1  1.0 0.0 0.0
2  Zn  1  0.0 1.0 0.0
3  Zn  1  0.0 0.0 1.0
4  P   1  1.0 0.0 0.0
5  P   1  0.0 1.0 0.0
6  P   1  0.0 0.0 1.0
%ENDBLOCK IONIC_CONSTRAINTS

From CASTEP 17, there exists a simpler mechanism to fix atoms. It can be used to fix specific atoms, or all atoms in a given species. First a fix command is applied, and then, optionally, an unfix command. Examples:

! fix all atoms
%BLOCK IONIC_CONSTRAINTS
fix: all
%ENDBLOCK IONIC_CONSTRAINTS
! fix all atoms except hydrogens
%BLOCK IONIC_CONSTRAINTS
fix: all unfix: H
%ENDBLOCK IONIC_CONSTRAINTS
! fixes carbons and nitrogens
%BLOCK IONIC_CONSTRAINTS
fix: C N
%ENDBLOCK IONIC_CONSTRAINTS
! fixes the first carbon atom
%BLOCK IONIC_CONSTRAINTS
fix: C 1
%ENDBLOCK IONIC_CONSTRAINTS
! fixes the following carbon atoms: 1, 3, 5-11
%BLOCK IONIC_CONSTRAINTS
fix: C {1,3,5-11}
%ENDBLOCK IONIC_CONSTRAINTS
! fix everything except the first and second hydrogen
%BLOCK IONIC_CONSTRAINTS
fix: all unfix: H {1,2}
%ENDBLOCK IONIC_CONSTRAINTS

### .PARAM files

Example .param files:

TASK                    : GEOMETRYOPTIMISATION
CUT_OFF_ENERGY          : 700 eV
XC_FUNCTIONAL           : PBE
WRITE_CELL_STRUCTURE    : true
TASK                    : MagRes
CUT_OFF_ENERGY          : 700 eV
XC_FUNCTIONAL           : PBE
IPRINT                  : 2

#### Keyword glossary

• TASK: defines what job is being done e.g. GeometryOptimisation for geometry optimisation and MagRes for calculation of magnetic resonance parameters.
• MAGRES_TASK: specifies which magnetic property to calculate. Possible values are SHIELDING, EFG, GTENSOR, HYPERFINE, EPR and JCOUPLING. The value NMR is a shorthand for calculating both SHIELDING and EFG.
• CUT_OFF_ENERGY: determines the cut-off energy for the plane wave basis set used. As well as the k-point sampling, this is the other key parameter determining the quality of the results. The higher the cutoff energy, the better the quality of the calculation (and, naturally, the slower it runs). For geometry optimisation, we tend to use a generic value of 600 eV, but higher values are generally required for MagRes tasks, and the optimal value is determined by the most "difficult" atom type e.g. O or F. Again it is good practice to check the convergence of the results with increasing cutoff energy.
• XC_FUNCTIONAL: the approximation of the exchange-correlation functional to be used. PBE is a good compromise for a large range of systems.
• Parameters controlling the density mixing scheme, which can be tweaked if the SCF energy minimisation is unstable for a particular system5.
• MIXING_SCHEME: determines the mixing scheme to be used in the density mixing procedure within the density mixing scheme. Pulay and Broyden (default) are almost always best.
• MIX_HISTORY_LENGTH: try increasing MIX_HISTORY_LENGTH between 20-30 to increase the "damping". This may lead to slower, but more stable convergence of the electronic structure.
• MIX_CHARGE_AMP: determines the mixing amplitude of the charge density within the density mixing scheme. Smaller is usually more stable, but less than 0.1 is rarely useful.
• MIX_CUT_OFF_ENERGY: determines the cut off energy for the densities within the density mixing scheme. Higher usually gives more robust but slower convergence; 1(default)-4 times plane-wave cut-off is sensible.
• MIX_CHARGE_GMAX: determines the maximum G-vector component of charge density to mix within the density mixing scheme. This controls the range of the Kerker preconditioner in reciprocal space, 1.5(default)-3.0 1/Angstrom is sensible.
• Parameters controlling dispersion correction. Dispersion interactions are crucial in molecular crystals, so it is advisable to turn them on, especially in case of optimising the cell parameters.
• SEDC_APPLY: only use this option for CASTEP 8. true turns on dispersion correction.
• SEDC_SCHEME: The semi-empirical dispersion/van der Waals correction scheme to use. Default is NONE, other possible values are G06 or TS, the latter has generally been found to be better. MBD* is the state-of-the-art method, but at the moment only numerical forces and stresses are available in CASTEP, so the calculation is slower.
• OPT_STRATEGY: controls the optimisation strategy used. The default value is default, which is a compromise between speed and memory usage. Depending on the amount of memory required and available, it can be set to speed or memory. See here for further strategies.
• DATA_DISTRIBUTION: controls the data distribution used. Possible values are DEFAULT, KPOINT, GVECTOR, BAND, MIXED. Choosing the data distribution has implications for the efficiency and resource requirements of CASTEP. For further information, check the section on speed and memory efficiency.
• NUM_PROC_IN_SMP: Specifies the maximum number of active processors per node in an shared memory process aware calculation. See here.
• IPRINT: ­ determines the verbosity of the output in .castep. The default of 1 is usually most appropriate, but it is useful to increase this to 2 for MagRes tasks as the output will then contain information about the magnetic susceptibility tensor.
• Restarting calculations in case of exceeded runtime limits or crashed runs is possible if a .check file is available. Only one of the following options may be used.
• REUSE: will start a brand new calculation, but loads the electronic density and wave function, and use them as a starting point in the new calculation.
• CONTINUATION: will start the calculation from the end of the previous one, so the updated geometry information will also be used as well as the electronic density and wave function.
• Possible values are:
• name of an arbitrary .check file
• default, which uses the <seedname>.check file
• null, which disables the option.
• BACKUP_INTERVAL: specifies the interval, in seconds, between backups of all data for restarts, for a geometry optimisation or molecular dynamics run. If less than or equal to zero then no timed backups.
• NUM_BACKUP_ITER: specifies the number of iterations between backups of all data for restarts, for a geometry optimisation or molecular dynamics run.
• Note that the continuation mechanism does not work for MagRes jobs. Instead you can use DEVEL_CODE MAGRES_RESTART. A set of temporary files will be written periodically (respecting the BACKUP_INTERVAL keyword) and if these files are present upon continuation and DEVEL_CODE MAGRES_RESTART is set, then they will be read in. Note that because there is one file written per MPI process, the continuation run must use the same parallelisation as the original for NMR restart to work successfully.
• WRITE_CHECKPOINT: specifies whether or not to write both .check and .castep_bin files. These files can be fairly large, so they take up considerable amount of disk space, and writing them can be slow. Allowed values are NONE, MINIMAL (only .castep_bin is written) and ALL.
• WRITE_CIF_STRUCTURE: specifies whether or not to write the final structure in a .CIF file format so the structure can be viewed in Mercury. Useful in case of a geometry optimisation or molecular dynamics run.
• WRITE_CELL_STRUCTURE: specifies whether or not to write the final structure in a .CELL file format. Useful in case of a geometry optimisation or molecular dynamics run.

## Running CASTEP

Run the CASTEP executable, appending the <seedname> after the command. The exact form of the command is specific to the computer cluster that is used.

### The "dry run"

It is possible to test the input files by performing a short CASTEP calculation that stops after the initialisation:  castep <seedname> -dryrun  The resulting .castep file will highlight any syntax problems in the input files. It will also indicate the number of k-points that will be used. Note that the memory usage estimates will only be accurate if the same number of MPI processes are used in the dry run as in the production run.

CASTEP determines if a supercell is used instead of a primitive unit cell, and prints a warning, for example:

Cell is a supercell containing 4 primitive cells

### Speed and memory efficiency

CASTEP can be run in parallel using MPI and OpenMP. The most efficient distribution of work is over k-points, and therefore the number of processors should be a multiple of (or divide into) the number of k­-points.6

The amount of communication between computing nodes can be reduced by specifying NUM_PROC_IN_SMP in the .param file. A number equal to the number of cores used within a node is a good start, but for some computer architectures using a fraction of cores per node might be more efficient.

Sometimes memory requirements of the code will exceed what is physically available on the computing nodes. In this case, various strategies can be employed:

• a moderate reduction of the memory requirement can be achieved by setting OPT_STRATEGY : memory in the .param file, at the expense of computational speed.
• compute nodes can be underpopulated. In this case not all processors are used within a node, so fewer processors share the same amount of physical memory. To use the same number of processors, more nodes need to be reserved, so more resources are used. How to do this in practice is specific to the submitter.
• a different data distribution model can be enforced by setting DATA_DISTRIBUTION : GVECTOR. This option potentially reduces the memory requirement only if there is more than one k-point in the calculation. The downside is that more communication is needed between processes, therefore the whole run will be slower.

## Output files

If the CASTEP run has completed successfully, there will be an output file <seedname>.castep plus some job­ specific files containing more detailed output, for instance, NMR information (shielding tensors and quadrupole coupling information) in <seedname>.magres.

The presence of files of the type <seedname>.err.XXX indicate that errors have occurred, with XXX referring to the id of the computing node on which the error occurred. There may also be clues in the files that contain the standard output and standard error of the run. If the .err files contain messages about communication failures and references to processor nodes, then it is most likely that the cluster has had a glitch. Restart the calculation and hope for the best. Otherwise it is likely to be a problem with the CASTEP run itself.

If the initial SCF minimisation of the system energy is unstable, the density mixing needs to be "damped" to prevent "charge sloshing"­, see here. If the system is metallic, or has a very small band gap, it can lead to "occupancy sloshing", where the occupation of bands varies from iteration to iteration.

If CASTEP has been used for geometry optimisation prior to calculation of NMR parameters, then the optimised coordinates need to be transferred to a new .cell file. If WRITE_CELL_STRUCTURE : true is present in the .param file, a <seedname>-out.cell file is printed at the end of the run, which then can be renamed.

• .castep: main output file. Contains a summary of the
• parameters
• k-point sampling
• symmetries
• electronic structure calculation. Lines ending with <-- SCF contain information on the convergence of the self consistent field iteration of the electronic structure calculation. Any problems detected here, such as oscillating (rather than convergent) behaviour may be a sign of sloshing, to remedy see here
• geometry optimisation. Lines ending with <-- BFGS contain information on the convergence of the geometry optimisation.
• NMR parameters, although the .magres file is more appropriate to look at.
• .magres: contains the chemical shieldings, which need to be referenced to give ppm values. To do so, there are three methods

• comparison with reference calculation of a known sample
• plot experimental values vs. calculated values and see if a slope -1 can be fitted,
• Øδ(exp) + Øδ(calc). If shift differences are to be compared, the shieldings can directly be subtracted, which circumvents the need for a reference. This is important for the crystal to molecule calculations.

MagresView provides a graphical interface for the .magres files.

• .usp: pseudopotential file for a specific atom, generated on-the-fly, unless specified explicitly in the .cell file. Note that for the MagRes task the pseudopotential must be generated on the fly.
• .bib: BibTEX file entries generated automatically including suggestions which papers to possibly cite based on the methods and parameters used in the calculation.
• .geom: list of configurations as the geometry optimisation progresses, step-by-step. It can be looked at in JMol, which visualises each optimisation step. Useful to check certain atom groups or possible problematic parts of the structure.
• -out.cif and -out.cell: optimised structure of the compound at the end of the geometry optimisation run.
• .check: contains all data necessary to continue the optimisation if not finished. Useful in case of a long calculation limited by the time in the queuing system.

1. The cost of the calculation also depends on the cell volume. In case of compact molecules, a cubic cell might work best. For long or flat molecules, if they are oriented along the Cartesian axes, an orthorhombic cell might be more efficient, as the separation of molecules can be kept constant.

2. In many diffraction-derived structures, hydrogens are not included. In such a case they must be added manually.

3. In Mercury this implies ticking the box "packing".

4. Equivalently, the more dense they are or the less spacing between them.

5. This may be a warning that there is a technical problem with the calculation, such as incorrect input geometry.

6. Check the number of k-points beforehand by performing a dry run.