Tutorial 2 - Generators, Properties and Calculators

Tutorial 2 - Generators, Properties and Calculators#

      _
    /|_|\   
   / / \ \  
  /_/   \_\  
  \ \   / /  
   \ \_/ /  
    \|_|/  

SOPRANO: a Python library for generation, manipulation and analysis of large batches of crystalline structures

Developed within the CCP-NC project. Copyright STFC 2022

# Basic imports
import os, sys
sys.path.insert(0, os.path.abspath('..')) # This to add the Soprano path to the PYTHONPATH
                                          # so we can load it without installing it
# Other useful imports

import numpy as np

from ase import Atoms
from ase import io as ase_io
from ase.visualize.plot import plot_atoms

1 - USING GENERATORS#

Soprano provides a series of generators able to create multiple structures on one go based on simple criteria. One of these, used here, is the linspaceGen, which interpolates linearly between two extreme structures. Others are the rattleGen (generating copies of a given structure with random atomic displacements) and the airssGen (binding to AIRSS’ buildcell executable to generate random structures, only available if AIRSS is installed).

from soprano.collection import AtomsCollection
from soprano.collection.generate import linspaceGen


# Let's use the ammonia molecule switching configurations as an example
nh3coords = np.array([[ 2.5,     2.5,     2.5   ],
                      [ 3.4373,  2.5,     2.1193],
                      [ 2.0314,  3.3117,  2.1193],
                      [ 2.0314,  1.6883,  2.1193]])
nh3l = Atoms('NHHH', nh3coords, cell=[5,5,5]) # The cell is just an empty box
# Now the right version
nh3coords *= [1, 1, -1]
nh3r = Atoms('NHHH', nh3coords, cell=[5,5,5])

# Now let's build a collection of 21 intermediate steps between the two structures
# Note that 21 includes the two endpoints
nh3linsp = linspaceGen(nh3l, nh3r, steps=21, periodic=True)
# Generators can be passed directly to the AtomsCollection constructor
nh3coll = AtomsCollection(nh3linsp)

We can now plot the collection of structures generated by the linspaceGen. The example below shows how to do this.

# Make a 1x5 grid of subplots
from matplotlib import pyplot as plt
N = len(nh3coll)
N_plots = 5
fig, axs = plt.subplots(1, N_plots, figsize=(20, 4))
# Plot each structure in the grid
for i, ax in enumerate(axs):
    structure_index = int(i * (N +N_plots-1)/ N_plots)
    ax.set_title(f'Structure {structure_index}')
    # Use the plot_atoms function to plot the structure
    plot_atoms(nh3coll.structures[structure_index], ax=ax, rotation='0x,90y,0z', show_unit_cell=2,radii=0.65)
../_images/050ce2f7405022570baa964ed8ad19922551718bc8bc7f372c6d438aecd84594.png

2 - PROPERTIES#

Soprano Properties are classes meant to extract complex arrays of information from collections. A number of these are provided by default, but advanced users can easily create their own class inheriting from the generic AtomsProperty class to implement particular needs.

from soprano.properties.linkage import LinkageList


# As a first experiment we try using LinkageList, a property meant to return a list of all pair interatomic distances
# in a system. This can serve as a fingerprint to distinguish different structures

# The basic usage is to just call the Property's method "get". In this way the Property is calculated with
# default parameters.
# The three shortest values (varying) are N-H distances, while the constant ones are H-H distances

print("---- Linkage List for all NH3 configurations - Default parameters\n")
print('\n'.join(['{0}'.format(x) for x in LinkageList.get(nh3coll)]), "\n\n")
---- Linkage List for all NH3 configurations - Default parameters

[1.01162016 1.01162016 1.01166387 1.62339512 1.62339512 1.6234    ]
[0.99791691 0.99791691 0.99796123 1.62339512 1.62339512 1.6234    ]
[0.98549462 0.98549462 0.98553949 1.62339512 1.62339512 1.6234    ]
[0.97440226 0.97440226 0.97444764 1.62339512 1.62339512 1.6234    ]
[0.96468572 0.96468572 0.96473156 1.62339512 1.62339512 1.6234    ]
[0.95638694 0.95638694 0.95643317 1.62339512 1.62339512 1.6234    ]
[0.94954307 0.94954307 0.94958964 1.62339512 1.62339512 1.6234    ]
[0.94418577 0.94418577 0.94423261 1.62339512 1.62339512 1.6234    ]
[0.94034044 0.94034044 0.94038747 1.62339512 1.62339512 1.6234    ]
[0.93802568 0.93802568 0.93807282 1.62339512 1.62339512 1.6234    ]
[0.93725282 0.93725282 0.9373     1.62339512 1.62339512 1.6234    ]
[0.93802568 0.93802568 0.93807282 1.62339512 1.62339512 1.6234    ]
[0.94034044 0.94034044 0.94038747 1.62339512 1.62339512 1.6234    ]
[0.94418577 0.94418577 0.94423261 1.62339512 1.62339512 1.6234    ]
[0.94954307 0.94954307 0.94958964 1.62339512 1.62339512 1.6234    ]
[0.95638694 0.95638694 0.95643317 1.62339512 1.62339512 1.6234    ]
[0.96468572 0.96468572 0.96473156 1.62339512 1.62339512 1.6234    ]
[0.97440226 0.97440226 0.97444764 1.62339512 1.62339512 1.6234    ]
[0.98549462 0.98549462 0.98553949 1.62339512 1.62339512 1.6234    ]
[0.99791691 0.99791691 0.99796123 1.62339512 1.62339512 1.6234    ]
[1.01162016 1.01162016 1.01166387 1.62339512 1.62339512 1.6234    ] 
# If one wants to use parameters, an instance of the Property has to be created.
# For example LinkageList accepts a parameter "size" that limits the number of distances computed.
# This can then just be called on the AtomsCollection

customLL = LinkageList(size=3)

print("---- Linkage List for all NH3 configurations - Custom parameters\n")
print('\n'.join(['{0}'.format(x) for x in customLL(nh3coll)]), "\n\n")
---- Linkage List for all NH3 configurations - Custom parameters

[1.01162016 1.01162016 1.01166387]
[0.99791691 0.99791691 0.99796123]
[0.98549462 0.98549462 0.98553949]
[0.97440226 0.97440226 0.97444764]
[0.96468572 0.96468572 0.96473156]
[0.95638694 0.95638694 0.95643317]
[0.94954307 0.94954307 0.94958964]
[0.94418577 0.94418577 0.94423261]
[0.94034044 0.94034044 0.94038747]
[0.93802568 0.93802568 0.93807282]
[0.93725282 0.93725282 0.9373    ]
[0.93802568 0.93802568 0.93807282]
[0.94034044 0.94034044 0.94038747]
[0.94418577 0.94418577 0.94423261]
[0.94954307 0.94954307 0.94958964]
[0.95638694 0.95638694 0.95643317]
[0.96468572 0.96468572 0.96473156]
[0.97440226 0.97440226 0.97444764]
[0.98549462 0.98549462 0.98553949]
[0.99791691 0.99791691 0.99796123]
[1.01162016 1.01162016 1.01166387] 
# Now we can try creating a custom property. This one will calculate the center of mass of all Hydrogen atoms.
from soprano.properties import AtomsProperty


class HydrogenCOM(AtomsProperty):
    
    default_name = 'hydrogen_com' # These need to be defined for any property
    default_params = {}
    
    @staticmethod
    def extract(s): # This is where the core of the calculation happens
        # s is a single Atoms object passed to this method
        
        chemsyms = s.get_chemical_symbols()
        h_inds = [i for i, sym in enumerate(chemsyms) if sym == 'H']
        h_pos = s.get_positions()[h_inds]
        com = np.average(h_pos, axis=0)
        
        return com

print("---- Hydrogen COM for all NH3 configurations\n")
print('\n'.join(['{0}'.format(x) for x in HydrogenCOM.get(nh3coll)]), "\n\n")
---- Hydrogen COM for all NH3 configurations

[2.50003333 2.5        2.1193    ]
[2.50003333 2.5        2.15737   ]
[2.50003333 2.5        2.19544   ]
[2.50003333 2.5        2.23351   ]
[2.50003333 2.5        2.27158   ]
[2.50003333 2.5        2.30965   ]
[2.50003333 2.5        2.34772   ]
[2.50003333 2.5        2.38579   ]
[2.50003333 2.5        2.42386   ]
[2.50003333 2.5        2.46193   ]
[2.50003333 2.5        2.5       ]
[2.50003333 2.5        2.53807   ]
[2.50003333 2.5        2.57614   ]
[2.50003333 2.5        2.61421   ]
[2.50003333 2.5        2.65228   ]
[2.50003333 2.5        2.69035   ]
[2.50003333 2.5        2.72842   ]
[2.50003333 2.5        2.76649   ]
[2.50003333 2.5        2.80456   ]
[2.50003333 2.5        2.84263   ]
[2.50003333 2.5        2.8807    ] 

Average properties#

If you want to compute the mean of a property over a collection of structures or over sites within a structure, you can use the .mean method. For example:

AtomsProperty().mean(collection, axis=0) will compute the mean of the property over all structures in the collection, while AtomsProperty().mean(collection, axis=1) will compute the mean of the property over all sites for each structure in the collection.

In the previous cell we defined a new AtomsProperty called HydrogenCOM in which we compute the center of mass of the hydrogen atoms in a structure. We could have instead approached this problem using a much simpler property that gets all cartesian coordinates, and then use a custom selection to return the hydrogen positions and then compute the mean of those to get the center of mass.

# Simple class to return the positions of all atoms in the system
class AtomsPositions(AtomsProperty):
    
    default_name = 'atoms_positions'
    default_params = {}

    @staticmethod
    def extract(s):
        return s.get_positions()

# Now we can use the selection argument to select only the H atoms and choose to 
# average them along the first axis (i.e. average over all the H positions 
# within each structure in the collection)
H_COM_alternative = AtomsPositions().mean(nh3coll, selection="H", axis=1)

# This is equivalent to the previous property, but it shows how to use the selection
# and mean methods to get the same result
print("---- Hydrogen COM for all NH3 configurations\n")
print('\n'.join(['{0}'.format(x) for x in H_COM_alternative]), "\n\n")
---- Hydrogen COM for all NH3 configurations

[2.50003333 2.5        2.1193    ]
[2.50003333 2.5        2.15737   ]
[2.50003333 2.5        2.19544   ]
[2.50003333 2.5        2.23351   ]
[2.50003333 2.5        2.27158   ]
[2.50003333 2.5        2.30965   ]
[2.50003333 2.5        2.34772   ]
[2.50003333 2.5        2.38579   ]
[2.50003333 2.5        2.42386   ]
[2.50003333 2.5        2.46193   ]
[2.50003333 2.5        2.5       ]
[2.50003333 2.5        2.53807   ]
[2.50003333 2.5        2.57614   ]
[2.50003333 2.5        2.61421   ]
[2.50003333 2.5        2.65228   ]
[2.50003333 2.5        2.69035   ]
[2.50003333 2.5        2.72842   ]
[2.50003333 2.5        2.76649   ]
[2.50003333 2.5        2.80456   ]
[2.50003333 2.5        2.84263   ]
[2.50003333 2.5        2.8807    ] 

3 - CALCULATORS#

The Atomic Simulation Environment provides bindings to many codes in the form of calculators. These include ab initio codes like CASTEP and VASP as well as empirical force fields. These calculators can be set and used in Soprano as well. Here we’re going to use the most basic one, the Lennard-Jones force field, as an example.

from ase.calculators.lj import LennardJones
from soprano.properties.basic import CalcEnergy

nh3coll.set_calculators(LennardJones) # Creates calculators of the given type for all structures

print("---- NH3 Lennard-Jones energy for all configurations ----\n")
print('\n'.join(['{0}'.format(x) for x in CalcEnergy.get(nh3coll)]), "\n\n")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[9], line 4
      1 from ase.calculators.lj import LennardJones
      2 from soprano.properties.basic import CalcEnergy
----> 4 nh3coll.set_calculators(LennardJones) # Creates calculators of the given type for all structures
      6 print("---- NH3 Lennard-Jones energy for all configurations ----\n")
      7 print('\n'.join(['{0}'.format(x) for x in CalcEnergy.get(nh3coll)]), "\n\n")

File ~/work/soprano/soprano/soprano/collection/collection.py:409, in AtomsCollection.set_calculators(self, calctype, labels, params)
    407 from ase.calculators.calculator import Calculator as cCalculator
    408 from ase.calculators.calculator import FileIOCalculator as ioCalculator
--> 409 from ase.calculators.general import Calculator as gCalculator
    411 if (
    412     (gCalculator not in calctype.__bases__)
    413     and (cCalculator not in calctype.__bases__)
    414     and (ioCalculator not in calctype.__bases__)
    415 ):
    416     raise TypeError("calctype must be a type of ASE Calculator")

ModuleNotFoundError: No module named 'ase.calculators.general'
# Now let's try a plot
%matplotlib inline
import matplotlib.pyplot as plt

comz = np.array(HydrogenCOM.get(nh3coll))[:,2]-2.5
ljE = CalcEnergy.get(nh3coll)

plt.plot(comz, ljE)
[<matplotlib.lines.Line2D at 0x7c93a26eedb0>]
../_images/2ccf74971561f3fe241263e3f74b10394bb15f3287e0a87a5ea0fed65206734c.png