soprano.collection.collection

Definition of the Collection class.

It handles multiple ASE Atoms objects and, in this sense, mirrors the structure of the Atoms object itself.

Classes

AtomsCollection([structures, info, ...])

AtomsCollection object.

class soprano.collection.collection.AtomsCollection(structures=[], info={}, cell_reduce=False, progress=False, suppress_ase_warnings=True)

Bases: object

AtomsCollection object.

An AtomsCollection represents a group of ASE Atoms objects. It handles them together, can perform mass operations on them, and stores arrays of information related to them.

Initialize the AtomsCollection

Args:
structures (list[str] or list[ase.Atoms]): list of file names or
Atoms that will form
the collection
info (dict): dictionary of general information to attach
to this collection
cell_reduce (bool): if True, perform a Niggli cell reduction on
all loaded structures
progress (bool): show a progress bar for the loading process
suppress_ase_warnings (bool): suppress annoying ASE warnings when
loading files (default is True)
__add__(other)

Adding two collections merges them into one

__deepcopy__(memodict={})

Protects against problems with infinite recursion in AllCaller

__getitem__(indices)

Allow sophisticated slicing

static check_tree(path)

Checks if a path is a valid ‘tree’ format for a collection. This is any folder that satisfies the following conditions:

  • contains a .collection file storing metadata

  • contains a series of folders matching the list stored in the .collection file, and nothing else

This function will return 0 if both conditions are satisfied, 1 if only the first is, 2 if no .collection file is found, and -1 if the folder itself doesn’t exist.

Args:
path (str): path to check against the collection
tree pattern
Returns:
result (int): -1, 0, 1 or 2 depending on the outcome of the checks
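
The return codes described above can be turned into readable messages with a small lookup table. This is only a sketch: `describe_tree_status` and `TREE_STATUS` are hypothetical names, not part of Soprano, and the integer passed in stands for whatever `check_tree` returned.

```python
# Hypothetical helper, not part of Soprano: maps the integer returned by
# AtomsCollection.check_tree(path) to the meaning documented above.
TREE_STATUS = {
    0: "valid tree: .collection file present and folders match its list",
    1: ".collection file present, but the folders do not match its list",
    2: "no .collection file found",
    -1: "folder does not exist",
}


def describe_tree_status(result):
    """Return a human-readable description of a check_tree result."""
    try:
        return TREE_STATUS[result]
    except KeyError:
        raise ValueError(f"unexpected check_tree result: {result!r}") from None
```

A caller could log `describe_tree_status(result)` before deciding whether to load from or save to the folder.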
chunkify(chunk_size=None, chunk_n=None)

Split this collection into multiple collections based on either size or number of chunks.

Args:
chunk_size (Optional[int]): maximum size of a generated chunk
chunk_n (Optional[int]): number of chunks to generate
Returns:
chunks (list[AtomsCollection]): a list of the generated chunks
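
To illustrate the two ways of splitting, here is a sketch on a plain Python list. It is not Soprano's implementation: `chunkify_list` is a hypothetical function, and both the rule that exactly one argument must be given and the ceiling-division rule for `chunk_n` are assumptions. Under that rule, fewer than `chunk_n` chunks can result when the list is short.

```python
import math


def chunkify_list(items, chunk_size=None, chunk_n=None):
    """Split a list into chunks by maximum size or by number of chunks."""
    # Assumption: exactly one of the two arguments must be provided.
    if (chunk_size is None) == (chunk_n is None):
        raise ValueError("pass exactly one of chunk_size or chunk_n")
    if chunk_size is None:
        # Assumption: derive the chunk size by ceiling division.
        chunk_size = math.ceil(len(items) / chunk_n)
    chunk_size = max(chunk_size, 1)  # avoid a zero step for empty input
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```

With seven items, `chunk_size=3` gives chunks of 3, 3 and 1 elements, while `chunk_n=2` gives chunks of 4 and 3.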
classify(classes)

Return a dictionary of collections based on the names of assigned classes.

Args:
classes (np.ndarray): array of the class to which each structure
belongs. For example [1, 2, 1] will put the
first and third structures in class 1 and
the other in class 2. The classes can be any
hashable types, like int or str.
Returns:
classified (dict): a dictionary using class names as keys and
sliced collections as values
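
The grouping can be pictured with plain lists standing in for the collection. This is a sketch of the behaviour described above, not Soprano's code; `classify_list` is a hypothetical name, and the real method returns sliced collections rather than lists.

```python
def classify_list(items, classes):
    """Group items into a dict keyed by class label, preserving item order."""
    if len(items) != len(classes):
        raise ValueError("classes must have one entry per item")
    classified = {}
    for item, label in zip(items, classes):
        # Any hashable label (int, str, ...) can serve as a key.
        classified.setdefault(label, []).append(item)
    return classified
```

For example, `classify_list(["a", "b", "c"], [1, 2, 1])` puts the first and third items in class 1 and the second in class 2.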
filter(filter_func)

Return a collection composed only of the elements for which a given filter function returns True.

Args:
filter_func (function<Atoms> => bool): filter function. Should take an
Atoms object and return a boolean
Returns:
filtered (AtomsCollection): the filtered version of the collection
get_array(name, copy=True)

Get a copy of an array of given name (or a reference if copy=False)

Args:
name (str): name of the array to retrieve.
copy (bool): if the array should be copied or a reference should
be returned instead.
Returns:
array (np.ndarray): the requested array
has(name)

Check if array of given name exists

static load(filename)

Load a pickled copy from a given file path

static load_tree(path, load_format, opt_args={}, safety_check=3, tolerant=False, suppress_ase_warnings=True)

Load a collection’s structures from a series of folders, named like the structures, inside a given parent folder, as created by save_tree. The files can be loaded from a format of choice, or a function can be passed that will load them in a custom way.

Args:
path (str): folder path from which the collection should be loaded.
load_format (str or function): format from which the structures
should be loaded.
If a string, it will be used as a
file extension. If a function, it
must take as arguments the load
path (a string) and any additional
arguments passed as opt_args, and
return the loaded structure as an
ase.Atoms object.
opt_args (dict): dictionary of additional arguments to pass to
either ase.io.read (if load_format is a string)
or to the load_format function.
safety_check (int): how much care should be taken to verify the
folder that is being loaded. Can be a number
from 0 to 3.
Here’s the meaning of the codes:

3 (default): only load a folder if it fully passes
the check_tree control;
2: load any folder that has a valid
.collection file, but only the listed
subfolders;
1: load any folder that has a valid
.collection file, all subfolders. Array
data will be discarded;
0: no checks, try to load from all subfolders.
tolerant (bool): if True, load the structures into an
AtomsCollection even if some of them
could not be read.
Returns:
coll (AtomsCollection): loaded collection
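
The four safety levels can be read as a small decision table. The sketch below is one interpretation of the description above, not Soprano's code: `load_plan` is a hypothetical function, and treating `check_tree` results 0 and 1 as "has a valid .collection file" is an assumption.

```python
def load_plan(safety_check, tree_status):
    """Decide which subfolders would be read for a given safety level.

    tree_status is the integer returned by check_tree. The result is
    "listed" (only subfolders named in .collection), "all" (every
    subfolder), or None (refuse to load).
    """
    if safety_check not in (0, 1, 2, 3):
        raise ValueError("safety_check must be 0, 1, 2 or 3")
    if tree_status == -1:
        return None  # the folder does not exist, nothing to load
    if safety_check == 0:
        return "all"  # no checks at all
    # Assumption: statuses 0 and 1 both mean a .collection file was found.
    if tree_status not in (0, 1):
        return None
    if safety_check == 1:
        return "all"  # array data is discarded at this level
    if safety_check == 2:
        return "listed"
    # safety_check == 3 requires a fully valid tree
    return "listed" if tree_status == 0 else None
```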
run_calculators(properties=None, system_changes=None)

Run all previously set ASE calculators.

Args:
properties (list[str]): list of properties to calculate (depends
on type of Calculator used)
system_changes (list[str]): list of changes to the structure
since the last calculation. Can be
any combination of these six:
‘positions’, ‘numbers’, ‘cell’,
‘pbc’, ‘initial_charges’ and
‘initial_magmoms’.
save(filename)

Simply save a pickled copy to a given file path

save_tree(path, save_format, name_root='structure', opt_args={}, safety_check=3, suppress_ase_warnings=True)

Save the collection’s structures as a series of folders, named like the structures, inside a given parent folder (that will be created if not present). Arrays and info are stored in a pickled .collection file which works as metadata for the whole directory tree. The files can be saved in a format of choice, or a function can be passed that will save them in a custom way. Only one collection can be saved per folder.

Args:
path (str): folder path in which the collection should be saved.
save_format (str or function): format in which the structures
should be saved.
If a string, it will be used as a
file extension. If a function, it
must take as arguments the
structure (an ase.Atoms object),
the save path (a string), and any
additional arguments passed as
opt_args, and take care of saving
the required files.
name_root (str): name prefix to be used for structures when a name
is not available in their info dictionary
opt_args (dict): dictionary of additional arguments to pass to
either ase.io.write (if save_format is a string)
or to the save_format function.
safety_check (int): how much care should be taken not to overwrite
potentially important data in path. Can be a
number from 0 to 3.
Here’s the meaning of the codes:

3 (default): always ask before overwriting an
existing folder that passes the check_tree
control, raise an exception otherwise;
2: overwrite any folder that fully passes the
check_tree control, raise an exception
otherwise;
1: overwrite any folder that fully passes the
check_tree control, ask for user input
otherwise;
0 (DANGER - use at your own risk!): no checks,
always overwrite path.
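
The overwrite rules above reduce to a lookup on the safety level and the `check_tree` result. The following sketch encodes that table. `overwrite_action` is a hypothetical function, not Soprano's implementation, and the action names are illustrative.

```python
def overwrite_action(safety_check, tree_status):
    """Return "create", "overwrite", "ask" or "raise" for a target folder.

    tree_status is the integer returned by check_tree for the target path.
    """
    if safety_check not in (0, 1, 2, 3):
        raise ValueError("safety_check must be 0, 1, 2 or 3")
    if tree_status == -1:
        return "create"  # the folder is created if not present
    if safety_check == 0:
        return "overwrite"  # no checks, always overwrite
    # (action if the folder fully passes check_tree, action otherwise)
    table = {
        3: ("ask", "raise"),
        2: ("overwrite", "raise"),
        1: ("overwrite", "ask"),
    }
    on_pass, on_fail = table[safety_check]
    return on_pass if tree_status == 0 else on_fail
```

Reading the table row by row shows that only level 3 asks before replacing a valid collection tree, and only level 0 overwrites a folder that is not one without any confirmation.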
set_array(name, a, dtype=None, shape=None, args={})

Add or modify an array of data related to the Atoms objects in this collection.

Args:
name (str): name of the array to operate on.
a (np.ndarray or function<Atoms, **kwargs> => Any): the data to assign to the array (must
be same length as the collection) or
a function that takes an Atoms object
as the first argument and returns a
value. This will be mapped over the
structures to create the array.
dtype (type): type to cast the values of the array to.
shape (tuple [int]): shape of each entry of the array. Will be
checked if provided.
args (dict): named arguments to pass to the function provided
as a. Will be ignored if an array is passed instead.
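
The two accepted forms of `a` can be sketched with plain Python objects in place of Atoms. `build_array` is a hypothetical helper, not Soprano's code, and it leaves out the `dtype` cast and `shape` check.

```python
def build_array(items, a, args=None):
    """Return one value per item from either a sequence or a function.

    If `a` is callable it is applied to every item, with `args` forwarded
    as keyword arguments. Otherwise `a` must already hold one value per
    item, and `args` is ignored.
    """
    if callable(a):
        kwargs = args or {}
        return [a(item, **kwargs) for item in items]
    values = list(a)
    if len(values) != len(items):
        raise ValueError("array must have the same length as the collection")
    return values
```

For instance, `build_array(["H2", "H2O"], len)` maps `len` over the items, while `build_array([1, 2], [10, 20])` stores the given values directly.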
set_calculators(calctype, labels=None, params={})

Set an ASE calculator on each structure in the collection, and set said calculator’s parameters.

Args:
calctype (ASE Calculator type): the type of calculator
to instantiate.
labels (Optional[list[str]]): names to use for the calculators’
files. If not present, randomly
generated names are used.
params (Optional[dict]): parameters of the calculator to set.
sorted_byarray(name, reverse=False)

Return a copy of this collection sorted by a given array.

Args:
name (str): name of the array to use for the sorting
reverse (Optional[bool]): reverse order of sorting (max to min)
Returns:
sorted (AtomsCollection): a sorted copy of the collection
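
Sorting by an array amounts to ordering the structures by the indices that sort the array values. Here is a sketch on plain lists. `sorted_by_values` is a hypothetical function, not Soprano's implementation.

```python
def sorted_by_values(items, values, reverse=False):
    """Return a new list of items ordered by their associated values."""
    if len(items) != len(values):
        raise ValueError("values must have one entry per item")
    # Indices that would sort the values (an argsort).
    order = sorted(range(len(items)), key=lambda i: values[i], reverse=reverse)
    # Build a new list so the input is left untouched, like a sorted copy.
    return [items[i] for i in order]
```

With `reverse=True` the ordering runs from the largest value to the smallest.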
class soprano.collection.collection._AllCaller(all_list, all_class=None)

Bases: object

_AllCaller class.

A meta-object that serves the purpose of calling a function on all members of a list in a natural way.

Initialize the AllCaller with an ‘all’ list

__getattr__(name)

Here’s the magic of the class: when a method isn’t found on the object itself, it is looked up on the elements of its ._all list.

map(f, *args, **kwargs)

Map a function to each element of the ._all list and return the results.
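
The delegation pattern behind this class can be sketched in a few lines. `AllCallerSketch` is a hypothetical stand-in written for illustration, not Soprano's implementation. It assumes every forwarded name is a method, and it omits the `all_class` argument.

```python
class AllCallerSketch:
    """Forward method calls to every element of a list."""

    def __init__(self, all_list):
        self._all = list(all_list)

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so _all and map
        # are found first. Private and dunder names are refused so that
        # copy and pickle probes fail cleanly instead of recursing.
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_all(*args, **kwargs):
            return [getattr(item, name)(*args, **kwargs) for item in self._all]

        return call_on_all

    def map(self, f, *args, **kwargs):
        """Apply f to each element and return the list of results."""
        return [f(item, *args, **kwargs) for item in self._all]
```

With a list of strings, `AllCallerSketch(["ab", "cd"]).upper()` calls `upper` on each element, and `.map(len)` applies an external function to each element instead.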