Automatic Selection And Prediction tools for materials and molecules
documentation (in progress)
Type asap and use the sub-commands for various tasks.
To get the help string, run:
asap --help
or asap subcommand --help
or asap subcommand subcommand --help
depending on which level of help you are interested in.
- asap gen_desc: generate global or atomic descriptors based on the input ASE xyz file.
- asap map: make 2D plots using the specified design matrix. Currently PCA (pca), sparsified kernel PCA (skpca), UMAP (umap), and t-SNE (tsne) are implemented.
- asap cluster: perform density-based clustering. Currently supports DBSCAN (dbscan) and fast search of density peaks (fdb).
- asap fit: quickly fit a ridge regression (ridge) or sparsified kernel ridge regression (kernelridge) model based on the input design matrix and labels.
- asap kde: quick kernel density estimation on the design matrix. Several versions of KDE are available.
- asap select: select a subset of frames using sparsification algorithms.
The first step for a machine-learning analysis or visualization is to generate a "design matrix" made from either global descriptors or atomic descriptors. To do this, we supply asap gen_desc with an input file that contains the atomic coordinates. Many formats are supported; anything that can be read using ase.io is supported. You can use a wildcard to specify the list of input files that match the pattern (e.g. POSCAR*, H*, or *.cif). However, it is most robust to use the extended xyz file format (units in angstrom, additional info and cell size in the comment line).
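To make the parenthetical about the extended xyz format concrete, here is a minimal sketch that writes one frame by hand: the atom count on the first line, then a comment line carrying the cell (Lattice=...), the column layout (Properties=...) and any extra key=value info, then one line per atom in angstrom. The function name format_extxyz_frame and the name=water entry are illustrative assumptions, not part of ASAP, and the water geometry is only approximate. In practice one would normally let ase.io read and write such files.

```python
def format_extxyz_frame(symbols, positions, cell=None, info=None):
    """Return one extended-xyz frame as a string (hypothetical helper).

    symbols   : list of chemical symbols, e.g. ["O", "H", "H"]
    positions : list of (x, y, z) tuples in angstrom
    cell      : optional 3x3 nested list of lattice vectors in angstrom
    info      : optional dict of extra per-frame key=value entries
    """
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")

    # Build the comment line: cell, column layout, then free-form info.
    comment = []
    if cell is not None:
        flat = " ".join(f"{x:.8f}" for vec in cell for x in vec)
        comment.append(f'Lattice="{flat}"')
    comment.append("Properties=species:S:1:pos:R:3")
    for key, value in (info or {}).items():
        comment.append(f"{key}={value}")

    lines = [str(len(symbols)), " ".join(comment)]
    for sym, (x, y, z) in zip(symbols, positions):
        lines.append(f"{sym} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


# Approximate water geometry in a 10 angstrom cubic box (illustrative only).
water = format_extxyz_frame(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
    cell=[[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]],
    info={"name": "water"},
)
print(water)
```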
As a quick example, in the folder ./tests/, to generate SOAP descriptors:
asap gen_desc --fxyz small_molecules-1000.xyz soap
For the Coulomb matrix:
asap gen_desc -f small_molecules-1000.xyz --no-periodic cm
After generating the descriptors, one can make a two-dimensional map (asap map), fit a regression model (asap fit), do a clustering analysis (asap cluster), select a subset of frames (asap select), or estimate the probability of observing each sample (asap kde).
For instance, to make a PCA map:
asap map -f small_molecules-SOAP.xyz -dm '[SOAP-n4-l3-c1.9-g0.23]' -c dft_formation_energy_per_atom_in_eV pca
You can specify a list of descriptor vectors to include in the design matrix, e.g. '[SOAP-n4-l3-c1.9-g0.23, SOAP-n8-l3-c5.0-g0.3]'.
One can also use a wildcard to specify the names of all the descriptors to use for the design matrix, e.g.
asap map -f small_molecules-SOAP.xyz -dm '[SOAP*]' -c dft_formation_energy_per_atom_in_eV pca
or even
asap map -f small_molecules-SOAP.xyz -dm '[*]' -c dft_formation_energy_per_atom_in_eV pca
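The selection rules for -dm can be pictured with a short sketch. It is not ASAP's implementation: it only illustrates shell-style wildcard matching with Python's fnmatch module, under the assumption that the bracketed spec is a comma-separated list of patterns. The function name select_descriptors and the "CM" entry in the list of available names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_descriptors(available, spec):
    """Return the names in `available` matched by a spec such as '[SOAP*]'."""
    # Strip the surrounding brackets and split the comma-separated patterns.
    patterns = [p.strip() for p in spec.strip().strip("[]").split(",") if p.strip()]
    selected = []
    for name in available:
        # Keep a name as soon as any pattern matches it (case-sensitive).
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(name)
    return selected


# Hypothetical descriptor names stored in an xyz file.
available = ["SOAP-n4-l3-c1.9-g0.23", "SOAP-n8-l3-c5.0-g0.3", "CM"]
print(select_descriptors(available, "[SOAP-n4-l3-c1.9-g0.23]"))
print(select_descriptors(available, "[SOAP*]"))
print(select_descriptors(available, "[*]"))
```

With these inputs, an exact name selects one descriptor, 'SOAP*' selects both SOAP entries, and '*' selects everything.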
Using asap map, a png figure is generated. In addition, the code also outputs the low-dimensional coordinates of the structures and/or atomic environments. The default output is an extended xyz file. One can also specify a different output format using the --output or -o flag. The available options are xyz, matrix and chemiscope.
- If one selects the chemiscope format, a *.json.gz file will be written, which can be directly used as the input of chemiscope.
- If the output is in xyz format, it can be visualized interactively using projection_viewer.
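If you want to look inside a *.json.gz output before loading it elsewhere, a gzip-compressed JSON file can be opened with the Python standard library alone. The sketch below makes no assumption about the keys the file contains; the helper names load_json_gz and summarize are illustrative and not part of ASAP.

```python
import gzip
import json


def load_json_gz(path):
    """Load a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data):
    """Return the sorted top-level keys of a JSON object (empty list otherwise)."""
    return sorted(data) if isinstance(data, dict) else []
```

Calling summarize(load_json_gz(path)) on the written file lists its top-level entries without assuming any particular layout.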
Requires Python 3.
Installation:
python3 setup.py install --user
This should automatically install any dependencies.
List of requirements:
- numpy scipy scikit-learn json ase dscribe umap-learn PyYAML click
Add-Ons:
- (for finding symmetries of crystals) spglib
- (for annotation without overlaps) adjustText
- The FCHL19 representation requires code from the development branch of the QML package. Instructions on how to install the QML package can be found at https://www.qmlcode.org/installation.html.
- To add a new atomic descriptor, add a new Atomic_Descriptor class in asaplib/descriptors/atomic_descriptors.py. As long as it has an __init__() and a create() method, it should be compatible with the ASAP code. The create() method takes an ASE Atoms object as input (see: ASE).
We have a template class for this:
class Atomic_Descriptor_Base:
    def __init__(self, desc_spec):
        self._is_atomic = True
        self.acronym = ""
        pass

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        # we use an acronym for each descriptor, so it's easy to find it and refer to it
        return self.acronym

    def create(self, frame):
        # notice that we return the acronym here!!!
        return self.acronym, []
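As an illustration of how the template might be filled in, here is a minimal sketch of a toy atomic descriptor that assigns each atom its (scaled) atomic number. Everything beyond the template is an assumption made for the example: the class name Atomic_Descriptor_Z, the idea that desc_spec is a dict, the made-up "scale" option, and the one-row-per-atom return shape. The base class is copied into the block and a small stand-in replaces the ASE Atoms object so the sketch runs without asaplib or ASE installed.

```python
class Atomic_Descriptor_Base:
    # Copy of the template above so that this sketch runs on its own.
    def __init__(self, desc_spec):
        self._is_atomic = True
        self.acronym = ""

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        return self.acronym

    def create(self, frame):
        return self.acronym, []


class Atomic_Descriptor_Z(Atomic_Descriptor_Base):
    """Hypothetical descriptor: one number per atom, its scaled atomic number."""

    def __init__(self, desc_spec):
        super().__init__(desc_spec)
        # 'scale' is a made-up option used only in this illustration;
        # desc_spec is assumed to be a dict.
        self.scale = float(desc_spec.get("scale", 1.0))
        self.acronym = "Z-s" + str(self.scale)

    def create(self, frame):
        # frame is expected to behave like an ASE Atoms object.
        numbers = frame.get_atomic_numbers()
        descriptors = [[self.scale * int(z)] for z in numbers]
        return self.acronym, descriptors


class FakeFrame:
    """Stand-in for ase.Atoms so the sketch needs no third-party package."""

    def __init__(self, numbers):
        self._numbers = list(numbers)

    def get_atomic_numbers(self):
        return self._numbers


desc = Atomic_Descriptor_Z({"scale": 0.5})
print(desc.create(FakeFrame([8, 1, 1])))
```

Encoding the options in the acronym, as the SOAP names in the earlier examples do, keeps descriptors generated with different settings distinguishable.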
- To add a new global descriptor, add a new Global_Descriptor class in asaplib/descriptors/global_descriptors.py. As long as it has an __init__() and a create() method, it is fine. The create() method also takes the Atoms object as input.
The template is similar to the atomic one:
class Global_Descriptor_Base:
    def __init__(self, desc_spec):
        self._is_atomic = False
        self.acronym = ""
        pass

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        # we use an acronym for each descriptor, so it's easy to find it and refer to it
        return self.acronym

    def create(self, frame):
        # return the dictionaries for global descriptors and atomic descriptors (if any)
        return {'acronym': self.acronym, 'descriptors': []}, {}
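A global descriptor can be sketched the same way. The toy class below returns the number of atoms in the frame as a one-element descriptor and an empty dictionary for the atomic part, mirroring the return shape of the template. The class name Global_Descriptor_NAtoms and the "natoms" acronym are hypothetical; the base class is copied in so the block runs without asaplib. It relies only on len(frame), which an ASE Atoms object supports, so a plain list stands in for the frame here.

```python
class Global_Descriptor_Base:
    # Copy of the template above so that this sketch runs on its own.
    def __init__(self, desc_spec):
        self._is_atomic = False
        self.acronym = ""

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        return self.acronym

    def create(self, frame):
        return {'acronym': self.acronym, 'descriptors': []}, {}


class Global_Descriptor_NAtoms(Global_Descriptor_Base):
    """Hypothetical descriptor: the number of atoms in the frame."""

    def __init__(self, desc_spec):
        super().__init__(desc_spec)
        self.acronym = "natoms"

    def create(self, frame):
        # One global value per frame; no atomic descriptors are produced.
        n_atoms = float(len(frame))
        return {'acronym': self.acronym, 'descriptors': [n_atoms]}, {}


# A plain list stands in for an ASE Atoms object in this illustration.
natoms = Global_Descriptor_NAtoms({})
print(natoms.create(["O", "H", "H"]))
```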
In the directories ./scripts/ and ./tools/ you can find a selection of other python tools.
Tab completion can be enabled by sourcing the asap_completion.sh script in the ./scripts/ directory.
If a conda environment is used, you can copy this file to $CONDA_PREFIX/etc/conda/activate.d/ to automatically load the completion upon environment activation.
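The copy step can also be scripted. The following is a minimal sketch using only the standard library; the helper name install_completion is hypothetical and not part of ASAP. It reads the prefix from the CONDA_PREFIX environment variable unless one is passed explicitly, and creates the activate.d directory if it does not exist yet.

```python
import os
import shutil


def install_completion(script_path, conda_prefix=None):
    """Copy a completion script into <prefix>/etc/conda/activate.d/."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate a conda environment first")
    target_dir = os.path.join(prefix, "etc", "conda", "activate.d")
    # The directory may not exist in a fresh environment.
    os.makedirs(target_dir, exist_ok=True)
    # shutil.copy returns the path of the newly created file.
    return shutil.copy(script_path, target_dir)
```

From the repository root, with the environment activated, this would be called as install_completion("./scripts/asap_completion.sh").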