jwz360 / ASAP

ASAP is a package that can quickly analyze and visualize datasets of crystal or molecular structures.

ASAP

Automatic Selection And Prediction tools for materials and molecules

documentation (in progress)

Basic usage

Type asap and use the sub-commands for various tasks.

To get the help string:

asap --help, asap subcommand --help, or asap subcommand subcommand --help, depending on which level of help you are interested in.

  • asap gen_desc: generate global or atomic descriptors based on the input (ASE) xyz file.

  • asap map: make 2D plots using the specified design matrix. Currently PCA (pca), sparsified kernel PCA (skpca), UMAP (umap), and t-SNE (tsne) are implemented.

  • asap cluster: perform density-based clustering. Currently supports DBSCAN (dbscan) and fast search of density peaks (fdb).

  • asap fit: quickly fit a ridge regression (ridge) or sparsified kernel ridge regression (kernelridge) model based on the input design matrix and labels.

  • asap kde: quick kernel density estimation on the design matrix. Several versions of KDE are available.

  • asap select: select a subset of frames using sparsification algorithms.

Quick & basic example

Step 1: generate a design matrix

The first step for a machine-learning analysis or visualization is to generate a "design matrix" made from either global descriptors or atomic descriptors. To do this, we supply asap gen_desc with an input file that contains the atomic coordinates. Many formats are supported; anything that can be read using ase.io is supported. You can use a wildcard to specify the list of input files that match a pattern (e.g. POSCAR*, H*, or *.cif). However, the most robust choice is the extended xyz file format (units in angstrom, additional info and cell size in the comment line).
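To make the extended xyz layout concrete, here is a minimal sketch that formats one frame by hand using only the standard library. The function name format_extxyz_frame, the water-like geometry and the energy value are illustrative placeholders, not data from the ASAP test set; in practice ase.io can write this format for you.

```python
def format_extxyz_frame(symbols, positions, cell=None, info=None):
    """Return one extended-xyz frame as a string.

    symbols   : list of chemical symbols, e.g. ["O", "H", "H"]
    positions : list of (x, y, z) tuples in angstrom
    cell      : optional 3x3 nested list of lattice vectors in angstrom
    info      : optional dict of per-frame labels (e.g. an energy)
    """
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")

    # The comment line carries key=value pairs: cell, column layout, labels.
    fields = []
    if cell is not None:
        flat = " ".join(f"{x:.8f}" for vector in cell for x in vector)
        fields.append(f'Lattice="{flat}"')
    fields.append("Properties=species:S:1:pos:R:3")
    for key, value in (info or {}).items():
        fields.append(f"{key}={value}")

    # First line: atom count. Second line: comment. Then one line per atom.
    lines = [str(len(symbols)), " ".join(fields)]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


# Placeholder molecule and label value, for illustration only.
frame = format_extxyz_frame(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
    info={"dft_formation_energy_per_atom_in_eV": -0.5},
)
print(frame)
```

Labels stored in the comment line this way are the kind of per-frame property that can later be referred to by name, as the -c option does in the asap map examples.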

As a quick example, in the folder ./tests/

to generate SOAP descriptors:

asap gen_desc --fxyz small_molecules-1000.xyz soap

for the Coulomb matrix:

asap gen_desc -f small_molecules-1000.xyz --no-periodic cm

Step 2: generate a low-dimensional map

After generating the descriptors, one can make a two-dimensional map (asap map), fit a regression model (asap fit), perform a clustering analysis (asap cluster), select a subset of frames (asap select), or estimate the probability of observing each sample (asap kde).

For instance, to make a PCA map:

asap map -f small_molecules-SOAP.xyz -dm '[SOAP-n4-l3-c1.9-g0.23]' -c dft_formation_energy_per_atom_in_eV pca

You can specify a list of descriptor vectors to include in the design matrix, e.g. '[SOAP-n4-l3-c1.9-g0.23, SOAP-n8-l3-c5.0-g0.3]'.

You can also use a wildcard to match the names of all the descriptors to use for the design matrix, e.g.

asap map -f small_molecules-SOAP.xyz -dm '[SOAP*]' -c dft_formation_energy_per_atom_in_eV pca

or even

asap map -f small_molecules-SOAP.xyz -dm '[*]' -c dft_formation_energy_per_atom_in_eV pca
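As a rough illustration of what the wildcard selection above does, the following sketch matches shell-style patterns against a list of descriptor names using Python's fnmatch module. It is not ASAP's own implementation; select_descriptors is a hypothetical helper, and "CM" is an assumed stand-in name for a non-SOAP descriptor.

```python
from fnmatch import fnmatchcase


def select_descriptors(available, patterns):
    """Return the names in `available` that match any shell-style pattern."""
    selected = []
    for name in available:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(name)
    return selected


# Two SOAP names taken from the examples above, plus a hypothetical "CM" entry.
available = ["SOAP-n4-l3-c1.9-g0.23", "SOAP-n8-l3-c5.0-g0.3", "CM"]

print(select_descriptors(available, ["SOAP*"]))  # only the SOAP descriptors
print(select_descriptors(available, ["*"]))      # every descriptor
```

A pattern without a wildcard, such as the full name SOAP-n4-l3-c1.9-g0.23, matches only that one descriptor, which mirrors the explicit list form shown earlier.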

Step 2+: interactive visualization

asap map generates a png figure. In addition, the code also outputs the low-dimensional coordinates of the structures and/or atomic environments. The default output is an extended xyz file. One can specify a different output format using the --output or -o flag; the available options are xyz, matrix and chemiscope.

  • If one selects the chemiscope format, a *.json.gz file will be written, which can be used directly as the input of chemiscope.

  • If the output is in xyz format, it can be visualized interactively using projection_viewer.
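If you want to check what was written before loading it into a viewer, a *.json.gz file can be opened with the standard library alone. The sketch below only lists the top-level keys and their types, since the internal layout of the file is not described here; load_json_gz and summarize are hypothetical helper names.

```python
import gzip
import json


def load_json_gz(path):
    """Load a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(data):
    """Map each top-level key to the type name of its value."""
    return {key: type(value).__name__ for key, value in data.items()}


# Usage, with the path of a file written using the chemiscope output option:
#   print(summarize(load_json_gz("path/to/output.json.gz")))
```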

Installation & requirements

python 3

Installation:

python3 setup.py install --user

This should automatically install any dependencies.

List of requirements:

  • numpy scipy scikit-learn json ase dscribe umap-learn PyYAML click

Add-Ons:

  • (for finding symmetries of crystals) spglib
  • (for annotation without overlaps) adjustText
  • The FCHL19 representation requires code from the development branch of the QML package. Instructions on how to install the QML package can be found at https://www.qmlcode.org/installation.html.

How to add your own atomic or global descriptors

  • To add a new atomic descriptor, add a new Atomic_Descriptor class in asaplib/descriptors/atomic_descriptors.py. As long as it has an __init__() and a create() method, it should be compatible with the ASAP code. The create() method takes an ASE Atoms object as input (see: ASE).

We have a template class for this:

class Atomic_Descriptor_Base:
    def __init__(self, desc_spec):
        self._is_atomic = True
        self.acronym = ""

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        # we use an acronym for each descriptor, so it's easy to find it and refer to it
        return self.acronym

    def create(self, frame):
        # notice that we return the acronym here!!!
        return self.acronym, []

  • To add a new global descriptor, add a new Global_Descriptor class in asaplib/descriptors/global_descriptors.py. As long as it has an __init__() and a create() method, it is fine. The create() method also takes the Atoms object as input.

The template is similar to the atomic one:

class Global_Descriptor_Base:
    def __init__(self, desc_spec):
        self._is_atomic = False
        self.acronym = ""

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        # we use an acronym for each descriptor, so it's easy to find it and refer to it
        return self.acronym

    def create(self, frame):
        # return the dictionaries for global descriptors and atomic descriptors (if any)
        return {'acronym': self.acronym, 'descriptors': []}, {}
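As an illustration of filling in the global template, here is a minimal sketch of a descriptor that simply counts atoms. The class name Global_Descriptor_Atom_Count and the acronym "NATOMS" are hypothetical, and how ASAP discovers or registers new descriptor classes is not covered here, so treat this only as an example of the expected method signatures and return shape.

```python
class Global_Descriptor_Atom_Count:
    """Hypothetical global descriptor: the number of atoms in a frame."""

    def __init__(self, desc_spec):
        # desc_spec is accepted to match the template; it is unused here.
        self._is_atomic = False
        self.acronym = "NATOMS"  # hypothetical acronym

    def is_atomic(self):
        return self._is_atomic

    def get_acronym(self):
        return self.acronym

    def create(self, frame):
        # len() of an ASE Atoms object is its number of atoms.
        n_atoms = float(len(frame))
        # Same return shape as the template: (global dict, atomic dict).
        return {'acronym': self.acronym, 'descriptors': [n_atoms]}, {}


# A plain list stands in for an ASE Atoms object in this demo.
descriptor = Global_Descriptor_Atom_Count({})
print(descriptor.create(["O", "H", "H"]))
```

Because create() only relies on len(frame), the same class should work unchanged when it is handed a real Atoms object.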

Additional tools

In the directories ./scripts/ and ./tools/ you can find a selection of other Python tools.

Tab completion

Tab completion can be enabled by sourcing the asap_completion.sh script in the ./scripts/ directory. If a conda environment is used, you can copy this file to $CONDA_PREFIX/etc/conda/activate.d/ to automatically load the completion upon environment activation.
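The copy step for conda environments can be scripted. The sketch below is one possible way to do it from Python; install_completion is a hypothetical helper, and it assumes the activate.d directory may not exist yet and creates it if needed.

```python
import os
import shutil
from pathlib import Path


def install_completion(script_path, conda_prefix=None):
    """Copy the completion script into the conda activate.d directory."""
    prefix = conda_prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        raise RuntimeError("CONDA_PREFIX is not set; activate a conda environment first")
    target_dir = Path(prefix) / "etc" / "conda" / "activate.d"
    target_dir.mkdir(parents=True, exist_ok=True)  # create it if missing
    # shutil.copy returns the path of the copied file.
    return Path(shutil.copy(script_path, target_dir))


# Usage from the repository root, inside an activated conda environment:
#   install_completion("./scripts/asap_completion.sh")
```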

About

License: MIT License

