facebookresearch / flowmm

Code for “FlowMM Generating Materials with Riemannian Flow Matching”.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

FlowMM: Generating Materials with Riemannian Flow Matching

arxiv twitter

Crystal Structure Prediction

GIF of a trajectory from noise to crystal structure.

De Novo Generation

Conceptual diagram of flow matching along our manifold for De Novo Generation.

Installation

Follow the steps to get the files:

# clone this repo into the folder flowmm/
cd flowmm
git submodule init
git submodule update
bash create_env_file.sh  # creates the necessary .env file

The submodules include CDVAE, DiffCSP, and Riemannian Flow Matching.

Now we can install flowmm. We recommend using micromamba because conda is extremely slow. You can install micromamba by following their guide. If the installation fails, try again a few more times.

micromamba env create -f environment.yml

Activate using

micromamba activate flowmm

Data

The training data is in .csv format in data/. When you first train on it, the script will convert the data into a faster-to-load format.

If you want to compute energy above hull, you must download the convex hull from 2023-02-07. Extract the files to the folder mp_02072023/. We got this hull from Matbench Discovery.

Experiments

The various datasets you can train on are as follows:

data \in {perov, carbon, mp_20, mpts_52}

There are a few preselected model options in scripts_model/conf/model. The format is {atom_type_manifold}_{lattice_manifold}.

Atom type manifolds:

  • abits - analog bits proposed for use in FlowMM
  • null - used for conditional generation
  • simplex - the method used to fit the atom types in DiffCSP

Lattice manifolds:

  • nonsym - the method to fit the lattice in DiffCSP
  • params - our proposed method in FlowMM, lattice parameters
  • params_normal_base - ablated version where the lattice parameters base distribution is gaussian and not sent to constrained space.

Conditional Training

python scripts_model/run.py data=perov model=null_params

Unconditional Training

python scripts_model/run.py data=perov model=abits_params

Evaluation

Discussion about evaluation is limited to FlowMM. scripts_model/evaluate.py uses click, allowing it to serve as a multi-purpose evaluation program.

Conditional Evaluation - Crystal Structure Prediction - Reconstruction

These commands will:

  1. reconstruct the test set
  2. consolidate the results into a single torch pickle with the correct format
  3. compute the match rate and root mean square error, compared to the test set
  4. create plots of the distribution of lattice parameters, compared to the test set

The user must provide PATH_TO_CHECKPOINT, NAME_OF_SUBDIRECTORY_AT_CHECKPOINT, SLOPE_OF_INFERENCE_ANTI_ANNEALING.

ckpt=PATH_TO_CHECKPOINT
subdir=NAME_OF_SUBDIRECTORY_AT_CHECKPOINT
slope=SLOPE_OF_INFERENCE_ANTI_ANNEALING
python scripts_model/evaluate.py reconstruct ${ckpt} --subdir ${subdir} --inference_anneal_slope ${slope} --stage test && \
python scripts_model/evaluate.py consolidate ${ckpt} --subdir ${subdir} && \
python scripts_model/evaluate.py old_eval_metrics ${ckpt} --subdir ${subdir} --stage test && \
python scripts_model/evaluate.py lattice_metrics ${ckpt} --subdir ${subdir} --stage test

Unconditional Evaluation - De Novo Generation

These commands will:

  1. generate 10k structures from a checkpoint
  2. consolidate the results into a single torch pickle with the correct format
  3. compute the De Novo Generation proxy metrics, compared to the test set
  4. create plots of the distribution of lattice parameters, compared to the test set

The user must provide PATH_TO_CHECKPOINT, NAME_OF_SUBDIRECTORY_AT_CHECKPOINT, SLOPE_OF_INFERENCE_ANTI_ANNEALING.

ckpt=PATH_TO_CHECKPOINT
subdir=NAME_OF_SUBDIRECTORY_AT_CHECKPOINT
slope=SLOPE_OF_INFERENCE_ANTI_ANNEALING
python scripts_model/evaluate.py generate ${ckpt} --subdir ${subdir} --inference_anneal_slope ${slope} && \
python scripts_model/evaluate.py consolidate ${ckpt} --subdir ${subdir} && \
python scripts_model/evaluate.py old_eval_metrics ${ckpt} --subdir ${subdir} --stage test && \
python scripts_model/evaluate.py lattice_metrics ${ckpt} --subdir ${subdir} --stage test

Prerelaxation

Taking the generations from the previous step, we can prerelax them using CHGNet on the cpu. This script works locally, but it is designed to parallelize the process over nodes on a slurm cluster.

If you want to use slurm, the user must then provide YOUR_SLURM_PARTITION.

# get the path to the structures 
eval_for_dft_pt=$(python scripts_model/evaluate.py consolidate "${ckpt}" --subdir "${subdir}" --path_eval_pt eval_for_dft.pt | tail -n 1)

# get the eval_for_dft_json
parent=${eval_for_dft_pt%/*}  # retain part before the last slash
eval_for_dft_json="${eval_for_dft_pt%.*}.json"  # retain part before the period, add .json
log_dir="${parent}/chgnet_log_dir"

# set other flags, if you are using slurm.
num_jobs=1
slurm_partition=YOUR_SLURM_PARTITION

# prerelax
python scripts_analysis/prerelax.py "$eval_for_dft_pt" "$eval_for_dft_json" "$log_dir" --num_jobs "$num_jobs" --slurm_partition "$slurm_partition"

Preparing Density Functional Theory

We can continue by doing density functional theory (DFT) with VASP.

The user must provide PATH_TO_YOUR_PSEUDOPOTENTIALS, which requires a VASP license.

export PMG_VASP_PSP_DIR=PATH_TO_YOUR_PSEUDOPOTENTIALS 

# create the folder to hold the dft files
dft_folder="${parent}/dft"
mkdir -p "$dft_folder"

# create the dft inputs
python scripts_analysis/dft_create_inputs.py "${eval_for_dft_json}" "${dft_folder}"

We do not provide guidance on running DFT.

We note that your DFT results should typically be corrected using the settings from the Materials Project.

Computing Energy Above Hull

We can compute the energy above hull using the (corrected) DFT relaxed energies or the CHGNet prerelaxed energies.

Note: This whole section requires downloading the convex hull from above!

If you want to use the CHGNet prerelaxed energies you can use the following commands. Since the prerelaxed energies are generally inaccurate, they go in their own column in the .json file. The prerelaxed energy above hull from CHGNet will be computed whether or not DFT relaxations were run.

json_e_above_hull="${parent}/ehulls.json"
python scripts_analysis/ehull.py "${eval_for_dft_json}" "${json_e_above_hull}"

If you want to use your DFT calculations you can use the following code. Our script expects that clean_outputs contains trajectory files with name ######.traj where ###### corresponds to the index of the corresponding row in the ${eval_for_dft_json} file.

clean_outputs_dir="${parent}/clean_outputs"
json_e_above_hull="${parent}/ehulls.json"
python scripts_analysis/ehull.py "${eval_for_dft_json}" "${json_e_above_hull}" --clean_outputs_dir "${clean_outputs_dir}"
Computing the Corrected Energy Above Hull

We have a script which does the corrections for you, but it must correspond to our format. There must be a root directory that has (1) a subdirectory called dft, which was created using scripts_analysis/dft_create_inputs.py and (2) a subdirectory called clean_outputs, which contains trajectory files with name ######.traj where ###### corresponds to the index of the corresponding row in the ${eval_for_dft_json} file.

root_dft_clean_outputs="${parent}"
ehulls_corrected_json="${parent}/ehulls_corrected.json"
python scripts_analysis/ehull_correction.py "${eval_for_dft_json}" "${ehulls_corrected_json}" --root_dft_clean_outputs "${root_dft_clean_outputs}"

Computing S.U.N.

The most accurate estimates of S.U.N. structures occur when using corrected DFT relaxed energies.

sun_json=sun.json
python scripts_analysis/novelty.py "${eval_for_dft_json}" "${sun_json}" --ehulls "${ehulls_corrected_json}"

Citation

If you find this repository helpful for your publications, please consider citing our paper:

@inproceedings{
    miller2024flowmm,
    title={Flow{MM}: Generating Materials with Riemannian Flow Matching},
    author={Benjamin Kurt Miller and Ricky T. Q. Chen and Anuroop Sriram and Brandon M Wood},
    booktitle={Forty-first International Conference on Machine Learning},
    year={2024},
    url={https://openreview.net/forum?id=W4pB7VbzZI}
}

License

flowmm is CC-BY-NC licensed, as found in the LICENSE.md file. However, the git submodules may have different license terms:

The licenses for the dependencies can be viewed at the corresponding project's homepage.

About

Code for “FlowMM Generating Materials with Riemannian Flow Matching”.

License:Other


Languages

Language:Python 99.9%Language:Shell 0.1%