Poor energy & force metrics on paper's datasets (carbon nanotube, buckyball catcher)

ale99WGiais opened this issue 7 months ago

Describe the bug
We tried to fit MACE potentials on some datasets mentioned in the reference paper "Evaluation of the MACE Force Field Architecture: from Medicinal Chemistry to Materials Science".

In particular we tried fitting MACE on "Double-walled nanotube" and "Buckyball catcher".

The MAE metrics we obtained are very different from the ones stated in the paper, so we are wondering what we could be doing wrong :(

To Reproduce

MACE was installed using the following commands

git clone https://github.com/ACEsuit/mace.git

conda create -n mace python=3.10 -y
conda activate mace
conda install micromamba -c conda-forge -c anaconda
micromamba install pytorch==2.0 torchvision torchaudio pytorch-cuda -c pytorch -c nvidia -c conda-forge -c anaconda
micromamba install numpy scipy matplotlib ase opt_einsum prettytable pandas e3nn scikit-learn=1.3.2 -c conda-forge -c anaconda
pip install mace/

To fit MACE on the nanotube we used the following scripts:

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-0-r6-int1" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=1 \
    --num_channels=256 \
    --max_L=0 \
    --correlation=3 \
    --r_max=6.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=2 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-2-r5-int2" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=2 \
    --num_channels=256 \
    --max_L=2 \
    --correlation=3 \
    --r_max=5.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=1 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

python ~/mace/mace/cli/run_train.py \
    --name="tube-256-2-r3-int2" \
    --train_file="../md22_double-walled_nanotube.xyz" \
    --valid_fraction=0.05 \
    --E0s="average" \
    --model="MACE" \
    --num_interactions=2 \
    --num_channels=256 \
    --max_L=2 \
    --correlation=3 \
    --r_max=3.0 \
    --forces_weight=1000 \
    --energy_weight=10 \
    --batch_size=2 \
    --valid_batch_size=2 \
    --max_num_epochs=650 \
    --start_swa=450 \
    --scheduler_patience=5 \
    --patience=15 \
    --eval_interval=3 \
    --ema \
    --swa \
    --swa_forces_weight=10 \
    --error_table='PerAtomMAE' \
    --default_dtype="float64" \
    --device=cuda \
    --seed=123 \
    --restart_latest \
    --save_cpu

These scripts follow the examples in https://mace-docs.readthedocs.io/en/latest/examples/training_examples.html

Similar scripts were adopted for the buckyball catcher.

Each job was submitted to a machine with a single NVIDIA Tesla A100 GPU, with a time limit of about 3 days.

Data for both nanotube and buckyball was downloaded from here: http://www.sgdml.org/

Expected behavior

We expected to have low energy and force MAE as in the paper:

But we got errors orders of magnitude higher:

buckyball mace-256-0-r6-int1 stdout.txt
buckyball mace-256-2-r3-int2 stdout.txt
buckyball mace-256-2-r5-int2 stdout.txt
nanotube mace-256-0-r6-int1 stdout.txt
nanotube mace-256-2-r3-int2 stdout.txt
nanotube mace-256-2-r5-int2 stdout.txt

Everything is uploaded here: https://uniudamce-my.sharepoint.com/:f:/g/personal/142135_spes_uniud_it/EvqCwMiR9PNMkZqb8L5iQTMBnHnpEm0-CQVCOsEskxbdaA?e=dhfnXJ

Thanks very much for the support,
Alessio

Ilyes Batatia · Answer 1 · Tue Jan 23 2024 02:35:31 GMT+0800 (China Standard Time)

The MACE numbers are in eV and eV/A, but the original dataset is in kcal/mol. Did you make the conversion? For numerical precision, it is better to use eV and eV/A in the MACE code.
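
For reference, a minimal sketch of the conversion factor involved, assuming the dataset stores energies in kcal/mol and forces in kcal/mol/Å (lengths are already in Å, so the same factor applies to both). The function names are illustrative and not part of MACE; ASE exposes essentially the same factor as `ase.units.kcal / ase.units.mol`.

```python
# Exact SI constants (2019 redefinition); thermochemical kilocalorie.
AVOGADRO = 6.02214076e23             # particles per mol
JOULE_PER_EV = 1.602176634e-19       # J per eV
JOULE_PER_KCAL = 4184.0              # J per kcal

# eV per (kcal/mol), roughly 0.04336
KCAL_PER_MOL_TO_EV = JOULE_PER_KCAL / AVOGADRO / JOULE_PER_EV


def energy_to_ev(energy_kcal_per_mol):
    """Convert one energy from kcal/mol to eV."""
    return energy_kcal_per_mol * KCAL_PER_MOL_TO_EV


def forces_to_ev_per_angstrom(forces_kcal_per_mol_angstrom):
    """Convert a list of per-atom force vectors from kcal/mol/A to eV/A."""
    return [
        [component * KCAL_PER_MOL_TO_EV for component in force]
        for force in forces_kcal_per_mol_angstrom
    ]


if __name__ == "__main__":
    # Hypothetical values, only to show the scale of the change.
    print(energy_to_ev(1.0))
    print(forces_to_ev_per_angstrom([[1.0, 0.0, -2.0]]))
```

Since 1 kcal/mol is about 0.043 eV, errors computed on unconverted data come out roughly 23 times larger than the same errors expressed in eV, before any effect of the loss weights on the fit itself.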

ale99WGiais · Answer 2 · Tue Jan 23 2024 05:16:16 GMT+0800 (China Standard Time)

Hi Ilyes, thanks for your very quick response!

No, I'm sorry, we didn't notice that the original dataset was in kcal/mol.

We'll try converting the dataset to eV and refit the potentials asap :)