Clone the repository:
git clone https://github.com/ngocbh/pvd-peft
cd pvd-peft
Create a virtual environment containing the dependencies and activate it:
conda env create -f environment.yml -n pvd
conda activate pvd
Install the package into the environment:
pip install -e .
The model is pre-trained on the PCQM4Mv2 dataset, which contains over 3 million molecular structures at equilibrium. To pre-train the architecture, run the following command. The first run downloads and pre-processes the PCQM4Mv2 dataset, which can take a couple of hours depending on the machine.
python scripts/train.py --conf examples/ET-PCQM4MV2.yaml --layernorm-on-vec whitened --job-id pretraining
The option --layernorm-on-vec whitened enables an optional equivariant whitening-based layer norm, which stabilizes denoising. The pre-trained model checkpoint will be written to ./experiments/pretraining. A pre-trained checkpoint is also included in this repo at checkpoints/denoised-pcqm4mv2.ckpt.
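Since a checkpoint ships with the repository, pre-training can usually be skipped. The snippet below is a minimal sketch, assuming it is run from the repository root, that checks whether the bundled file is present before committing to a multi-hour pre-training run. The variable names CKPT and STATUS are illustrative and not part of the codebase.

```shell
# Path of the bundled checkpoint, as given in the text above.
CKPT="checkpoints/denoised-pcqm4mv2.ckpt"

# Report whether the checkpoint is available (STATUS is a hypothetical helper variable).
if [ -f "$CKPT" ]; then
    STATUS="found"
    echo "Bundled checkpoint found at $CKPT; pre-training can be skipped."
else
    STATUS="missing"
    echo "No checkpoint at $CKPT; run the pre-training command first."
fi
```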
To fine-tune the model parameter-efficiently (with LoRA or IA3) for HOMO/LUMO prediction on QM9, run one of the following commands:
bash scripts/train.sh qm9_lora
bash scripts/train.sh qm9_ia3
By default, the code uses all available GPUs to train the model. We used a single GPU (NVIDIA RTX 2080 Ti) for fine-tuning.
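Because training picks up every visible GPU, one common way to reproduce a single-GPU setup on a multi-GPU machine is to restrict device visibility with the standard CUDA_VISIBLE_DEVICES environment variable. This is a minimal sketch, assuming GPU index 0 is the device you want. It relies on general CUDA behavior rather than on any option specific to this repository.

```shell
# Expose only the first GPU (index 0) to processes started from this shell.
export CUDA_VISIBLE_DEVICES=0

# Confirm the setting before launching the fine-tuning command.
echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES}"
```

Any training command launched from the same shell afterwards should then see only that one device.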
This implementation relies on the following source code: