24mlight / SparseDiff

Official implementation for 'Sparse denoising diffusion for large graph generation'

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sparse denoising diffusion for large graph generation

Official code of Sparse Training of Discrete Diffusion Models for Graph Generation.

Note (November 7th): the code is currently incomplete as we running experiments again with a slightly improved architecture. Checkpoints will be added progressively when they are available.

Environment installation

This code was tested with PyTorch 2.0.1, cuda 11.8 and torch_geometrics 2.3.1

  • Download anaconda/miniconda if needed

  • Create a rdkit environment that directly contains rdkit:

    conda create -c conda-forge -n sparse rdkit=2023.03.2 python=3.9

  • conda activate sparse

  • Check that this line does not return an error:

    python3 -c 'from rdkit import Chem'

  • Install graph-tool (https://graph-tool.skewed.de/):

    conda install -c conda-forge graph-tool=2.45

  • Check that this line does not return an error:

    python3 -c 'import graph_tool as gt'

  • Install the nvcc drivers for your cuda version. For example:

    conda install -c "nvidia/label/cuda-11.8.0" cuda

  • Install a corresponding version of pytorch, for example:

    pip3 install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118

  • Install other packages using the requirement file:

    pip install -r requirements.txt

  • Install mini-moses:

    pip install git+https://github.com/igor-krawczuk/mini-moses

  • Run:

    pip install -e .

  • Navigate to the ./sparse_diffusion/analysis/orca directory and compile orca.cpp:

    g++ -O2 -std=c++11 -o orca orca.cpp

Run the code

  • All code is currently launched through python3 main.py. Check hydra documentation (https://hydra.cc/) for overriding default parameters.
  • To run the debugging code: python3 main.py +experiment=debug.yaml. We advise to try to run the debug mode first before launching full experiments.
  • To run a code on only a few batches: python3 main.py general.name=test.
  • You can specify the dataset with python3 main.py dataset=guacamol. Look at configs/dataset for the list of datasets that are currently available
  • You can specify the edge fraction (denoted as $\lambda$ in the paper) with python3 main.py model.edge_fraction=0.2 to control the GPU-usage

Cite the paper

@misc{qin2023sparse,
      title={Sparse Training of Discrete Diffusion Models for Graph Generation}, 
      author={Yiming Qin and Clement Vignac and Pascal Frossard},
      year={2023},
      eprint={2311.02142},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Troubleshooting

PermissionError: [Errno 13] Permission denied: 'SparseDiff/sparse_diffusion/analysis/orca/orca': You probably did not compile orca.

About

Official implementation for 'Sparse denoising diffusion for large graph generation'

License:MIT License


Languages

Language:Python 76.0%Language:C++ 24.0%