lilujunai / DiS

Scalable Diffusion Models with State Space Backbone

Scalable Diffusion Models with State Space Backbone (DiS)
Official PyTorch Implementation

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for our paper exploring diffusion models with state space backbones (DiSs). Our model treats all inputs, including the time, condition, and noisy image patches, as tokens and employs skip connections between shallow and deep layers. Unlike the original Mamba, which was designed for text sequence modeling, our SSM block processes the hidden state sequence in both the forward and backward directions.
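To illustrate why scanning in both directions matters for image tokens, here is a minimal sketch in plain Python. It is not the repo's SSM block: `linear_scan` and `bidirectional_scan` are hypothetical names, the recurrence is a toy scalar one with fixed coefficients rather than Mamba's selective scan, and summing the two directions is an assumption made only for illustration.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Toy scalar state space recurrence: h_t = a * h_{t-1} + b * x_t."""
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Scan the sequence in both directions and sum the outputs per position.

    Summation is an illustrative assumption, not the repo's merge rule.
    """
    forward = linear_scan(xs, a, b)
    # Reverse the input, scan it, then restore the original token order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# Only the last token carries signal.
tokens = [0.0, 0.0, 0.0, 1.0]
print(linear_scan(tokens, 0.5, 1.0))  # [0.0, 0.0, 0.0, 1.0]
print(bidirectional_scan(tokens))     # [0.125, 0.25, 0.5, 2.0]
```

With a forward-only scan, the first three positions never see the last token. Adding the backward pass lets every position receive information from the whole sequence, which suits image patches because they have no natural left-to-right order.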

DiS framework

Environments

  • Python 3.10

    • conda create -n your_env_name python=3.10
  • Requirements file

    • pip install -r requirements.txt
  • Install causal_conv1d and mamba

    • pip install -e causal_conv1d
    • pip install -e mamba
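After installation, a quick check like the following may help confirm that the packages are visible to the interpreter. This is a sketch, not part of the repo: `check_modules` is a hypothetical helper, and the import names `torch`, `causal_conv1d`, and `mamba_ssm` are assumptions about what the installed packages expose.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be located for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names; adjust them if the vendored packages differ.
status = check_modules(["torch", "causal_conv1d", "mamba_ssm"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only locates the modules and does not import them, so it will not catch a CUDA extension that was built against a mismatched PyTorch version.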

Training

We provide a training script for DiS in train.py. This script can be used to train unconditional and class-conditional DiS models, and it can be easily modified to support other types of conditioning. To launch DiS-L/2 (256x256) training with N GPUs on one node:

torchrun --nnodes=1 --nproc_per_node=N train.py \
--model DiS-L/2 \
--dataset-type imagenet \
--data-path /path/to/imagenet/train \
--image-size 256 \
--task-type class-cond \
--num-classes 1000 

There are several additional options; see train.py for details. The training scripts for all experiments in our paper can be found in the script directory.
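When launching several runs, it can be convenient to assemble the command above programmatically. The sketch below uses only the flags shown in that command; `build_train_command` is a hypothetical helper, not something the repo provides, and the GPU count of 8 is an example value.

```python
import shlex
from typing import List


def build_train_command(num_gpus: int, data_path: str, model: str = "DiS-L/2",
                        image_size: int = 256, num_classes: int = 1000) -> List[str]:
    """Assemble the single-node torchrun invocation as an argv list."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun", "--nnodes=1", f"--nproc_per_node={num_gpus}", "train.py",
        "--model", model,
        "--dataset-type", "imagenet",
        "--data-path", data_path,
        "--image-size", str(image_size),
        "--task-type", "class-cond",
        "--num-classes", str(num_classes),
    ]


cmd = build_train_command(num_gpus=8, data_path="/path/to/imagenet/train")
print(shlex.join(cmd))  # shell-safe string for logging or copy-paste
```

Keeping the command as a list means it can be passed directly to `subprocess.run(cmd)` without shell quoting concerns.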

For convenience, the pre-trained DiS models can be downloaded directly here as well:

DiS Model    Image Resolution    FID-50K
DiS-H/2      256x256             2.10
DiS-H/2      512x512             2.88

Evaluation

We include a sample.py script, which samples images from a DiS model. The test.py script additionally supports evaluating other metrics.

python sample.py \
--model DiS-L/2 \
--dataset-type imagenet \
--ckpt /path/to/model \
--image-size 256 \
--num-classes 1000 \
--cfg-scale 1.5
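The `--cfg-scale` flag controls classifier-free guidance. As a rough sketch of what a scale of 1.5 means, and assuming sample.py follows the convention used in DiT (not confirmed by this README), the guided noise prediction is formed from a conditional and an unconditional prediction as shown below. `apply_cfg` is a hypothetical name, and plain lists stand in for tensors.

```python
from typing import List


def apply_cfg(eps_cond: List[float], eps_uncond: List[float], cfg_scale: float) -> List[float]:
    """Classifier-free guidance: eps = eps_uncond + cfg_scale * (eps_cond - eps_uncond)."""
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


eps_cond = [1.0, 2.0]    # toy class-conditional prediction
eps_uncond = [0.0, 1.0]  # toy unconditional prediction
print(apply_cfg(eps_cond, eps_uncond, 1.0))  # [1.0, 2.0], identical to the conditional prediction
print(apply_cfg(eps_cond, eps_uncond, 1.5))  # [1.5, 2.5], pushed past the conditional prediction
```

Under this convention a scale of 1.0 disables guidance, and larger values trade sample diversity for stronger adherence to the class label.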

BibTeX

@article{FeiDiS2024,
  title={Scalable Diffusion Models with State Space Backbone},
  author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Junshi Huang},
  year={2024},
  journal={arXiv preprint},
}

Acknowledgments

The codebase is based on the awesome DiT, U-ViT, and Vim repos.
