
Reward-Directed Conditional Diffusion:
Provable Distribution Estimation and Reward Improvement

Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang
Princeton University

NeurIPS 2023

arXiv 


This repo contains the code for replicating the experiments in our paper.

Requirements

pip install -r requirements.txt

Usage

  1. Randomly generate a ground-truth reward model and the reward labels for the CIFAR10 dataset.
python3 fake_dataset.py

The ground-truth reward model (a ResNet18 model with the final layer replaced by a randomly initialized linear layer) is saved at reward_model.pth, and the reward labels are saved at cifar10_outputs_with_noise.npy.
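For reference, the construction corresponds roughly to the sketch below (the exact names, the noise scale, and whether the ResNet18 backbone is pretrained are assumptions; fake_dataset.py is authoritative):

```python
# Sketch: generate a random ground-truth reward model and noisy reward labels for CIFAR10.
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# ResNet18 with the final layer replaced by a randomly initialized linear head
reward_model = torchvision.models.resnet18()
reward_model.fc = nn.Linear(reward_model.fc.in_features, 1)
reward_model.to(device).eval()
torch.save(reward_model.state_dict(), "reward_model.pth")

# Label every CIFAR10 training image with the scalar reward plus observation noise
dataset = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                       transform=T.ToTensor())
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=False)

rewards = []
with torch.no_grad():
    for images, _ in loader:
        rewards.append(reward_model(images.to(device)).squeeze(1).cpu())
rewards = torch.cat(rewards).numpy()
rewards = rewards + np.random.randn(*rewards.shape)  # noise scale is an assumption
np.save("cifar10_outputs_with_noise.npy", rewards)
```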

  2. Train a 3-layer ConvNet (on top of the frozen Stable Diffusion v1.5 VAE embedding space) to predict the rewards.
python3 train.py 

The default configuration (lr = 0.001, num_data = 50000, num_epochs = 100) can be modified in train.py.
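A minimal sketch of what this reward predictor might look like, assuming the diffusers AutoencoderKL VAE and a generic 3-layer head (the checkpoint id, head architecture, and training loop are assumptions; see train.py for the actual model):

```python
# Sketch: a small reward predictor on top of frozen Stable Diffusion v1.5 VAE latents.
import torch
import torch.nn as nn
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen VAE encoder from Stable Diffusion v1.5 (checkpoint id is an assumption)
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae").to(device)
vae.requires_grad_(False)

# 3-layer ConvNet head mapping 4-channel latents to a scalar reward
reward_head = nn.Sequential(
    nn.Conv2d(4, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 1),
).to(device)

optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-3)  # matches the default lr

def training_step(images, labels):
    # images assumed normalized to [-1, 1]; encode with the frozen VAE, then
    # regress the noisy reward labels from cifar10_outputs_with_noise.npy.
    with torch.no_grad():
        latents = vae.encode(images.to(device)).latent_dist.sample() * vae.config.scaling_factor
    pred = reward_head(latents).squeeze(1)
    loss = nn.functional.mse_loss(pred, labels.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```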

  3. Perform Reward-Directed Conditional Diffusion using
python3 inference.py --target 1 --guidance 100 --num_images 100

The following guidance term is added at each denoising step of the diffusion model: $$\nabla_x \log p_t(y|x) = - \text{guidance} \cdot \nabla_x \Big[ \frac12 \|\text{target}-\mu_\theta(x)\|_2^2 \Big].$$
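In code, this gradient can be computed with autograd and added to the score at each reverse step, roughly as in the sketch below (function and variable names are illustrative; inference.py is authoritative):

```python
# Sketch: the guidance term added at each denoising step.
import torch

def guidance_term(x, reward_head, target, guidance):
    """Compute -guidance * grad_x [ 0.5 * ||target - mu_theta(x)||_2^2 ]."""
    x = x.detach().requires_grad_(True)
    reward_pred = reward_head(x)                     # mu_theta(x), the learned reward predictor
    loss = 0.5 * ((target - reward_pred) ** 2).sum()
    grad = torch.autograd.grad(loss, x)[0]
    return -guidance * grad                          # stands in for grad_x log p_t(y | x)
```

This term plays the role of $\nabla_x \log p_t(y|x)$: at each reverse-diffusion step it is added to the unconditional score (equivalently, folded into the predicted noise) before the sampler update, steering samples toward the reward target.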

Citation

If you find this useful in your research, please consider citing our paper.

@article{yuan2024reward,
  title={Reward-directed conditional diffusion: Provable distribution estimation and reward improvement},
  author={Yuan, Hui and Huang, Kaixuan and Ni, Chengzhuo and Chen, Minshuo and Wang, Mengdi},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

About

License: MIT License

