ControlVAR: Exploring Controllable Visual Autoregressive Modeling
Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj
- (2024-08-23) We released pretrained checkpoints.
- (2024-07-28) We began uploading the dataset (~400 GB) to Hugging Face 🤗.
- (2024-07-26) We released the code for Intel HPU training (GPU version is compatible).
- (2024-07-25) Repo created. The code and datasets will be released in two weeks.
Get pre-trained VQVAE from VAR.
mkdir pretrained
cd pretrained
wget https://huggingface.co/FoundationVision/var/resolve/main/vae_ch160v4096z32.pth
Install required packages.
pip install -r requirements.txt
The pseudo-labeled ImageNet dataset (mask, canny, depth, and normal) is available on Hugging Face 🤗. Please download the original ImageNet2012 dataset from the official website and arrange the files in the following format.
ImageNet2012
├── train
├── val
├── train_canny
├── train_mask
├── train_normal
├── train_depth
├── val_canny
├── val_mask
├── val_normal
└── val_depth
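Before launching a long training run, it may help to confirm that the dataset root matches the layout above. The following is a minimal sketch, not part of the released code: the helper names `expected_dirs` and `missing_dirs` are hypothetical, and the check only looks for the ten directory names listed above, not their contents.

```python
from pathlib import Path

# Directory names taken from the layout listed above.
SPLITS = ("train", "val")
CONDITIONS = ("canny", "mask", "normal", "depth")


def expected_dirs():
    """Return the ten sub-directory names that the layout above lists."""
    names = list(SPLITS)
    for split in SPLITS:
        names.extend(f"{split}_{cond}" for cond in CONDITIONS)
    return names


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    root = Path(root)
    return [name for name in expected_dirs() if not (root / name).is_dir()]
```

Calling `missing_dirs("ImageNet2012")` returns an empty list when every expected directory is present, and otherwise the names that still need to be downloaded or renamed.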
ID | Depth | Joint |
---|---|---|
1 | 12 | d12.pth |
2 | 16 | d16.pth |
3 | 20 | d20.pth |
4 | 24 | d24.pth |
5 | 30 | d30.pth |
python3 train_control_var_hpu.py --batch_size $bs --dataset_name imagenetC --data_dir $path_to_ImageNetC --gpus $gpus --output_dir $output_dir --multi_cond True --config configs/train_mask_var_ImageNetC_d12.yaml --var_pretrained_path pretrained/var_d12.pth
python3 train_control_var_hpu.py --batch_size $bs --dataset_name imagenetC --data_dir $path_to_ImageNetC --gpus $gpus --output_dir $output_dir --multi_cond True --val_only True --resume $ckpt_path
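The two commands above rely on shell variables (`$bs`, `$path_to_ImageNetC`, `$gpus`, `$output_dir`, `$ckpt_path`) that must be set beforehand. As a rough sketch, the same argument lists can be assembled in Python so that the shared flags are written once. The function names below are hypothetical, only flags that appear in the commands above are used, and the expected format of `--gpus` is not stated in this README, so the value is passed through unchanged.

```python
import shlex

SCRIPT = "train_control_var_hpu.py"


def common_args(batch_size, data_dir, gpus, output_dir):
    """Flags shared by the training and evaluation commands above."""
    return [
        "python3", SCRIPT,
        "--batch_size", str(batch_size),
        "--dataset_name", "imagenetC",
        "--data_dir", str(data_dir),
        "--gpus", str(gpus),  # format assumed; passed through as given
        "--output_dir", str(output_dir),
        "--multi_cond", "True",
    ]


def train_command(batch_size, data_dir, gpus, output_dir,
                  config, var_pretrained_path):
    """Argument list mirroring the training command above."""
    return common_args(batch_size, data_dir, gpus, output_dir) + [
        "--config", str(config),
        "--var_pretrained_path", str(var_pretrained_path),
    ]


def eval_command(batch_size, data_dir, gpus, output_dir, ckpt_path):
    """Argument list mirroring the evaluation (--val_only) command above."""
    return common_args(batch_size, data_dir, gpus, output_dir) + [
        "--val_only", "True",
        "--resume", str(ckpt_path),
    ]
```

`shlex.join(train_command(...))` gives a string to paste into a shell, and the list itself can be passed to `subprocess.run(cmd, check=True)`. Any concrete batch size, GPU setting, or path supplied to these functions is the caller's choice rather than a recommended value.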
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@article{li2024controlvar,
title={ControlVAR: Exploring Controllable Visual Autoregressive Modeling},
author={Li, Xiang and Qiu, Kai and Chen, Hao and Kuen, Jason and Lin, Zhe and Singh, Rita and Raj, Bhiksha},
journal={arXiv preprint arXiv:2406.09750},
year={2024}
}