VAR: a new visual generation method elevates GPT-style models beyond diffusion🚀 & Scaling laws observed📈

This is the official PyTorch implementation of Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.

NOTE: Mark your calendars📅! Our code will be ready before 9:00 AM UTC on 4/4/2024. Feel free to star ⭐ or watch 👓 for the latest updates🤗!

What's New?

🔥 Introducing VAR: a new paradigm in autoregressive visual generation✨:

Visual Autoregressive Modeling (VAR) redefines autoregressive learning on images as coarse-to-fine "next-scale prediction" (or "next-resolution prediction"), departing from the standard raster-scan "next-token prediction".
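To make the contrast concrete, here is a minimal, self-contained sketch (our own illustration, not code from this repository) of how the dependency structure of next-scale prediction can be expressed as a block-wise causal attention mask over a sequence of square token maps of increasing size. The helper names and the scale schedule `sides` are hypothetical. Each block (scale) is produced in one parallel step, so the number of sequential steps equals the number of scales rather than the number of tokens.

```python
from typing import Dict, List, Sequence


def tokens_per_scale(side_lengths: Sequence[int]) -> List[int]:
    # Scale k is assumed to be an s_k x s_k token map, so it holds s_k**2 tokens.
    return [s * s for s in side_lengths]


def block_causal_mask(side_lengths: Sequence[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when flattened token i may attend to token j.

    Positions in block k may attend to every position in blocks 0..k
    (attention inside a block is allowed because the whole block is
    produced in one parallel step) but never to finer blocks.
    """
    # Label each flattened position with the index of the scale it belongs to.
    scale_of: List[int] = []
    for k, n in enumerate(tokens_per_scale(side_lengths)):
        scale_of.extend([k] * n)
    total = len(scale_of)
    return [[scale_of[j] <= scale_of[i] for j in range(total)] for i in range(total)]


def autoregressive_steps(side_lengths: Sequence[int]) -> Dict[str, int]:
    """Count sequential generation steps for the final s_K x s_K token map."""
    final = side_lengths[-1]
    return {
        "next_token": final * final,      # raster scan: one step per token
        "next_scale": len(side_lengths),  # next-scale: one step per scale
    }


if __name__ == "__main__":
    sides = [1, 2, 3, 4]  # hypothetical scale schedule, not the repo's default
    mask = block_causal_mask(sides)
    print(len(mask))                    # 30 tokens = 1 + 4 + 9 + 16
    print(autoregressive_steps(sides))  # {'next_token': 16, 'next_scale': 4}
```

With four scales the final 4x4 map is reached in 4 sequential steps instead of 16; the gap widens as the final resolution grows, at the cost of attending over the tokens of all coarser scales.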

🔥 For the first time, GPT-style autoregressive models surpass diffusion models🚀

🔥 Discovering power-law scaling laws in VAR transformers📈
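A power-law scaling law says that a quantity such as test loss L changes as L ≈ a·X^k (with k < 0) as a resource X, for example parameter count or training compute, grows. Taking logs turns this into a straight line, log L = log a + k·log X. The sketch below is our own illustration, not the paper's fitting code: it estimates a and k by ordinary least squares in log-log space. The input points are synthetic, generated from a known power law, and are not results from VAR.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a * x**k by least squares on (log x, log y); return (a, k)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    var = sum((u - mx) ** 2 for u in lx)
    if var == 0:
        raise ValueError("need at least two distinct x values")
    # Slope of the log-log line is the power-law exponent k.
    k = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / var
    # Intercept log a, mapped back out of log space.
    a = math.exp(my - k * mx)
    return a, k


if __name__ == "__main__":
    xs = [1e8, 3e8, 1e9, 2e9]              # synthetic "model sizes"
    ys = [5.0 * x ** -0.1 for x in xs]     # synthetic losses on an exact power law
    a, k = fit_power_law(xs, ys)
    print(round(a, 3), round(k, 3))        # recovers a close to 5.0 and k close to -0.1
```

A straight-line fit in log-log space is the simplest way to check for power-law behavior; real measurements are noisy, so the quality of the fit (for example the correlation of the log-log points) matters as much as the exponent.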

🔥 Zero-shot generalizability🛠️

For a deep dive into our analyses, discussions, and evaluations, check out our paper.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If our work helps your research, feel free to give us a star ⭐ or cite us using:

@Article{VAR,
      title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction}, 
      author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
      year={2024},
      eprint={2404.02905},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
