iGPT-S pretrained on CIFAR10. Completions are fairly poor as the model was only trained on CIFAR10, not all of ImageNet.
- Batched k-means on GPU for quantization of larger datasets (currently using
- BERT-style pretraining (currently only generative pretraining is supported)
- Load pretrained models from OpenAI.
- Reproduce at least iGPT-S results.
According to their blog post, the largest model, iGPT-L (1.4 B parameters), was trained for 2500 V100-days. By greatly reducing the number of attention heads, the number of layers, and the input size (which affects model size quadratically), we can train our own model (26 K parameters) on Fashion-MNIST on a single NVIDIA 2070 in less than 2 hours.
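To make the input-size scaling concrete: an H×W image becomes a sequence of H·W tokens, and each self-attention layer computes on the order of L² pairwise scores over that sequence. A back-of-the-envelope sketch (not a full parameter count):

```python
# Sequence length for an autoregressive image model is height * width,
# so self-attention's O(L^2) cost grows with the fourth power of the
# image side length. Sizes below are the datasets' native resolutions.
for name, side in [("Fashion-MNIST", 28), ("CIFAR10", 32)]:
    seq_len = side * side
    print(f"{name}: {side}x{side} -> {seq_len} tokens, "
          f"{seq_len ** 2:,} attention scores per head per layer")
```

Shrinking the input from 32×32 to 28×28 already cuts the per-layer attention work by roughly 40%, before any reduction in heads or layers.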
# Image GPT
Some pre-trained models are located in the `models` directory. Run the download script to download the `cifar10` pretrained iGPT-S model.
Images are downloaded, and centroids are computed using k-means with `num_clusters` clusters. These centroids are used to quantize the images before they are fed into the model.
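A minimal sketch of this quantization step, assuming precomputed centroids (the `quantize` function below is illustrative, not the repo's actual code): each pixel is mapped to the index of its nearest centroid.

```python
import numpy as np

def quantize(images, centroids):
    """Map each pixel to the index of its nearest centroid.

    images:    (N, H, W, C) float array
    centroids: (num_clusters, C) float array
    returns:   (N, H, W) integer array of cluster indices
    """
    n, h, w, c = images.shape
    pixels = images.reshape(-1, 1, c)                  # (N*H*W, 1, C)
    dists = ((pixels - centroids[None]) ** 2).sum(-1)  # (N*H*W, num_clusters)
    return dists.argmin(-1).reshape(n, h, w)

# toy example: 2 grayscale centroids (black and white), one 2x2 "image"
centroids = np.array([[0.0], [1.0]])
img = np.array([[[[0.1], [0.9]], [[0.8], [0.2]]]])
print(quantize(img, centroids))
```

The quantized indices, not raw pixel values, become the model's input tokens, which is why the vocabulary size must equal the number of clusters.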
```shell
# options: mnist, fmnist, cifar10
python src/compute_centroids.py --dataset mnist --num_clusters=8

# creates data/<dataset>_centroids.npy
```
Note: use the same `num_clusters` as `num_vocab` in your model.
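For instance, if the centroids were computed with `--num_clusters=8`, the corresponding model config must use the same vocabulary size (a hypothetical excerpt; the actual config fields may differ):

```yaml
# hypothetical model-config excerpt
num_vocab: 8  # must match the num_clusters used in compute_centroids.py
```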
Models can be trained using `src/run.py` with the `train` subcommand.
Models can be pretrained by specifying a dataset and model config. `configs/s_gen.yml` corresponds to iGPT-S from the paper; `configs/xxs_gen.yml` is an extra-small model for trying on toy datasets with limited compute.
```shell
python src/run.py --dataset mnist train configs/xxs_gen.yml
```
Pre-trained models can be fine-tuned by passing the path to the pre-trained checkpoint with the `--pretrained` argument, along with the config file and dataset.
```shell
python src/run.py --dataset mnist train configs/xxs_clf.yml --pretrained=models/mnist_gen.ckpt
```
Figures like those seen above can be created using random images from the test set:
```shell
# outputs to figure.png
python src/sample.py models/mnist_gen.ckpt
```
GIFs like the one seen in my tweet can be made like so:
```shell
# outputs to out.gif
python src/gif.py models/mnist_gen.ckpt
```