This repository contains code for fine-tuning CLIP and for zero-shot classification with a pre-trained CLIP model.
Before you begin, ensure you have met the following requirements:
- Python 3.6 or later
- PyTorch 1.7.1 or later
- The `transformers` and `clip` libraries installed
- A Colab Pro V100 GPU or an equivalent GPU for fine-tuning CLIP
- wandb (optional; used to track the text and image losses)
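The requirements above can be set up roughly as follows; this is a sketch, not a pinned environment, and the package sources (PyPI for `transformers`/`wandb`, GitHub for OpenAI's `clip`) are assumptions rather than something this repo specifies:

```shell
# Core dependencies (versions per the requirements above)
pip install "torch>=1.7.1" torchvision transformers

# The clip library is installed from OpenAI's repository
pip install git+https://github.com/openai/CLIP.git

# Optional: experiment tracking for the text and image losses
pip install wandb
```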
In this project, you can download the Indo Fashion dataset. The dataset is split into three sections: training, validation, and test. It covers 15 categories in total, each class corresponding to one kind of Indo fashion item.
Training | Validation | Test | Total |
---|---|---|---|
91K | 7.5K | 7.5K | 106K |
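Zero-shot classification over these 15 classes reduces to a cosine-similarity lookup: each class name is encoded as a text prompt, each image is encoded, and the image is assigned the class with the most similar prompt. A minimal sketch of that final step in plain PyTorch (the function name is ours, and the feature tensors stand in for real CLIP encoder outputs):

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_features, class_text_features):
    """Assign each image to the class whose text embedding is most
    similar under cosine similarity, as in CLIP zero-shot inference.

    image_features:      (num_images, dim) image encoder outputs
    class_text_features: (num_classes, dim) encoded class prompts
    """
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(class_text_features, dim=-1)
    similarity = img @ txt.t()        # (num_images, num_classes)
    return similarity.argmax(dim=-1)  # predicted class index per image
```

With the real models, `image_features` would come from `model.encode_image` and `class_text_features` from `model.encode_text` applied to prompts built from the 15 category names.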
Epochs | Batch size | Optimizer | Loss function |
---|---|---|---|
30 | 256 | Adam | Cross Entropy |
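The cross-entropy loss here is CLIP's symmetric contrastive objective: within each batch, matching image-text pairs are treated as positives and every other pairing as a negative, with cross-entropy applied in both directions. A sketch in PyTorch (the function name and the fixed `logit_scale` argument are our assumptions; in CLIP the scale is a learned parameter):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features, text_features, logit_scale):
    """Symmetric image-text cross-entropy loss, as in the CLIP paper."""
    # Cosine-similarity logits between all image/text pairs in the batch
    img = F.normalize(image_features, dim=-1)
    txt = F.normalize(text_features, dim=-1)
    logits_per_image = logit_scale * img @ txt.t()
    logits_per_text = logits_per_image.t()

    # The i-th image matches the i-th text, so targets are the diagonal
    labels = torch.arange(img.size(0), device=img.device)
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_text, labels)
    return (loss_i + loss_t) / 2
```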
The hyperparameters used in Adam optimizer are below:
Learning rate | $\beta_1$ | $\beta_2$ | Weight decay |
---|---|---|---|
5e-5 | 0.9 | 0.98 | 0.2 |
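These settings map directly onto `torch.optim.Adam`. A sketch of the optimizer construction, where the `nn.Linear` module is only a placeholder for the CLIP model being fine-tuned:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 15)  # placeholder for the CLIP model

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-5,            # learning rate
    betas=(0.9, 0.98),  # beta_1, beta_2
    weight_decay=0.2,
)
```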
Inference code: CLIP
Baseline Training code: #83