KimRass / CLIP

PyTorch implementation of 'CLIP' (Radford et al., 2021) from scratch and training it on Flickr8k + Flickr30k

'CLIP' (Radford et al., 2021) implementation from scratch in PyTorch

Pretrained Model

Linear Classification on ImageNet1k (mini) Dataset

# e.g., (--n_cpus is optional)
python3 linear_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --n_epochs=64 \
    --batch_size=128 \
    --n_cpus=4
  • Top-5 accuracy on validation set: 5.8%
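For readers unfamiliar with linear classification on top of a pretrained encoder, here is a minimal sketch of the idea. It assumes the image features have already been extracted with the encoder kept frozen, which may differ from what `linear_classification.py` actually does. The function names `train_linear_probe` and `topk_accuracy` and the learning rate are hypothetical and not taken from this repository.

```python
import torch
import torch.nn as nn


def train_linear_probe(features, labels, n_classes, n_epochs=64, batch_size=128, lr=1e-3):
    """Fit a single linear layer on precomputed image features (hypothetical helper)."""
    probe = nn.Linear(features.size(1), n_classes)
    optim = torch.optim.Adam(probe.parameters(), lr=lr)
    crit = nn.CrossEntropyLoss()
    for _ in range(n_epochs):
        # Shuffle once per epoch, then iterate over mini-batches of indices.
        perm = torch.randperm(features.size(0))
        for idx in perm.split(batch_size):
            optim.zero_grad()
            loss = crit(probe(features[idx]), labels[idx])
            loss.backward()
            optim.step()
    return probe


@torch.no_grad()
def topk_accuracy(probe, features, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    topk = probe(features).topk(k, dim=-1).indices  # (N, k)
    return (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()
```

With `k=5`, `topk_accuracy` computes the kind of top-5 metric reported above, given features and labels from the validation split.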

Zero-shot Classification on ImageNet1k (mini) Dataset

# e.g., (--n_cpus, --max_len, and --k are optional)
python3 zero_shot_classification.py \
    --ckpt_path="../clip_flickr.pth" \
    --data_dir="../imagenet-mini/" \
    --batch_size=16 \
    --n_cpus=4 \
    --max_len=128 \
    --k=10
  • Top-10 accuracy on train + validation set: 3.0%
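As a rough illustration of how a top-k zero-shot metric can be computed, the sketch below compares image embeddings against one text embedding per class and checks whether the true class is among the k most similar. It assumes the embeddings are already computed. The function name `zero_shot_topk_accuracy` is hypothetical, and the script in this repository may differ in details such as how the class prompts are built.

```python
import torch
import torch.nn.functional as F


def zero_shot_topk_accuracy(image_embeds, text_embeds, labels, k=10):
    """Top-k zero-shot accuracy (hypothetical helper).

    image_embeds: (N, D) image features.
    text_embeds:  (C, D) text features, one row per class prompt.
    labels:       (N,) integer class indices in [0, C).
    """
    # L2-normalize so that the dot product equals cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    sims = image_embeds @ text_embeds.T  # (N, C)
    topk = sims.topk(k, dim=-1).indices  # (N, k) most similar classes per image
    hits = (topk == labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```

No training is involved here: the class with the most similar text embedding is the prediction, and `k` plays the same role as the `--k` flag above.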

Implementation Details

  • The temperature-related part of the paper is not implemented.
    • "The learnable temperature parameter was clipped to prevent scaling the logits by more than 100 which we found necessary to prevent training instability."
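For reference, one way the clipped learnable temperature described in the quote could be added is sketched below. This is not part of this repository. The class name `ClipLoss`, the log-space parameterization, and the method `logit_scale` are assumptions for illustration; the initial value of 0.07 and the cap of 100 follow the paper.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ClipLoss(nn.Module):  # hypothetical name, not from this repository
    def __init__(self, init_temperature=0.07, max_scale=100.0):
        super().__init__()
        # Learn log(1 / temperature) so the scale stays positive after exp().
        self.log_scale = nn.Parameter(torch.tensor(math.log(1.0 / init_temperature)))
        self.max_log_scale = math.log(max_scale)

    def logit_scale(self):
        # Clamp in log space so the logits are never scaled by more than max_scale.
        return self.log_scale.clamp(max=self.max_log_scale).exp()

    def forward(self, image_embeds, text_embeds):
        image_embeds = F.normalize(image_embeds, dim=-1)
        text_embeds = F.normalize(text_embeds, dim=-1)
        logits = self.logit_scale() * (image_embeds @ text_embeds.T)  # (B, B)
        # Matching image-text pairs sit on the diagonal.
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_image = F.cross_entropy(logits, targets)
        loss_text = F.cross_entropy(logits.T, targets)
        return (loss_image + loss_text) / 2
```

Because `log_scale` is an `nn.Parameter`, it is only updated if the loss module's parameters are passed to the optimizer together with the encoders' parameters.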
