alinlab / s-clip

S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

S-CLIP: Semi-supervised CLIP

Implementation of the S-CLIP algorithm described in "S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions."

S-CLIP improves the training of CLIP in scenarios where only a few image-text pairs are available by incorporating unpaired images alongside the image-text pairs.

Motivation

S-CLIP addresses the issue of naive pseudo-labeling in semi-supervised CLIP. plot

Method overview

S-CLIP introduces caption-level and keyword-level pseudo-labeling approaches. plot plot

Installation

Our code is based on the open_clip library. The original CLIP training code can be found in the training directory, while our newly developed code is located in the custom directory. The main logic of the proposed training loss is implemented in custom/loss.py.

Install the requirements.

pip install -r requirements.txt

Training

Run ./scripts/train_RS.sh.

Evaluation

Run ./scripts/eval_RS.sh [CKPT-NAME].

About

S-CLIP: Semi-supervised Vision-Language Pre-training using Few Specialist Captions


Languages

Language:Python 98.9%Language:Shell 1.1%