
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models


Teaching Structured Vision & Language Concepts to Vision & Language Models

This repository contains the code for the paper "Teaching Structured Vision & Language Concepts to Vision & Language Models" (link) by Sivan Doveh et al., published at CVPR 2023.

A checkpoint of the model trained with [LLM, RB] negatives, as well as a zip file of the generated positives, can be downloaded from this Google Drive link:

Requirements

  1. Linux machine
  2. At least one NVIDIA GPU
  3. CUDA 10.2 or later
  4. Anaconda (see the official Anaconda installation instructions)
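Before installing anything, a quick way to see whether a machine meets the first, second and fourth requirements is a small standard-library script like the one below. This is a heuristic sketch, not part of the repository: it only checks the OS name and whether the `nvidia-smi` and `conda` executables are on `PATH`, and it does not verify the CUDA version or the GPU model.

```python
import platform
import shutil


def check_prerequisites() -> dict:
    """Report which of the listed prerequisites appear to be present.

    Heuristic only: checks the OS name and whether `nvidia-smi` (shipped with
    the NVIDIA driver) and `conda` are on PATH. The CUDA version is not checked.
    """
    return {
        "linux": platform.system() == "Linux",
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "conda_on_path": shutil.which("conda") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```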

Install Dependencies

To install the required dependencies, first clone the repository and navigate to the cloned directory:

git clone TSVLC
cd TSVLC

Next, create and activate the conda environment:

conda deactivate # deactivate any active environments
conda create -n vl python=3.8.13 # create the "vl" environment with Python 3.8.13
conda activate vl # activate the environment
conda install -c conda-forge libjpeg-turbo
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3.1 -c pytorch
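Once the environment is active, the installed PyTorch and torchvision versions can be compared with the pins above. The sketch below is an illustration rather than a repository script; it reads package metadata through `importlib.metadata` (assuming the conda packages expose standard Python distribution metadata) and ignores local build suffixes such as `+cu113`.

```python
from importlib import metadata

# Versions pinned by the conda install command above.
EXPECTED = {"torch": "1.12.1", "torchvision": "0.13.1"}


def check_versions(expected=EXPECTED) -> dict:
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None
        # Drop a local build suffix like "+cu113" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[pkg] = (have, matches)
    return report


if __name__ == "__main__":
    for pkg, (have, ok) in check_versions().items():
        print(f"{pkg}: installed={have} matches_pin={ok}")
```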

Data Preparation

Training data

Download the Conceptual Captions 3M (CC3M) training and validation splits from:
After preparing the data, place it in TSVLC/CC3M_data/training and TSVLC/CC3M_data/validation.
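The expected layout can be checked with a short helper. This is a hypothetical sketch: `missing_cc3m_dirs` is not part of the repository, and because the file format inside each split is not described here, it only checks that each split directory exists and is non-empty.

```python
from pathlib import Path


def missing_cc3m_dirs(repo_root) -> list:
    """Return the expected CC3M split directories that are missing or empty."""
    root = Path(repo_root) / "CC3M_data"
    missing = []
    for split in ("training", "validation"):
        split_dir = root / split
        # A split counts as present only if it exists and contains at least one entry.
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(str(split_dir))
    return missing


if __name__ == "__main__":
    print(missing_cc3m_dirs("TSVLC") or "CC3M layout looks complete")
```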

Train with Positives

Download the positives from the Google Drive link above and place them in TSVLC/CC3M_positives/.
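If the positives arrive as the zip file mentioned above, Python's `zipfile` module can unpack them into the expected folder. The helper name `extract_positives` and the archive path are illustrative assumptions, not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_positives(zip_path, repo_root) -> list:
    """Extract the positives archive into <repo_root>/CC3M_positives/.

    Returns the names of the archive members that were extracted.
    """
    target = Path(repo_root) / "CC3M_positives"
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        return zf.namelist()
```

If the archive contains its own top-level folder, its contents will end up one level deeper (`CC3M_positives/<folder>/...`), so check the resulting layout against what the training code expects.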

Evaluation data

Prepare the VL-Checklist dataset as described in:
Then move the dataset to TSVLC/vl_datasets/.
If you followed the instructions correctly, you should have the following folders inside vl_datasets: 'hake', 'swig', 'vg'.
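A minimal check for the three folders listed above might look like this. It is a sketch: `missing_vl_folders` is a hypothetical helper, and it only tests that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Folder names expected inside vl_datasets after preparing VL-Checklist.
EXPECTED_VL_FOLDERS = ("hake", "swig", "vg")


def missing_vl_folders(repo_root) -> list:
    """Return the expected VL-Checklist folders not found under vl_datasets/."""
    base = Path(repo_root) / "vl_datasets"
    return [name for name in EXPECTED_VL_FOLDERS if not (base / name).is_dir()]


if __name__ == "__main__":
    print(missing_vl_folders("TSVLC") or "all VL-Checklist folders present")
```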


Run the training script

First, navigate to the src directory:

cd src

The model will be saved in TSVLC/Outputs/exp_name/checkpoints

To train a network with:

  • Rule-based (RB) negative generation:
python3 training/ --name exp_name --vl_negs --lora 4 --neg_type rule_based --pretrained openai
  • RB + LLM-based negative generation:
python3 training/ --name exp_name --vl_negs --lora 4 --neg_type both --llm_neg_types NOUN ADP ADJ VERB --pretrained openai
  • Positives:
python3 training/ --name exp_name --vl_pos --lora 4 --pretrained openai
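The three commands differ only in their flags. The sketch below collects those flags, copied verbatim from the commands above, so they can be assembled programmatically (for example, in a sweep script). The variant keys `rb`, `rb_llm` and `positives` and the function `training_flags` are hypothetical names; prepend `python3` and the same training script path used in the commands above.

```python
# Variant-specific flags, copied from the three training commands above.
VARIANTS = {
    "rb": ["--vl_negs", "--lora", "4", "--neg_type", "rule_based"],
    "rb_llm": ["--vl_negs", "--lora", "4", "--neg_type", "both",
               "--llm_neg_types", "NOUN", "ADP", "ADJ", "VERB"],
    "positives": ["--vl_pos", "--lora", "4"],
}


def training_flags(variant: str, exp_name: str) -> list:
    """Return the full flag list for one training variant, in command order."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(VARIANTS)}")
    return ["--name", exp_name, *VARIANTS[variant], "--pretrained", "openai"]
```

For example, `training_flags("rb", "exp_name")` yields the flags of the first command, from `--name exp_name` through `--pretrained openai`.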


Run the evaluation script

All VL-Checklist JSON files will be saved in TSVLC/eval_jsons/clip/exp_name/, and the results will be printed. To compute the VL-Checklist evaluation results for the experiment exp_name, run the following command:

python3 training/ --lora 4 --pretrained openai --eval_vl_cklist --eval_only --resume /path/to/checkpoint
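To fill in `--resume`, a helper can pick the most recent checkpoint from TSVLC/Outputs/exp_name/checkpoints. This is a sketch under an explicit assumption: checkpoint files are taken to be named `epoch_<N>.pt` (the naming used by OpenCLIP's training code), but the files this repository produces may be named differently, so adjust `CKPT_PATTERN` as needed. `latest_checkpoint` and `eval_flags` are hypothetical helpers.

```python
import re
from pathlib import Path

# ASSUMPTION: checkpoints are named "epoch_<N>.pt"; change this if yours differ.
CKPT_PATTERN = re.compile(r"epoch_(\d+)\.pt")


def latest_checkpoint(repo_root, exp_name):
    """Return the highest-epoch checkpoint for exp_name, or None if none exist."""
    ckpt_dir = Path(repo_root) / "Outputs" / exp_name / "checkpoints"
    best, best_epoch = None, -1
    if ckpt_dir.is_dir():
        for path in ckpt_dir.iterdir():
            match = CKPT_PATTERN.fullmatch(path.name)
            # Compare epochs as integers, not strings.
            if match and int(match.group(1)) > best_epoch:
                best, best_epoch = path, int(match.group(1))
    return best


def eval_flags(checkpoint) -> list:
    """Flags of the VL-Checklist evaluation command, with --resume filled in."""
    return ["--lora", "4", "--pretrained", "openai",
            "--eval_vl_cklist", "--eval_only", "--resume", str(checkpoint)]
```

Epochs are compared numerically, so `epoch_10.pt` is preferred over `epoch_9.pt`, which a plain string sort would get wrong.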

