Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

Dataset structure

tarunn2799 opened this issue · comments

Hi, I'm having a little trouble understanding the dataset structure I should follow in order to train with this package. Is it one parent folder, with one subfolder containing the images and another containing their text files?
If yes, what should these subfolders be named?

See https://github.com/Zasder3/train-CLIP#training-with-our-datamodule- for details. Any folder name should work; the image and text file names should be the same.

Hey, so all images and text files should be in one single folder?

No, any subfolder works.

Does this work?
data/images/p1.jpg
and
data/text/p1.txt
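
For readers following the thread, here is a minimal sketch of the pairing rule described in the replies above: an image and a caption file count as a pair when they share a file name (minus extension), whatever subfolder each sits in. The function name `pair_by_stem` and the set of image extensions are assumptions made for illustration; this is not the repository's actual loader.

```python
from pathlib import Path

# Assumed set of image extensions, for illustration only.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_by_stem(root):
    """Return {stem: (image_path, text_path)} for files under root that share a stem."""
    images, texts = {}, {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images[path.stem] = path
        elif suffix == ".txt":
            texts[path.stem] = path
    # Keep only stems that have both an image and a caption file.
    shared = sorted(images.keys() & texts.keys())
    return {stem: (images[stem], texts[stem]) for stem in shared}
```

With data/images/p1.jpg and data/text/p1.txt on disk, `pair_by_stem("data")` would contain the key `"p1"`; an image without a matching .txt is left out.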

Hi, I prepared my dataset in that structure and ran the command below:
python train.py --model_name RN50 --folder /data/depop/data_org/clip/data/ --batch_size 512 --gpus 1

I'm getting an AssertionError from the cosine_annealing_warmup package at the line:
assert warmup_steps < first_cycle_steps

What's happening here? Please help me out.

Okay, so is warmup_steps hardcoded to 2000 in models/wrapper.py? My dataset is currently too small for num_training_steps to exceed 2000.
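
The assertion quoted above requires warmup_steps to be strictly smaller than first_cycle_steps. A rough sketch of that arithmetic follows. It assumes that first_cycle_steps equals the total number of optimizer steps (batches per epoch times epochs) and that the warmup is 2000 steps, as the question suggests; neither is confirmed in this thread, and both helper names are hypothetical.

```python
import math


def total_training_steps(num_samples, batch_size, epochs):
    """Optimizer steps, assuming one step per batch and a kept partial final batch."""
    return math.ceil(num_samples / batch_size) * epochs


def safe_warmup_steps(total_steps, requested=2000):
    """Clamp the warmup so that warmup_steps < first_cycle_steps holds."""
    if total_steps < 1:
        raise ValueError("need at least one training step")
    return min(requested, total_steps - 1)
```

For example, 3 samples at batch size 512 for a hypothetical 32 epochs gives 32 total steps, so a 2000-step warmup would trip the assertion, while the clamped value would be 31.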

Hi, does the .txt file here contain a text caption?
Let's say I have to create my own pairs of images and text captions. Could you please tell me if the assumption below is correct?

If I want to fine-tune the CLIP model on pairs of images and captions, would this work?

  • data/images/1_german_sheperd.jpg

  • data/label/1_german_sheperd.txt

  • data/images/2_german_sheperd.jpg

  • data/label/2_german_sheperd.txt

where,

  • 1_german_sheperd.txt contains a caption like "A sleeping German shepherd Dog"
  • 2_german_sheperd.txt contains a caption like "An angry barking German shepherd Dog"

yes
I'm surprised how much this is confusing people

Actually, creating a file per caption (or label) didn't make much sense to me, hence the question.
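
If the captions already live in one place, the per-image files are easy to generate. The sketch below assumes the captions are held in a plain dict mapping a file stem to its caption; the function name `write_caption_files` and the output folder are hypothetical, and the code is not part of train-CLIP.

```python
from pathlib import Path


def write_caption_files(captions, out_dir):
    """Write one <stem>.txt per entry and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for stem, caption in captions.items():
        path = out / f"{stem}.txt"
        # One caption per file, terminated by a newline.
        path.write_text(caption.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling `write_caption_files({"1_german_sheperd": "A sleeping German shepherd Dog"}, "data/label")` would create data/label/1_german_sheperd.txt, matching the layout in the question above.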

@tarunn2799 Hi, I would like to know whether this problem has been solved.

Okay, so is warmup_steps hardcoded to 2000 in models/wrapper.py? My dataset is currently too small for num_training_steps to exceed 2000.

Thanks for your time.

Hi @bk-201jk, I faced the same issue and solved it thanks to @ymzhu19eee in issue #20.

@iremonur Thank you very much! How many photos are in your dataset, and how did you set up your directory structure? What is in the .txt files, or is the caption in the file name? I would appreciate it if I could see a sample from your dataset!

I'm planning to prepare a 100k dataset (image-text pairs) for fine-tuning, but first I wanted to see if the code would work by running it with only 3 image-text pairs. The folder structure is as follows:
train-CLIP/data/img/1.png
train-CLIP/data/caption/1.txt
One of the captions reads: There is a car on the road.

@iremonur Thank you very much. If you manage to run the code with only 3 image-text pairs, please let me know. Thanks again!