Pretraining dataset

Question

Jimuyangz opened this issue a year ago

Hi guys, great work!
I tried to run the pretraining code, but the pretraining data seems to be missing. Do you plan to release the pretraining data, i.e., 'PATH_TRAIN' in 'RegionCLIP_RN50.yaml'? If not, how can I preprocess the COCO dataset to enable pretraining? Could you provide a preprocessing script? Thank you so much!

Yiwu Zhong · Answer 1 · Thu Aug 10 2023 14:24:50 GMT+0800 (China Standard Time)

@Jimuyangz Thanks for your interest in our work. As mentioned in #10, the pretraining code and configs have already been released. The pretraining data might not be provided in the near future due to company policy. However, those datasets are all public: you can download them and convert them into a form the dataloader accepts (e.g., each input is simply an image and a string representing its caption).
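
As a rough illustration of the "image plus caption string" idea, here is a minimal sketch that turns a standard COCO captions annotation file into (image path, caption) pairs and writes them to a TSV file. This is not the authors' preprocessing script. The function names, the TSV layout, and the file paths are assumptions made for illustration, and the exact format expected by 'PATH_TRAIN' should be checked against the released dataloader code.

```python
import json
from pathlib import Path


def coco_caption_pairs(annotation_file, image_dir):
    """Return (image_path, caption) pairs from a COCO captions JSON file.

    Hypothetical helper: relies only on the public COCO captions layout,
    i.e. an "images" list (id, file_name) and an "annotations" list
    (image_id, caption).
    """
    with open(annotation_file, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Map each image id to its file name.
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}

    pairs = []
    for ann in data["annotations"]:
        file_name = id_to_file.get(ann["image_id"])
        if file_name is None:
            continue  # skip captions whose image entry is missing
        # Collapse internal whitespace so each caption fits on one line.
        caption = " ".join(ann["caption"].split())
        pairs.append((str(Path(image_dir) / file_name), caption))
    return pairs


def write_tsv(pairs, out_file):
    """Write one "image_path<TAB>caption" line per pair (assumed layout)."""
    with open(out_file, "w", encoding="utf-8") as f:
        for path, caption in pairs:
            f.write(f"{path}\t{caption}\n")
```

A call might look like `write_tsv(coco_caption_pairs("annotations/captions_train2017.json", "train2017"), "coco_pairs.tsv")`, where both paths are placeholders for wherever the downloaded COCO files live. Note that COCO provides several captions per image, so the same image path will appear on multiple lines.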