facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Do not know what to do with datacard

lilygeorgescu opened this issue

Hi,

Thank you for this nice work and for making it public!

I do not know how to obtain the curated dataset, that is, how to use the datacard to get the training data and start training.

If anyone has any idea, please let me know.

Thanks in advance.

datacard is a new term we coined for the training data distribution, so it's not the concrete dataset. You will probably need to follow the curation code in metaclip and run it on CommonCrawl to get the full dataset.
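
For anyone landing here with the same question, below is a rough sketch of what "doing curation" means, based on the two steps described in the MetaCLIP paper: substring matching of each caption against the metadata entries, then balancing so that very frequent entries are sub-sampled with a threshold t. This is not the repo's actual code or API. The function names (`match_entries`, `curate`), the toy metadata, and the seed are all made up for illustration, and the naive loop would be far too slow for CommonCrawl-scale data.

```python
import random
from collections import Counter


def match_entries(text, metadata):
    """Return indices of metadata entries that occur as substrings of text."""
    return [i for i, entry in enumerate(metadata) if entry in text]


def curate(texts, metadata, t, seed=0):
    """Toy curation: substring matching followed by balancing.

    An entry matched at most t times is always sampled (probability 1).
    A more frequent entry is sampled with probability t / count.
    A text is kept if at least one of its matched entries is sampled;
    texts that match no entry are dropped.
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    matches = [match_entries(text, metadata) for text in texts]
    counts = Counter(i for ids in matches for i in ids)

    kept = []
    for text, ids in zip(texts, matches):
        for i in ids:
            if rng.random() < t / max(counts[i], t):
                kept.append(text)
                break
    return kept


if __name__ == "__main__":
    metadata = ["cat", "dog"]  # stand-in for the real metadata list
    texts = ["a photo of a cat", "dog on grass", "IMG_0001.jpg"]
    print(curate(texts, metadata, t=10))
```

In this reading, the datacard only describes the resulting distribution over entries; to get actual image-text pairs you still have to run a pipeline like the above over your own crawl.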