facebookresearch / MetaCLIP

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Full per-sample metadata for the 400m and CC2.5B training sets

vishaal27 opened this issue

Hi, thanks for your great work and for releasing both the metadata entries and the trained CLIP model weights. I was wondering if it would be possible for you to release the per-sample metadata (URL, text caption, etc.) for both of the datasets you released models for (400m and CC2.5B), similar to how the laion-2b-en and datacomp1b splits are released.
Please let me know if this is in the pipeline. If the per-sample metadata has already been released, please point me to it.
Thanks!

Thanks for your interest. We are working on that.