Full per-sample metadata for the 400m and CC2.5B training sets
vishaal27 opened this issue
Hi, thanks for your great work and for releasing both the metadata entries and the trained CLIP model weights. I was wondering whether it would be possible for you to release the per-sample metadata (URL, text caption, etc.) for both datasets you released models for (400m and CC2.5B), similar to how the laion-2b-en and datacomp1b splits are released.
Please let me know if this is in the pipeline; if they have already been released, please point me to them.
Thanks!
Thanks for your interest. We are working on that.