Beckschen / ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Datacomp dataset download

mactavish91 opened this issue

Hello,

I am attempting to download the complete datacomp dataset but have run into significant difficulties: more than a third of the links I have tried so far are invalid. How did others manage to download the entire dataset? Could you share the method you used? I would greatly appreciate any guidance and look forward to your response.

Hello, and thank you for expressing your interest!

I share your concerns. We downloaded the dataset around May 2023 and achieved a download success rate of approximately 95% to 97%. I am not sure what the success rate is now.
The Datacomp-1B dataset card can also be found here: https://huggingface.co/datasets/mlfoundations/datacomp_1b
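To compare your own download against the roughly 95% to 97% figure mentioned here, you can aggregate per-shard statistics into one overall success rate. The sketch below assumes your downloader writes one `*_stats.json` file per shard containing an attempted-sample count under `"count"` and a success count under `"successes"`, as img2dataset-style tools do. Check the field names your tool actually produces. The script name in the usage line is hypothetical.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, Mapping


def aggregate_success_rate(shard_stats: Iterable[Mapping[str, int]]) -> float:
    """Return the overall fraction of successfully downloaded samples.

    Assumption: each entry stores attempted samples under "count" and
    successful downloads under "successes" (img2dataset-style stats).
    Summing counts first weights every shard by its size, rather than
    averaging per-shard percentages.
    """
    attempted = 0
    succeeded = 0
    for stats in shard_stats:
        attempted += stats["count"]
        succeeded += stats["successes"]
    if attempted == 0:
        return 0.0  # nothing attempted yet; avoid division by zero
    return succeeded / attempted


def load_shard_stats(output_dir: str) -> list:
    """Read every *_stats.json file found directly inside output_dir."""
    paths = sorted(Path(output_dir).glob("*_stats.json"))
    return [json.loads(p.read_text()) for p in paths]


if __name__ == "__main__":
    # Usage: python success_rate.py /path/to/download/output
    rate = aggregate_success_rate(load_shard_stats(sys.argv[1]))
    print(f"download success rate: {rate:.1%}")
```

Running it as `python success_rate.py /path/to/download/output` prints a single percentage for the whole download.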

Best
Jieneng