How to train with LAION-400M data in Stage 1?
pkulwj1994 opened this issue
Hi Runpei,
Thank you for your great work. I am trying to run the Stage-1 training, but I find the LAION-400M data setup a bit confusing. How can I use the LAION-400M data for training? Could you please give clear instructions? Thank you!
The dataset is defined in the original code as shown below. I don't know where to get the "data/resources/laion400m_origin20m_shard_list.json" file.
Source code:
```python
L(WebDatasetInfo)(
    name="laion400m_orig",
    description="The length and width of the image are the original size, but only 20M was downloaded.",
    dataset_type=DatasetType.ImageTextPair,
    cls=UnifiedITPairWebdataset,
    approx_size="20M",
    shard_list_path="data/resources/laion400m_origin20m_shard_list.json",
),
```
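The definition above only points at a shard list; it does not say how that file is produced. If the LAION-400M subset has already been downloaded as WebDataset `.tar` shards (for example with a downloader such as img2dataset), a minimal sketch like the following could write a file at that path. It assumes the loader reads a plain JSON list of shard paths, which is a guess: check how `UnifiedITPairWebdataset` actually parses `shard_list_path` before relying on it. `build_shard_list` and `write_shard_list` are hypothetical helpers, not part of the repository.

```python
import glob
import json
import os


def build_shard_list(shard_dir: str, pattern: str = "*.tar") -> list:
    """Collect WebDataset .tar shards under shard_dir (recursively), as sorted absolute paths."""
    # "**" with recursive=True also matches shards directly inside shard_dir.
    matches = glob.glob(os.path.join(shard_dir, "**", pattern), recursive=True)
    # Sorting keeps the shard order deterministic across runs.
    return sorted(os.path.abspath(p) for p in matches)


def write_shard_list(shards: list, out_path: str) -> None:
    """Write shard paths as a JSON list.

    ASSUMPTION: the expected file format is a flat JSON list of paths;
    this is not confirmed by the repository.
    """
    parent = os.path.dirname(out_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(out_path, "w") as f:
        json.dump(shards, f, indent=2)
```

For example, `write_shard_list(build_shard_list("/path/to/laion400m_shards"), "data/resources/laion400m_origin20m_shard_list.json")`, where the shard directory is a placeholder for wherever the data was downloaded.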
Best wishes.
Hi @pkulwj1994,
Please refer to this issue