rom1504 / laion-prepro

Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them

make listing and download of csv files faster and cleaner

rom1504 opened this issue

The listing and download are currently done with bash + aria2c.
This takes about 30 min.
It could easily take only 5 min if done with the same techniques as img2dataset.
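
The issue does not say which img2dataset techniques are meant. One plausible reading is downloading many files concurrently from a thread pool, so here is a minimal sketch of that idea using only the Python standard library. The names `fetch_url` and `download_all`, the `thread_count` default, and the file naming scheme are all hypothetical and not taken from this repo or from img2dataset.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_url(url, timeout=30):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, output_dir, fetch=fetch_url, thread_count=32):
    """Download every URL into output_dir using a pool of threads.

    Returns (succeeded, failed): succeeded maps url -> local path,
    failed maps url -> error message.
    """
    os.makedirs(output_dir, exist_ok=True)
    succeeded, failed = {}, {}

    def download_one(url):
        # Assumed naming scheme: keep the last path component of the URL.
        path = os.path.join(output_dir, url.rstrip("/").rsplit("/", 1)[-1])
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
        return path

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        futures = {pool.submit(download_one, url): url for url in urls}
        # Collect results as they finish so one slow file does not block the rest.
        for future in as_completed(futures):
            url = futures[future]
            try:
                succeeded[url] = future.result()
            except Exception as err:
                failed[url] = str(err)
    return succeeded, failed
```

Calling `download_all(csv_urls, "csv_files")` with the list of csv URLs would write each file into `csv_files` and report any failures separately, so failed downloads can be retried without restarting the whole run. Whether this actually brings the time down to 5 min depends on bandwidth and on the server, which is not something this sketch can confirm.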