Zasder3 / train-CLIP

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

WebDataset support

rom1504 opened this issue

I think it would be pretty useful to add a webdataset loader here, so that datasets stored in the webdataset format can be used for training.
This is relevant because large webdataset datasets are starting to become available (one is the crawling at home dataset, of size 400M).

I think https://github.com/lucidrains/DALLE-pytorch/pull/280/files may be a good example of how to do it.

Oh, I see that https://github.com/mlfoundations/open_clip#yfcc-and-other-datasets already has support for this.
It might be another example.
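
For anyone unfamiliar with the format, here is a rough, stdlib-only sketch of the idea: a webdataset shard is a tar archive in which files that share a key (the name up to the first dot) form one sample, e.g. `000000.jpg` plus `000000.txt`. The helper names (`write_shard`, `iter_pairs`, `demo`) and the `jpg`/`txt` extensions are assumptions made up for this illustration. It does not use the webdataset library and is not how either linked repo implements it; a real loader would also decode the images and tokenize the captions.

```python
import io
import os
import tarfile
import tempfile


def write_shard(path, samples):
    """Write (key, image_bytes, caption) triples into one tar shard.

    Hypothetical helper: each sample becomes <key>.jpg and <key>.txt.
    """
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, caption in samples:
            for ext, payload in (("jpg", image_bytes), ("txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def iter_pairs(path):
    """Yield (image_bytes, caption) by grouping consecutive files with the same key."""
    current_key, sample = None, {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Key is everything before the first dot, the rest is the extension.
            key, _, ext = member.name.partition(".")
            if key != current_key:
                # A new key starts: emit the previous sample if it is complete.
                if "jpg" in sample and "txt" in sample:
                    yield sample["jpg"], sample["txt"].decode("utf-8")
                current_key, sample = key, {}
            sample[ext] = tar.extractfile(member).read()
    # Emit the last sample in the shard.
    if "jpg" in sample and "txt" in sample:
        yield sample["jpg"], sample["txt"].decode("utf-8")


def demo():
    """Round-trip two fake samples through a temporary shard."""
    samples = [
        ("000000", b"fake-jpeg-0", "a cat"),
        ("000001", b"fake-jpeg-1", "a dog"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "shard-000000.tar")
        write_shard(shard, samples)
        return list(iter_pairs(shard))


if __name__ == "__main__":
    for image_bytes, caption in demo():
        print(len(image_bytes), caption)
```

Because samples are read sequentially from the archive, a loader along these lines can stream large shards without an index, which is the main reason the format suits datasets of this size.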

I think this would be a helpful addition to the repo. However, my main short-term focus is a collaboration with the team behind that repo.

If you or anyone else reading this is interested in seeing this added to the repo, I'd be glad to accept a PR!