huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page: https://huggingface.co/docs/datasets

Method to load Laion400m

humanely opened this issue

Feature request

Large datasets like Laion400m are distributed as precomputed embeddings. The loading methods that load_dataset provides are not straightforward to use with embedding files such as img_emb_XX.npy, where XX runs from 0 to 99.
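
To make the request concrete, here is a minimal sketch of what such a loader might look like. It is not an existing datasets feature. It assumes the shards sit in one local folder, are named img_emb_<N>.npy, and each hold a 2-D array with one embedding per row. The helper names list_shards, iter_embeddings and build_dataset are hypothetical; the only library calls used are numpy.load with mmap_mode and datasets.Dataset.from_generator.

```python
import re
from pathlib import Path

import numpy as np


def list_shards(folder):
    """Return paths of img_emb_<N>.npy files in `folder`, sorted by the number N."""
    pattern = re.compile(r"img_emb_(\d+)\.npy")
    shards = []
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            shards.append((int(match.group(1)), str(path)))
    # Sort numerically so img_emb_2.npy comes before img_emb_10.npy.
    return [path for _, path in sorted(shards)]


def iter_embeddings(shard_paths):
    """Yield one dict per embedding row, reading each shard through a memory map."""
    for shard_path in shard_paths:
        # Assumption: each shard is a 2-D array of shape (num_rows, dim).
        # mmap_mode="r" avoids loading a whole shard into RAM at once.
        shard = np.load(shard_path, mmap_mode="r")
        for row_index in range(shard.shape[0]):
            yield {
                "shard": Path(shard_path).name,
                "row": row_index,
                # Copy the row out of the memory map into a plain array.
                "embedding": np.array(shard[row_index]),
            }


def build_dataset(folder):
    """Wrap the generator in a Dataset (requires the `datasets` package)."""
    from datasets import Dataset

    return Dataset.from_generator(
        iter_embeddings, gen_kwargs={"shard_paths": list_shards(folder)}
    )
```

Calling build_dataset on the folder that holds the shards would then give a regular Dataset with shard, row and embedding columns. Yielding one row at a time keeps the sketch simple but will be slow for 400M rows, so a real loader would probably batch the reads.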

Motivation

Trial and experimentation are central to HF. It would be great if HF could load embedding files seamlessly.

Your contribution

I can write the loader with some help.