huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page: https://huggingface.co/docs/datasets

add `with_transform` and/or `set_transform` to IterableDataset

not-lain opened this issue

Feature request

When working with a really large dataset, it would save a lot of time (and compute resources) to use `with_transform` or `set_transform`, as on the Dataset class, instead of waiting for the entire dataset to be mapped.
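
To make the requested behavior concrete, here is a minimal sketch in plain Python of a transform that is applied on the fly, only to the examples that are actually consumed. It does not use the `datasets` library and is not how the library implements anything; `LazyTransformed`, the free function `with_transform`, `stream_examples`, and `uppercase` are all hypothetical names made up for this illustration.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator

Example = Dict[str, Any]


class LazyTransformed:
    """Hypothetical wrapper: applies `transform` to each example only when it is iterated."""

    def __init__(self, source: Iterable[Example], transform: Callable[[Example], Example]):
        self._source = source
        self._transform = transform

    def __iter__(self) -> Iterator[Example]:
        # Nothing is computed up front; each example is transformed as it is requested.
        for example in self._source:
            yield self._transform(example)


def with_transform(source: Iterable[Example], transform: Callable[[Example], Example]) -> LazyTransformed:
    """Hypothetical helper mirroring the idea of the requested method."""
    return LazyTransformed(source, transform)


def stream_examples(n: int) -> Iterator[Example]:
    """Hypothetical data source standing in for a very large streamed dataset."""
    for i in range(n):
        yield {"text": f"sample {i}"}


def uppercase(example: Example) -> Example:
    """Example transform: return a copy of the example with upper-cased text."""
    return {**example, "text": example["text"].upper()}


# Wrapping is instant, even for a million examples, because no work happens yet.
lazy = with_transform(stream_examples(1_000_000), uppercase)

# Only the two examples we actually read are transformed.
first_two = list(islice(lazy, 2))
print(first_two)
```

The point of the sketch is the cost model: setting the transform is free, and the work scales with the number of examples read rather than with the size of the dataset.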

Motivation

I don't want to wait for a really large dataset to finish mapping. This would give IterableDataset an extra advantage over the Dataset class and reduce the time and resources needed.

Your contribution

I am a little busy with my job search lately, but I would post about this feature on my social media.
Apologies again (my dad is going to kick me out soon). If I ever have some free time I will contribute to making this a reality, but that's going to be hard.
    / (┬┬﹏┬┬)\