load_dataset() should load all subsets if no specific subset is specified
windmaple opened this issue
Feature request
Currently, load_dataset() forces users to specify a subset. For example:
from datasets import load_dataset
dataset = load_dataset("m-a-p/COIG-CQIA")
ValueError Traceback (most recent call last)
<ipython-input-10-c0cb49385da6> in <cell line: 2>()
1 from datasets import load_dataset
----> 2 dataset = load_dataset("m-a-p/COIG-CQIA")
3 frames
/usr/local/lib/python3.10/dist-packages/datasets/builder.py in _create_builder_config(self, config_name, custom_features, **config_kwargs)
582 if not config_kwargs:
583 example_of_usage = f"load_dataset('{self.dataset_name}', '{self.BUILDER_CONFIGS[0].name}')"
--> 584 raise ValueError(
585 "Config name is missing."
586 f"\nPlease pick one among the available configs: {list(self.builder_configs.keys())}"
ValueError: Config name is missing.
Please pick one among the available configs: ['chinese_traditional', 'coig_pc', 'exam', 'finance', 'douban', 'human_value', 'logi_qa', 'ruozhiba', 'segmentfault', 'wiki', 'wikihow', 'xhs', 'zhihu']
Example of usage:
`load_dataset('coig-cqia', 'chinese_traditional')`
This means the loaded dataset cannot contain all the subsets at the same time. One workaround is to manually specify the subset files, as done here, but that is clumsy.
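A minimal sketch of a per-subset workaround that avoids listing data files by hand. It assumes the subsets can be enumerated with `datasets.get_dataset_config_names` and each one loaded with `load_dataset(path, name)`; the helper `load_all_subsets` and its `list_configs`/`load_one` parameters are hypothetical, added only so the loop can be exercised without network access. Depending on the `datasets` version, script-based datasets may need extra arguments (for example `trust_remote_code=True`), which are left out here.

```python
from typing import Any, Callable, Dict, List, Optional


def load_all_subsets(
    path: str,
    list_configs: Optional[Callable[[str], List[str]]] = None,
    load_one: Optional[Callable[[str, str], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: load every subset (config) of a Hub dataset.

    Returns a dict mapping each config name to whatever load_one returns,
    i.e. one DatasetDict per subset when the real load_dataset is used.
    """
    if list_configs is None or load_one is None:
        # Imported lazily so stub callables can be used without `datasets`.
        from datasets import get_dataset_config_names, load_dataset

        if list_configs is None:
            list_configs = get_dataset_config_names
        if load_one is None:
            # load_dataset(path, name): the second positional argument is the config name.
            load_one = load_dataset

    # One load_dataset() call per subset, keyed by the subset name.
    return {name: load_one(path, name) for name in list_configs(path)}
```

With network access, `load_all_subsets("m-a-p/COIG-CQIA")` would be expected to return one entry per config named in the error message, each holding that subset's splits.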
Motivation
Ideally, if no subset is specified, the API should just try to load all subsets. This would make datasets with subsets much easier to handle.
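Loading "all subsets" into one object also raises the question of how they are combined. One possible reading, sketched here under assumptions, is to concatenate the subsets split by split. `merge_subsets_by_split` is a hypothetical name; by default it uses the real `datasets.concatenate_datasets`, which generally expects the pieces to have compatible features, so subsets with different columns would have to stay separate instead.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional


def merge_subsets_by_split(
    subsets: Mapping[str, Mapping[str, Any]],
    concat: Optional[Callable[[List[Any]], Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical: turn {subset: {split: data}} into {split: merged data}.

    Subsets that lack a given split simply contribute nothing to it.
    """
    if concat is None:
        # Real `datasets` API; it expects the pieces to share compatible features.
        from datasets import concatenate_datasets

        concat = concatenate_datasets

    # Group each subset's splits by split name, preserving subset order.
    parts_by_split: Dict[str, List[Any]] = {}
    for splits in subsets.values():
        for split_name, data in splits.items():
            parts_by_split.setdefault(split_name, []).append(data)

    # Concatenate the pieces of every split into a single object.
    return {name: concat(parts) for name, parts in parts_by_split.items()}
```

The result could be wrapped with `datasets.DatasetDict(...)` to get back the usual split-keyed container.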
Your contribution
Not sure, since I'm not familiar with the library's source code.