huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page:https://huggingface.co/docs/datasets

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ExpectedMoreSplits error when using data_dir

albertvillanova opened this issue · comments

As reported by @regisss, an ExpectedMoreSplits error is raised when passing data_dir:

from datasets import load_dataset

dataset = load_dataset(
    "lvwerra/stack-exchange-paired",
    split="train",
    cache_dir=None,
    data_dir="data/rl",
)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2609, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1140, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/usr/local/lib/python3.10/dist-packages/datasets/utils/info_utils.py", line 92, in verify_splits
    raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'test'}