ExpectedMoreSplits error when using data_dir
albertvillanova opened this issue
As reported by @regisss, an ExpectedMoreSplits error is raised when passing data_dir:
from datasets import load_dataset
dataset = load_dataset(
    "lvwerra/stack-exchange-paired",
    split="train",
    cache_dir=None,
    data_dir="data/rl",
)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2609, in load_dataset
    builder_instance.download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1027, in download_and_prepare
    self._download_and_prepare(
  File "/usr/local/lib/python3.10/dist-packages/datasets/builder.py", line 1140, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/usr/local/lib/python3.10/dist-packages/datasets/utils/info_utils.py", line 92, in verify_splits
    raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'test'}
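
To make the last traceback frame easier to read, here is a minimal, self-contained sketch of the set difference it performs. It is not the library's actual verify_splits: the function name verify_splits_sketch and the stand-in exception class are hypothetical. The split names are also assumptions, chosen so that the expected metadata lists both "train" and "test" while the files found under data_dir only produce "train".

```python
class ExpectedMoreSplits(Exception):
    """Stand-in for datasets.utils.info_utils.ExpectedMoreSplits (hypothetical copy)."""


def verify_splits_sketch(expected_splits, recorded_splits):
    # Simplified: reproduces only the set difference shown in the traceback,
    # not any other checks the real function may perform.
    missing = set(expected_splits) - set(recorded_splits)
    if missing:
        raise ExpectedMoreSplits(str(missing))


# Assumed values: the dataset metadata expects two splits,
# but only "train" was generated from the selected data_dir.
expected = {"train": None, "test": None}
recorded = {"train": None}

try:
    verify_splits_sketch(expected, recorded)
    error_message = None
except ExpectedMoreSplits as err:
    error_message = str(err)

print(error_message)  # {'test'}
```

Under these assumptions the sketch prints the same {'test'} message as the report, which suggests the expected splits come from metadata that is not restricted by data_dir.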