Support for pathlib.Path in datasets 2.19.0
lamyiowce opened this issue
Describe the bug
After the recent update of datasets, Dataset.save_to_disk no longer accepts a pathlib.Path. This was supported in 2.18.0 and earlier versions. Is the change intentional? Or did it only work before thanks to a Python duck-typing miracle?
Steps to reproduce the bug
from datasets import Dataset
import pathlib
path = pathlib.Path("./my_out_path")
Dataset.from_dict(
    {"text": ["hello world"], "label": [777], "split": ["train"]}
).save_to_disk(path)
This results in an error when using datasets 2.19:
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 1515, in save_to_disk
    fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/fsspec/core.py", line 383, in url_to_fs
    chain = _un_chain(url, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jb/scratch/venv/lib/python3.11/site-packages/fsspec/core.py", line 323, in _un_chain
    if "::" in path
       ^^^^^^^^^^^^
TypeError: argument of type 'PosixPath' is not iterable
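Judging by the last frame, the failure seems to come down to a plain membership test on a Path object. Here is a minimal sketch that reproduces the same kind of TypeError without datasets or fsspec installed. The helper name contains_chain_separator is made up for illustration and is not fsspec's actual code; I use PurePosixPath so the snippet behaves the same on any OS.

```python
import os
import pathlib


def contains_chain_separator(path):
    # Hypothetical stand-in for the check shown in the traceback
    # (`if "::" in path`); this is not fsspec's actual implementation.
    return "::" in path


path = pathlib.PurePosixPath("./my_out_path")

try:
    contains_chain_separator(path)
    raised = False
except TypeError:
    # Path objects define neither __contains__ nor __iter__,
    # so the `in` operator has nothing to fall back on.
    raised = True

# The same check runs fine once the path is converted to a plain string.
as_str_ok = contains_chain_separator(os.fspath(path)) is False
```

If that reading is right, any code path that treats the argument as a string before converting it would hit the same error.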
Converting the path to str works, however:
Dataset.from_dict(
    {"text": ["hello world"], "label": [777], "split": ["train"]}
).save_to_disk(str(path))
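For anyone who wants to keep passing Path objects around in the meantime, a small wrapper along these lines might do as a stopgap. This is only a sketch: to_path_str is a hypothetical helper name, not part of datasets, and it relies solely on the standard library's os.fspath.

```python
import os
import pathlib


def to_path_str(path):
    # Hypothetical helper: turn any os.PathLike (e.g. pathlib.Path) into its
    # string form; plain strings, including URLs, are returned unchanged.
    return os.fspath(path)


local = to_path_str(pathlib.PurePosixPath("my_out_path"))
remote = to_path_str("s3://bucket/my_dataset")

# Intended usage (assumes a `dataset` object exists):
# dataset.save_to_disk(to_path_str(path))
```

I went with os.fspath rather than str() here because it rejects objects that are not path-like instead of silently stringifying them.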
Expected behavior
My dataset gets saved to disk without an error.
Environment info
aiohttp==3.9.5
aiosignal==1.3.1
attrs==23.2.0
certifi==2024.2.2
charset-normalizer==3.3.2
datasets==2.19.0
dill==0.3.8
filelock==3.14.0
frozenlist==1.4.1
fsspec==2024.3.1
huggingface-hub==0.23.2
idna==3.7
multidict==6.0.5
multiprocess==0.70.16
numpy==1.26.4
packaging==24.0
pandas==2.2.2
pyarrow==16.1.0
pyarrow-hotfix==0.6
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
requests==2.32.3
six==1.16.0
tqdm==4.66.4
typing_extensions==4.12.0
tzdata==2024.1
urllib3==2.2.1
xxhash==3.4.1
yarl==1.9.4