huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page: https://huggingface.co/docs/datasets

Convert polars DataFrame back to datasets

ljw20180420 opened this issue

Feature request

Converting a Dataset to a polars DataFrame and back raises an error:

from datasets import Dataset

dsdf = Dataset.from_dict({"x": [[1, 2], [3, 4, 5]], "y": ["a", "b"]})
Dataset.from_polars(dsdf.to_polars())

ValueError: Arrow type large_list<item: int64> does not have a datasets dtype equivalent.

Motivation

When a dataset contains a Sequence feature, to_polars() converts it to the Arrow type large_list. However, the reverse conversion (from large_list back to Sequence) in Dataset.from_polars() does not work.

Your contribution

No