huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page: https://huggingface.co/docs/datasets

load_dataset error

lion-ops opened this issue

Describe the bug

The program hangs when I call load_dataset, and it is still stuck after several hours. My JSON file is only 21 MB, and I can read it in one go with open('', 'r').

Steps to reproduce the bug

  1. pip install datasets==2.19.2
  2. from datasets import Dataset, DatasetDict, NamedSplit, Split, load_dataset
  3. data = load_dataset('json', data_files='train.json')
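When a call hangs like this, it helps to see where it is stuck. The sketch below is one possible way to do that with the standard library's faulthandler module: it runs the load from step 3 and, if the call is still running after a timeout, writes the stack of every thread to a log file without interrupting the call. The helper names run_with_hang_dump and load_train_json, the log file name, and the timeout are illustrative assumptions, not part of the datasets library.

```python
import faulthandler
import tempfile
from pathlib import Path


def run_with_hang_dump(fn, timeout_s=60.0, log_path=None):
    """Run fn(); if it is still running after timeout_s seconds, write the
    stack of every thread to log_path. The call itself is not interrupted."""
    if log_path is None:
        # Hypothetical default location for the diagnostic log.
        log_path = Path(tempfile.gettempdir()) / "load_dataset_hang.log"
    with open(log_path, "w") as log_file:
        # The file must stay open until the watchdog is cancelled.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
        try:
            return fn()
        finally:
            faulthandler.cancel_dump_traceback_later()


def load_train_json(path="train.json"):
    # Imported lazily so the helper above also works without `datasets`.
    from datasets import load_dataset

    return load_dataset("json", data_files=path)
```

Calling `run_with_hang_dump(load_train_json, timeout_s=120)` and attaching the resulting log to the issue would show which call the process is waiting on, for example a file lock or a file read.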

Expected behavior

load_dataset should load my JSON file correctly.

Environment info

datasets==2.19.2

Hi, @lion-ops.

In our Continuous Integration we have many tests on loading JSON files and all of them work properly.

Could you please share your "train.json" file, so that we can try to reproduce the issue you have?

Thank you for your reply. The same file loads normally on another server. The disk on the failing server is a network disk on the LAN. Could reading and writing over the network be what makes load_dataset get stuck?
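One way to test the network-disk hypothesis is to keep both the input file and the datasets cache on a local disk. The sketch below is only an illustration under that unconfirmed assumption: it copies the file to a temporary directory (assumed here to be on local storage) and passes a local cache_dir to load_dataset. The helper names stage_on_local_disk and load_from_local_copy and the hf_cache directory name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_on_local_disk(src, local_dir=None):
    """Copy src into local_dir (a fresh temp directory by default) and
    return the path of the copy."""
    if local_dir is None:
        # Assumption: the system temp directory is on a local disk.
        local_dir = tempfile.mkdtemp(prefix="ds_local_")
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    dst = local_dir / Path(src).name
    shutil.copy2(src, dst)
    return dst


def load_from_local_copy(src="train.json"):
    # Imported lazily so the copy helper works without `datasets` installed.
    from datasets import load_dataset

    local_file = stage_on_local_disk(src)
    # Keep the cache next to the local copy, away from the network disk.
    cache_dir = local_file.parent / "hf_cache"
    return load_dataset("json", data_files=str(local_file), cache_dir=str(cache_dir))
```

If `load_from_local_copy("train.json")` finishes quickly while the original call hangs, that would point toward the network disk rather than the JSON file itself.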