huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Home Page: https://huggingface.co/docs/datasets

Column order is nondeterministic when loading from JSON

albertvillanova opened this issue

As reported by @meg-huggingface, the order of the JSON object keys is not preserved while loading a dataset from a JSON file with a list of objects.

For example, when loading a JSON file with a list of objects, each with the following ordered keys:

  • [ID, Language, Topic],

the resulting dataset may have columns:

  • [ID, Topic, Language], or
  • [Topic, Language, ID], or
  • [Topic, ID, Language],...
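To make the symptom concrete, here is a minimal pure-Python sketch that does not touch the datasets library itself. It assumes the column names are gathered by taking the union of each row's keys through a set, and contrasts that with an order-preserving union built on a dict. The sample rows and both helper names (`keys_via_set`, `keys_in_first_seen_order`) are hypothetical and only serve as illustration; this is not the library's actual code or fix.

```python
# Hypothetical sample rows, each with keys in the order ID, Language, Topic.
rows = [
    {"ID": 1, "Language": "en", "Topic": "sports"},
    {"ID": 2, "Language": "fr", "Topic": "music"},
]


def keys_via_set(rows):
    # A set has no defined order. For string keys the iteration order
    # depends on string hashing, which is randomized per interpreter run
    # unless PYTHONHASHSEED is fixed, so the result can change between runs.
    return list(set().union(*[row.keys() for row in rows]))


def keys_in_first_seen_order(rows):
    # dict preserves insertion order (guaranteed since Python 3.7), so
    # dict.fromkeys deduplicates while keeping each key's first position.
    return list(dict.fromkeys(key for row in rows for key in row))


print(keys_via_set(rows))              # some permutation of the three keys
print(keys_in_first_seen_order(rows))  # keys in the order they first appear
```

Running the script several times in fresh interpreters may print a different permutation on the first line each time, while the second line should stay the same.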

This issue is caused by the use of a Python set (which does not preserve the order):

keys = set().union(*[row.keys() for row in dataset])

introduced in