kbressem / medAlpaca

LLM finetuned for medical question answering

Can't Load File medical_meadow_small.json

davidlee1102 opened this issue · comments

Please check your training code sample again.

from datasets import load_dataset

data = load_dataset("json", data_files="/kaggle/working/medAlpaca/medical_meadow_small.json")

---ERROR---
File /opt/conda/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py:150, in Json._generate_tables(self, files)
145 except json.JSONDecodeError:
146 raise e
147 raise ValueError(
148 f"Not able to read records in the JSON file at {file}. "
149 f"You should probably indicate the field of the JSON file containing your records. "
--> 150 f"This JSON file contain the following fields: {str(list(dataset.keys()))}. "
151 f"Select the correct one and provide it as field='XXX' to the dataset loading method. "
152 ) from None
153 # Uncomment for debugging (will print the Arrow table size and elements)
154 # logger.warning(f"pa_table: {pa_table} num rows: {pa_table.num_rows}")
155 # logger.warning('\n'.join(str(pa_table.slice(i, 1).to_pydict()) for i in range(pa_table.num_rows)))
156 yield (file_idx, batch_idx), self._cast_classlabels(pa_table)

AttributeError: 'list' object has no attribute 'keys'
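
As the traceback's own hint suggests, the loader can be told which field holds the records via field='XXX'. When the top-level JSON is a plain list rather than an object, another possible workaround is to skip the packaged loader and build the dataset directly. A minimal sketch, assuming medical_meadow_small.json is a JSON array of record dicts (the path is the one from the report):

import json
from datasets import Dataset

# Read the file with the standard json module and build a Dataset directly,
# sidestepping datasets' packaged JSON loader.
with open("/kaggle/working/medAlpaca/medical_meadow_small.json") as f:
    records = json.load(f)  # assumed: a list of dicts

data = Dataset.from_list(records)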

commented

Strange error. Which version of datasets are you using? I tried it with a recent version and it worked.
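
A quick way to compare versions is to print it in the failing environment, for example:

import datasets
print(datasets.__version__)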


Maybe you do not have enough space in the home directory? Try forcing the Hugging Face cache somewhere else with:

import os

# Set this before importing `datasets`, otherwise the new cache location will not take effect.
os.environ["HF_HOME"] = "/path/to/your/cache"

I used the same dataset you used. I have checked, and I think the error comes from the environment on Google Colab and Kaggle, so would you mind trying it on Google Colab or Kaggle?
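
If the preinstalled datasets on Kaggle or Colab turns out to be older, upgrading it in place and restarting the kernel is one thing to try; a minimal sketch (the upgrade is an assumption, not a confirmed fix):

import sys
import subprocess

# Upgrade datasets inside the running notebook; restart the kernel afterwards
# so the new version is the one that gets imported.
subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "datasets"])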