Broken dataset in LLM 04
romainfut-db opened this issue · comments
romainfut-db commented
In the LLM 04 demo, we call `imdb_ds = load_dataset("imdb")` to load our fine-tuning dataset.
It looks like there was an update to this dataset, and this line now throws an error: `ExpectedMoreSplits: {'unsupervised'}`.
This can be fixed by forcing a re-install of the latest version of Hugging Face's `datasets` library. However, doing so breaks the code further down, where it can't find the `train` and `validation` splits in the dataset object.
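A possible workaround for the second failure, as a rough sketch rather than a confirmed fix: look up which splits the reloaded object actually exposes before indexing into it. As far as I can tell, the `imdb` dataset on the Hub ships `train`, `test`, and `unsupervised`, so this assumes the missing piece is a split literally named `validation`. I haven't checked that against the notebook. `resolve_splits` and its parameters are hypothetical names made up for illustration; it only relies on the dataset object behaving like a dict keyed by split name, which `DatasetDict` does.

```python
def resolve_splits(ds, train_name="train", eval_candidates=("validation", "test")):
    """Return (train_split, eval_split) names that actually exist in `ds`.

    `ds` is any dict-like object keyed by split name (for example the
    DatasetDict returned by load_dataset). Hypothetical helper, not part
    of the course code or of the `datasets` library.
    """
    available = list(ds.keys())
    if train_name not in available:
        raise KeyError(f"no '{train_name}' split; available splits: {available}")
    # Take the first evaluation split that is present, in order of preference.
    for name in eval_candidates:
        if name in available:
            return train_name, name
    raise KeyError(f"none of {list(eval_candidates)} found; available splits: {available}")


# Intended use in the notebook (needs the `datasets` package and network access):
#   from datasets import load_dataset
#   imdb_ds = load_dataset("imdb")
#   train_name, eval_name = resolve_splits(imdb_ds)
#   train_ds, eval_ds = imdb_ds[train_name], imdb_ds[eval_name]
```

If falling back to `test` is not acceptable for evaluation during fine-tuning, another option would be to carve a validation set out of `train` with `Dataset.train_test_split`, at the cost of having fewer training examples.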
Marco Graziano commented
+1