databricks-academy / large-language-models

Notebooks for Large Language Models (LLMs) Specialization

Broken dataset in LLM 04

romainfut-db opened this issue

In the LLM 04 demo, we call imdb_ds = load_dataset("imdb") to load our fine-tuning dataset.
It looks like this dataset was updated, and the line now throws an error: ExpectedMoreSplits: {'unsupervised'}.

This can be fixed by forcing a re-install of the latest version of Hugging Face's datasets library. However, doing so breaks the code further down, which can no longer find the train and validation splits in the dataset object.
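To make the second failure easier to pin down, here is a minimal sketch of a check for which splits the loaded object actually exposes. The helper name missing_splits and the stand-in dictionary are hypothetical, and the split names in the stand-in are an assumption for illustration, not output copied from the notebook. The object returned by load_dataset is dict-like, so the same check should work on imdb_ds directly.

```python
from typing import Iterable, List, Mapping


def missing_splits(dataset: Mapping, required: Iterable[str]) -> List[str]:
    """Return the required split names that are absent from a dict-like dataset."""
    return [name for name in required if name not in dataset]


# Hypothetical stand-in for the object returned by load_dataset("imdb").
# The split names below are assumed for illustration only.
example_ds = {"train": [], "test": [], "unsupervised": []}

# Report which of the splits the notebook expects are not present.
print(missing_splits(example_ds, ["train", "validation"]))
```

If a split the notebook relies on turns up in this list, the downstream code would need to point at a split that does exist or create one from the training data.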