EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Home Page: https://www.eleuther.ai


Support loading slices of a split from a dataset

alexrs opened this issue · comments

What

Hugging Face Datasets supports slice splits:

dataset_10pc = datasets.load_dataset("mydataset", split="test[:10%]")

Therefore, I assumed that when creating a new task I could express the dataset split as:

task: mytask
dataset_path: user/mydataset
dataset_name: null
training_split: null
validation_split: null
test_split: 'test[:50%]'
doc_to_text: abc
doc_to_target: def
metric_list:
  - metric: ...

However, that fails at:

return self.dataset[self.config.training_split]

with KeyError: 'test[:50%]'
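A minimal sketch of the likely failure mode (not the harness's actual code): `load_dataset` called without a `split` argument returns a `DatasetDict`, which behaves like a plain mapping from split names to datasets. Slice expressions are only parsed when passed to `load_dataset`'s `split` argument, so indexing the mapping with `'test[:50%]'` treats the whole string as a literal key:

```python
# Sketch of the assumed failure mode: the loaded DatasetDict is modeled
# here as a plain dict keyed by split names. Slice syntax is not parsed
# at lookup time, so the expression becomes a missing literal key.
splits = {"train": list(range(100)), "test": list(range(50))}

def get_split(name):
    # mirrors `self.dataset[self.config.training_split]` from the traceback
    return splits[name]

print(len(get_split("test")))   # plain split name: works

try:
    get_split("test[:50%]")
except KeyError as err:
    print("KeyError:", err)     # slice expression: treated as a literal key
```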