Need wide & small task for fast evaluation
UmerHA opened this issue
Hi all,
I want to do a large-ish study on quantization methods and their effect on model performance. For this, I need an evaluation that's (i) "wide" (i.e., covers a broad set of tasks/topics) and (ii) small (so it's quick and cheap to run).
IIUC, there is currently no task for that.
I suggest we add the dharma2 dataset (samples from 8 tasks, incl. MMLU; 300 examples in total), or alternatively big-bench lite (samples from 24 tasks).
I'll fork this repo and add dharma2. If there's interest, I'd be happy to submit a PR.
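To illustrate what "wide and small" means here, below is a minimal sketch of drawing a fixed total budget of examples spread evenly across several tasks. It is not how dharma2 or big-bench lite were actually built; the function name `build_small_wide_subset`, the toy task names, and the example fields are all hypothetical and only serve the illustration.

```python
import random
from typing import Dict, List


def build_small_wide_subset(
    task_examples: Dict[str, List[dict]], total: int, seed: int = 0
) -> List[dict]:
    """Sample about total / num_tasks examples from every task.

    Hypothetical helper: spreads a small budget evenly over many tasks
    so the resulting set stays broad but cheap to evaluate.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    tasks = sorted(task_examples)
    base, extra = divmod(total, len(tasks))
    subset = []
    for i, name in enumerate(tasks):
        # The first `extra` tasks get one more example so quotas sum to `total`.
        quota = base + (1 if i < extra else 0)
        pool = task_examples[name]
        k = min(quota, len(pool))  # never ask for more than the task has
        for example in rng.sample(pool, k):
            subset.append({"task": name, **example})
    return subset


# Toy stand-in data: 8 made-up tasks with 100 examples each.
toy_tasks = {f"task_{i}": [{"id": j} for j in range(100)] for i in range(8)}
subset = build_small_wide_subset(toy_tasks, total=300)
print(len(subset))
```

With 8 tasks and a budget of 300, each task contributes 37 or 38 examples. Fixing the seed keeps the subset identical across runs, which matters when comparing many quantized variants of the same model.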
We always welcome more task PRs. Additionally, if there's a broad task that meets your needs, you can use the --limit flag to evaluate on only a subset of its examples.