EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Home page: https://www.eleuther.ai

Need wide & small task for fast evaluation

UmerHA opened this issue

Hi all,

I want to do a large-ish study on quantization methods and their effect on model performance. For this, I need an evaluation that is (i) "wide" (i.e., covers a broad set of tasks/topics) and (ii) small (so it's quick and cheap to run).

IIUC, there is currently no task for that.

I suggest we add the dharma2 dataset (samples from 8 tasks, incl. MMLU; 300 examples in total), or alternatively BIG-bench Lite (samples from 24 tasks).

I'll fork this repo and add dharma2. If there's interest, I'd be happy to submit a PR.
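
Roughly, I'd expect the task config to look something like this (a sketch in the harness's YAML task format; the dataset path, split, and field names below are placeholders, not a final config):

```yaml
# Sketch of a dharma2 task config in the harness's YAML format.
# dataset_path, split, and field names are placeholders/assumptions.
task: dharma2
dataset_path: <hf-dataset-for-dharma2>  # hypothetical HF dataset id
output_type: multiple_choice
test_split: test
doc_to_text: "{{question}}"
doc_to_choice: "{{choices}}"
doc_to_target: "{{answer}}"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
```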

We always welcome more task PRs! Additionally, if there's a broad task that meets your needs, you can use the --limit flag to cap how many examples are run per task.
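
For example, a minimal invocation (a sketch, assuming a recent harness version with the lm_eval entry point; the model and example count here are placeholders):

```bash
# Evaluate a Hugging Face model on MMLU, capping evaluation at
# 100 examples per task with --limit to keep the run cheap.
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks mmlu \
    --limit 100
```

If I recall the CLI correctly, --limit also accepts a float in (0, 1] to sample a fraction of each task's examples instead of a fixed count.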