yandex / YaLM-100B

Pretrained language model with 100B parameters

Evaluation benchmarks (lm-eval-harness)

justheuristic opened this issue · comments

Thanks for the awesome work! (and especially for choosing to make it freely available)

If you have time, please also consider running the evaluation benchmarks from lm-eval-harness
https://github.com/EleutherAI/lm-evaluation-harness

[Despite it having a ton of different benchmarks, you only need to implement one interface, and the harness runs all the benchmarks for you.]

It is a more-or-less standard tool for benchmarking how well your model performs on a range of tasks (generation, common sense, math, etc.).
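To make the "one interface" point concrete, here is a minimal sketch of the shape that interface takes. In the harness itself you would subclass the `LM` base class from `lm_eval` (exact module path depends on the harness version); the class name `ToyYaLMAdapter` and the uniform-probability "model" below are purely illustrative stand-ins, not the real YaLM integration.

```python
# Hedged sketch of the three methods lm-eval-harness asks a model
# adapter to implement. A real adapter would subclass the harness's
# LM base class and call the actual model; this toy version assigns
# every token a uniform probability so the contract's shape is clear.
import math
from typing import List, Tuple

VOCAB_SIZE = 50257  # illustrative vocabulary size, not YaLM's


class ToyYaLMAdapter:
    """Stand-in for an adapter that would wrap YaLM-100B."""

    def loglikelihood(self, requests: List[Tuple[str, str]]) -> List[Tuple[float, bool]]:
        # For each (context, continuation) pair, return the log-probability
        # of the continuation given the context, plus a flag saying whether
        # greedy decoding would have produced that continuation.
        results = []
        for context, continuation in requests:
            n_tokens = max(1, len(continuation.split()))  # crude tokenization
            logprob = n_tokens * math.log(1.0 / VOCAB_SIZE)  # uniform toy model
            results.append((logprob, False))
        return results

    def loglikelihood_rolling(self, requests: List[str]) -> List[float]:
        # Full-document log-likelihood, used for perplexity-style tasks.
        return [
            max(1, len(text.split())) * math.log(1.0 / VOCAB_SIZE)
            for text in requests
        ]

    def greedy_until(self, requests: List[Tuple[str, List[str]]]) -> List[str]:
        # Generate from each context until one of the stop sequences is hit;
        # the toy model just emits an empty continuation.
        return ["" for _context, _stops in requests]
```

With these three methods in place, the harness drives every registered task through them, which is why adding one adapter unlocks the whole benchmark suite.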

There's a huge number of tasks, so if you want to pick an initial set, consider the ones that gpt-J reports here: https://huggingface.co/EleutherAI/gpt-j-6B#evaluation-results