Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Stability-AI/lm-evaluation-harness Issues

Please consider providing bibliographic information
Updated 7 months ago · 1
llama2 70B cause OOM
Updated 7 months ago · 4
JSQuAD results of LLaMA 2 models
Closed 7 months ago · 9
xwinograd is missing
Closed 9 months ago · 2
Clarification about JSQuAD
Closed 9 months ago · 1
Prompt versions of non-instruction-tuned LLaMA models
Updated 9 months ago · 1