Stability-AI/lm-evaluation-harness Issues
llama2 70B cause OOM
Updated 4
JSQuAD results of LLaMA 2 models
Closed 9
xwinograd is missing
Closed 2
Clarification about JSQuAD
Closed 1
A framework for few-shot evaluation of autoregressive language models.