OpenLMLab / LEval

[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Except for GSM100, are the other datasets evaluated in 0-shot?

zhimin-z opened this issue · comments

I'd like to confirm the leaderboard configuration.

Yes, we did not add in-context examples for the other tasks. If you want few-shot evaluation, we suggest modifying our inference code under the Baselines folder. The leaderboard has not been updated; please refer to our paper for the up-to-date results!
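A minimal sketch of how one might add few-shot examples when modifying an inference script (this is an illustration, not L-Eval's actual code; the function and field names `input`/`output` are assumptions):

```python
# Hypothetical helper (not from the L-Eval codebase): prepend up to k solved
# examples to a query to turn a 0-shot prompt into a few-shot one.
def build_few_shot_prompt(examples, query, k=0):
    """With k=0 this is a plain 0-shot prompt; k>0 prepends k examples."""
    parts = []
    for ex in examples[:k]:
        parts.append(f"Question: {ex['input']}\nAnswer: {ex['output']}")
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)

# 0-shot, as used for most tasks on the leaderboard: no examples prepended.
zero_shot = build_few_shot_prompt([], "What is 2 + 2?", k=0)

# few-shot: one in-context example is prepended before the query.
few_shot = build_few_shot_prompt(
    [{"input": "What is 1 + 1?", "output": "2"}],
    "What is 2 + 2?",
    k=1,
)
```

The resulting string would then be passed to whatever model call the baseline script already makes.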