baaivision/JudgeLM Issues
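Reproduced training results do not match expectations (Updated)
Leaderboard of JudgeLM evaluations (Updated, 2 comments)
About automatic evaluation of Chinese question answering (Closed, 2 comments)
Hello, how many resources did you use to train the 33B model? (Closed, 2 comments)
Hello, how can multiple answers be scored at the same time? The current code does not seem to support this. (Closed, 3 comments)
preprocess (Closed, 6 comments)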
An open-sourced LLM judge for evaluating LLM-generated answers.