haonan-li/CMMLU Issues
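About the generated evaluation results (Updated, 1)
Request to add an evaluation of Qwen2 (Updated)
Request to add an evaluation of Grok-1 (Closed, 1)
What is going on with the dataset? (Closed, 1)
Are there evaluation results for ChatGLM3? (Closed, 1)
Is yi-34b-chat supported? (Closed, 1)
If a model is trained on the evaluation set, can it get a perfect score? How is cheating prevented? (Closed, 1)
Input/output format of the external API interface, and the email address (Closed, 6)
Which category does each csv file belong to? (Closed, 1)
How the category and overall average scores are calculated (Closed, 2)
Update to the cmmlu test set results (Closed, 3)
SyntaxError: unmatched ')' (Closed, 1)
It tends to hang; what is going on? (Closed, 3)
How do I submit model results to have them added to the leaderboard? (Closed, 1)
CMMLU evaluation (Closed, 3)
Is llama2 supported? (Closed, 1)
How was the MILM evaluation carried out? (Closed, 1)
Support Qwen-7b (Closed, 2)
Broken link under Prompt - Evaluation (Closed, 1)
Baichuan-13B-Chat (Closed, 1)
Scores produced by get_results show some randomness (Closed, 2)
[Data error] An error occurs when loading the data from huggingface (Closed, 1)
[baichuan-13] Could you add a comparison with the recently released Baichuan 13B model? (Closed, 1)
The fan chart in the logo does not include the "world history" subject (Closed, 1)
Would you consider evaluating models by the probabilities of the four answer options? (Closed, 2)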