jeinlee1991 / chinese-llm-benchmark

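Chinese LLM capability benchmark leaderboard: covers Baidu ERNIE Bot (文心一言), ChatGPT, Alibaba Tongyi Qianwen, iFlytek Spark, and open-source models such as belle / chatglm6b, evaluated across multiple capability dimensions. It provides not only a ranked leaderboard of capability scores but also the raw outputs of every model!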

jeinlee1991/chinese-llm-benchmark Issues

Please evaluate deepseek v2
Updated 2 months ago
The evaluation data leaves me speechless
Updated 3 months ago
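The evaluation data set is too small, isn't it? Can it really prove anything?
Updated 3 months ago (1 comment)
The important claude series is missing; requesting that it be added to the evaluation
Updated 3 months ago (2 comments)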
The leaderboard for open-source models under 10b is unreliable
Updated 3 months ago
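Suggestion: add tests for 1B models
Closed 3 months ago (1 comment)
Could a Function Call (tool calling) capability metric be added to the evaluation?
Updated 3 months ago (1 comment)
Why is bing not included?
Closed 3 months ago (1 comment)
Does eval contain all of the evaluation data?
Closed 3 months ago (1 comment)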
When was Tongyi Qianwen evaluated?
Closed 3 months ago
Great evaluation. May the project's main test data be reposted?
Closed 3 months ago (1 comment)
Could you evaluate the Qianwen-7B model?
Closed 3 months ago
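Strongly recommend adding moonshot's Kimi chat!!!
Closed 3 months ago (2 comments)
Re-test the new version of ERNIE Bot (文心一言)
Closed 3 months ago (1 comment)
Could you test openbuddy-deepseek-67b-v15.2?
Closed 3 months ago (1 comment)
Why does Qianwen1.5-14B-chat score so high, even higher than 72b?
Closed 3 months ago (4 comments)
iFlytek Spark has released version 3.5
Closed 3 months ago (1 comment)
Could kimi chat be added to the leaderboard?
Closed 3 months ago (1 comment)
Could an evaluation of qianwen1.5-32B be added?
Closed 3 months ago (2 comments)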
Evaluation of the iFlytek Spark 13B open-source model
Updated 4 months ago
Could an evaluation of the claude3 commercial model be added?
Updated 4 months ago
Why does Qianwen1.5-14B-chat score so high, even higher than 72b?
Closed 4 months ago
Is there any arxiv paper or report for this benchmark?
Updated 6 months ago
update new model
Updated 6 months ago
Why does data analysis evaluation not count into the overall score?
Updated 6 months ago
What are the evaluation criteria for the score?
Updated 7 months ago
Hoping the RWKV model can be added to the evaluation
Updated 7 months ago (2 comments)
This link does not redirect...
Updated 8 months ago
Where is my Claude?
Updated 10 months ago
How should I cite this work?
Updated 10 months ago
Great work. What are the scoring criteria? How are these models scored?
Updated 10 months ago (6 comments)
It would be great to have a comparison of each model's deployment hardware requirements
Updated 10 months ago
Could you evaluate Chinese-LLaMA-Alpaca-2?
Updated a year ago
Great work. Are there plans to include the Anima-30B model in future evaluations?
Updated a year ago
How can I submit my own model for evaluation?
Updated a year ago (1 comment)
Please provide code to reproduce the results
Updated a year ago