open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Home Page: https://opencompass.org.cn/

[Bug] When evaluating the C-Eval dataset with the data under the test directory, the final score is very low, close to 0

13416157913 opened this issue

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

1

Reproduces the problem - code/configuration sample

1

Reproduces the problem - command or script

1

Reproduces the problem - error message

1

Other information

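When evaluating the C-Eval dataset with the data under the test directory, the final score is very low, close to 0.
For C-Eval evaluation, do I need to take the model's answers and submit them to the official C-Eval website myself to get the score?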

Please submit the results to the official C-Eval website to get the test accuracy. We only have the answers for the val set, so the test split cannot be scored locally.
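For reference, here is a minimal sketch of packing test-split predictions into a single JSON file for upload. It assumes the submission layout is one JSON object keyed by subject name, each mapping a 0-based question id (as a string) to an answer letter; please check the official C-Eval site for the exact format before submitting. The helper names `build_submission` and `write_submission` and the subject key in the usage note are hypothetical and are not part of OpenCompass.

```python
import json

# Assumption: C-Eval questions are four-way multiple choice.
VALID_CHOICES = {"A", "B", "C", "D"}


def build_submission(predictions):
    """Convert {subject: [letter, ...]} into {subject: {"0": letter, ...}}.

    Assumption: each list is ordered by the question id used in the
    test files, starting at 0.
    """
    submission = {}
    for subject, answers in predictions.items():
        per_subject = {}
        for idx, answer in enumerate(answers):
            letter = answer.strip().upper()
            if letter not in VALID_CHOICES:
                raise ValueError(
                    f"{subject}[{idx}]: unexpected answer {answer!r}"
                )
            per_subject[str(idx)] = letter
        submission[subject] = per_subject
    return submission


def write_submission(predictions, path):
    """Write the submission as UTF-8 encoded JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_submission(predictions), f, ensure_ascii=False, indent=2)
```

Usage would look like `write_submission({"computer_network": ["A", "C", "B"]}, "submission.json")`, where the answer letters are extracted from the model's outputs on the test split. Rejecting anything outside A to D up front makes extraction failures visible before the file is uploaded.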

Thanks for your reply.