THUDM / AlignBench

A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)

Errors in reference (especially for reasoning)

tnlin opened this issue

Dear Author,

Thank you for your effort in building this useful benchmark. However, several of the reference answers contain errors that need attention, particularly those for the reasoning questions.

These references should be corrected, or at least re-inspected manually with greater care.
For instance, the reference in line 315 (南京博物院的展品中, "Among the exhibits of the Nanjing Museum") contains an error:

Incorrect reference:

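(6) The person who died in 60 AD was not the centurion (百人将) of the Eastern Han Yuyang Camp (渔阳营).
Reference: ... Tombstone D belongs to the centurion Dou Qing (窦青), who died in 60 AD. (wrong)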

Corrected version:

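Tombstone A belongs to the county commandant (县尉) Zhao Xuan (赵宣), who died in 96 AD.
Tombstone B belongs to the wine merchant Zhang Tong (张通), who died in 60 AD.
Tombstone C belongs to the Confucian scholar Dong Xuan (董玄), who died in 84 AD.
Tombstone D belongs to the centurion Dou Qing (窦青), who died in 72 AD.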
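The contradiction can be checked mechanically. The sketch below is illustrative only: it encodes clue (6) as quoted above and tests it against the corrected assignment and against the single tombstone D entry quoted from the original reference. The English labels (for example "centurion" for 百人将) and the function name `satisfies_clue_6` are assumptions made for this example, and the puzzle's other clues are not reproduced in this issue, so they are not checked.

```python
# Each entry maps a tombstone label to (occupation, name, year of death in AD).
# English labels are illustrative translations, not dataset fields.
corrected = {
    "A": ("county commandant", "Zhao Xuan", 96),
    "B": ("wine merchant", "Zhang Tong", 60),
    "C": ("Confucian scholar", "Dong Xuan", 84),
    "D": ("centurion", "Dou Qing", 72),
}

# Only tombstone D of the original reference is quoted in this issue,
# so only that entry is modeled here.
original_d = {"D": ("centurion", "Dou Qing", 60)}


def satisfies_clue_6(assignment):
    """Clue (6): whoever died in 60 AD is not the centurion."""
    return all(
        not (year == 60 and occupation == "centurion")
        for occupation, _name, year in assignment.values()
    )


print(satisfies_clue_6(corrected))   # True
print(satisfies_clue_6(original_d))  # False
```

The original reference fails the check because it has the centurion dying in 60 AD, which is exactly what clue (6) rules out.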

Additionally, the reference in line 336 (有一个小偷费劲力气进入到了银行的金库里..., "A thief went to great lengths to get into a bank vault...") contains a significant mathematical error:

Incorrect reference:

If all coins are fake, the total weight is (1 + 2 + 3 + ... + 100) = 5050.

Corrected version:

It should be 100 * (1 + 2 + 3 + ... + 100) = 100 * 5050 = 505000.
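The arithmetic in the correction is easy to confirm. In the sketch below, the multiplier of 100 is taken as given from the corrected version; this issue does not say what it represents in the puzzle, so the variable name `factor` is only a placeholder.

```python
n = 100  # the series runs from 1 to 100

# Sum of 1..100, cross-checked against the closed form n * (n + 1) / 2.
series_sum = sum(range(1, n + 1))
assert series_sum == n * (n + 1) // 2

factor = 100  # multiplier from the corrected version; its meaning is not stated here
total = factor * series_sum

print(series_sum)  # 5050
print(total)       # 505000
```

The original reference stops at the bare series sum of 5050, which is short of the corrected total by that factor of 100.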

I hope these corrections are helpful and that the benchmark can be improved accordingly.

commented

@tnlin Hi,

Thanks for your great suggestion! The error has been fixed in the mentioned commit.