| Model | Open Source | Chinese Reasoning | Chinese Language | Overall |
| --- | --- | --- | --- | --- |
| GPT-4-1106-preview | - | 7.73 | 8.29 | 8.01 |
| DeepSeek-V2-Chat (RL) | √ | 7.45 | 8.36 | 7.91 |
| erniebot-4.0-202404 (ERNIE Bot) | - | 7.61 | 8.17 | 7.89 |
| DeepSeek-V2-Chat (SFT) | √ | 7.30 | 8.17 | 7.74 |
| GPT-4-0613 | - | 7.47 | 7.59 | 7.53 |
| erniebot-4.0-202312 (ERNIE Bot) | - | 6.84 | 7.88 | 7.36 |
| moonshot-v1-32k-202404 (Moonshot AI) | - | 6.42 | 8.02 | 7.22 |
| Qwen1.5-72B-Chat (Tongyi Qianwen) | √ | 6.45 | 7.93 | 7.19 |
| DeepSeek-67B-Chat | √ | 5.75 | 7.11 | 6.43 |
| Yi-34B-Chat (01.AI) | √ | 4.86 | 7.38 | 6.12 |
| GPT-3.5-turbo-0613 | - | 5.35 | 6.71 | 6.08 |
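
Reading across the rows, the Overall score appears to be the plain arithmetic mean of the Chinese Reasoning and Chinese Language scores; this holds to within rounding for every row except GPT-3.5-turbo-0613, whose sub-scores may carry a typo. A minimal sanity check (my inference, not stated in the source), with a few rows hard-coded from the table above:

```python
# Hypothesis (my inference, not from the source): Overall is the mean of the
# Chinese Reasoning and Chinese Language scores, rounded to two decimals.
rows = {
    "GPT-4-1106-preview":    (7.73, 8.29, 8.01),
    "DeepSeek-V2-Chat (RL)": (7.45, 8.36, 7.91),
    "Yi-34B-Chat":           (4.86, 7.38, 6.12),
}
for name, (reasoning, language, overall) in rows.items():
    # Allow half a unit in the last place for rounding of x.xx5 values.
    assert abs((reasoning + language) / 2 - overall) < 0.0051, name
print("Overall == mean(Chinese Reasoning, Chinese Language) for these rows")
```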
| Small Model | Open Source | Chinese Reasoning | Chinese Language | English | Code |
| --- | --- | --- | --- | --- | --- |
| Yi-1.5-9B | √ | | | | |
| Yi-1.5-6B | √ | | | | |
| Model | English | Chinese | Code | Math | Params (activated/total, B) | Context |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-V2-Chat (RL) | 157.5 | 159.6 | 185.6 | 146.1 | | |
| DeepSeek-V2-Chat (SFT) | 159.7 | 163.3 | 175.9 | 143.5 | | |
| LLaMA3-70B-Instruct | 160.4 | 138.6 | 176.5 | 141.7 | | |
| Mixtral-8x22B | 156.2 | 121.0 | 164.4 | 137.7 | 44/176 | |
| Qwen1.5-72B-Chat | 142.1 | 165.1 | 140.9 | 122.5 | | |
| DeepSeek-V2 (MoE-236B) | 157.4 | 165.7 | 115.4 | 122.8 | | 128K |
| DeepSeek-V1-Chat (SFT) | 142.8 | 133.0 | 153.5 | 116.7 | | |
| LLaMA3-70B | 159.9 | 136.8 | 116.8 | 125.2 | | |
| Mixtral-8x7B | | | | | 13/56 | |
| DeepSeek-V1 (Dense-67B) | 139.9 | 136.9 | 102.5 | 82.1 | | |
| DeepSeek-V2-Lite-Chat | | | | | 2.4/15.7 | 32K |
| Arctic-128×3.66B (MoE-480B) | | | | | 17/480 | |
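
The four domain columns here are not independent measurements: each appears to be a straight sum of the per-benchmark scores in the domain tables below (English = MMLU + BBH, Chinese = C-Eval + CMMLU, Code = HumanEval + MBPP + LiveCodeBench, Math = GSM8K + MATH). A quick check against the DeepSeek-V2-Chat (RL) row (again my inference from the numbers, not something the source states):

```python
# Verify that each aggregate column equals the sum of the domain-table scores
# for DeepSeek-V2-Chat (RL); all values are copied from the tables below.
v2_chat_rl = {
    "English": (77.8, 79.7),        # MMLU, BBH
    "Chinese": (78.0, 81.6),        # C-Eval, CMMLU
    "Code":    (81.1, 72.0, 32.5),  # HumanEval, MBPP, LiveCodeBench
    "Math":    (92.2, 53.9),        # GSM8K, MATH
}
expected = {"English": 157.5, "Chinese": 159.6, "Code": 185.6, "Math": 146.1}
for domain, parts in v2_chat_rl.items():
    assert round(sum(parts), 1) == expected[domain], domain
print("All four aggregates match the domain tables")
```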
| English Domain | MMLU | BBH | Total |
| --- | --- | --- | --- |
| Claude-3-Opus | 86.8 (5-shot) | 86.8 (3-shot) | |
| LLaMA3-70B-Instruct | 80.3 | 80.1 | 160.4 |
| LLaMA3-70B | 78.9 | 81.0 | 159.9 |
| DeepSeek-V2-Chat (SFT) | 78.4 | 81.3 | 159.7 |
| DeepSeek-V2-Chat (RL) | 77.8 | 79.7 | 157.5 |
| DeepSeek-V2 (MoE-236B) | 78.5 | 78.9 | 157.4 |
| Mixtral-8x22B | 77.6 | 78.9 | 156.5 |
| Mixtral-8x7B | 70.4 | | |
| DeepSeek-V1-Chat (SFT) | 71.1 | 71.7 | 142.8 |
| Qwen1.5-72B-Chat | 77.3 | 65.9 | 142.1 |
| Yi-1.5-34B-Chat | 76.8 | | |
| Yi-1.5-9B-Chat | 69.5 | 72.4 | |
| Yi-1.5-6B-Chat | 63.5 | 59.0 | |
| Qwen1.5-32B-Chat | 74.3 | | |
| Mixtral-8x7B-Instruct-v0.1 | 71.4 | | |
| Mixtral-8x22B-Instruct-v0.1 | 77.7 | | |
| DeepSeek-V1 (Dense-67B) | 71.3 | 68.7 | 140.0 |
| GPT-4 | 86.4 | 86.7 | |
| DeepSeek-V2-Lite-Chat | 55.7 | 48.1 | |
| DeepSeekMoE-16B-Chat | 47.2 | 42.2 | |
| DeepSeek-7B-Chat | 49.7 | 43.1 | |
| Arctic-128×3.66B (MoE-480B) | 67.3? | | |
| Chinese Domain | C-Eval | CMMLU | CLUEWSC |
| --- | --- | --- | --- |
| DeepSeek-V2 (MoE-236B) | 81.7 | 84.0 | |
| Qwen1.5-72B-Chat | 82.2 | 82.9 | |
| DeepSeek-V2-Chat (SFT) | 80.9 | 82.4 | |
| DeepSeek-V2-Chat (RL) | 78.0 | 81.6 | |
| LLaMA3-70B-Instruct | 67.9 | 70.7 | |
| DeepSeek-V1 (Dense-67B) | 66.1 | 70.8 | |
| LLaMA3-70B | 67.5 | 69.3 | |
| DeepSeek-V1-Chat (SFT) | 65.2 | 67.8 | |
| Mixtral-8x22B | 60.0 | 61.0 | |
| GPT-4 | 69.9 | 71.0 | |
| Qwen-14B-Chat | 71.7 | 70.0 | |
| Yi-34B-Chat | 77.71 | 73.52 | |
| Qwen1.5-7B-Chat | | 73.4 | |
| Yi-1.5-9B | | 74.8 | |
| Yi-1.5-6B | | 70.8 | |
| DeepSeek-V2-Lite-Chat | 60.1 | 62.5 | 80.0 |
| DeepSeekMoE-16B-Chat | 40.0 | 49.3 | 68.2 |
| DeepSeek-7B-Chat | 44.7 | 51.2 | 66.2 |
| Code Domain | HumanEval | MBPP | LiveCodeBench (0901-0401) | MT-Bench |
| --- | --- | --- | --- | --- |
| Claude-3-Opus | 84.9 (0-shot) | | | |
| DeepSeek-V2-Chat (RL) | 81.1 | 72.0 | 32.5 | |
| LLaMA3-70B-Instruct | 76.2 | 69.8 | 30.5 | |
| DeepSeek-V2-Chat (SFT) | 76.8 | 70.4 | 28.7 | |
| Yi-1.5-34B-Chat | 75.2 | 74.6 | | 8.5 |
| Mixtral-8x22B | 75.0 | 64.4 | 25.0 | |
| DeepSeek-V1-Chat (SFT) | 73.8 | 61.4 | 18.3 | |
| Qwen1.5-72B-Chat | 64.6 | 72.5 | 18.8 | 8.61 |
| LLaMA3-70B | 48.2 | 68.6 | | |
| DeepSeek-V2 (MoE-236B) | 48.8 | 66.6 | | |
| Yi-1.5-9B-Chat | 66.5 | 78.8 | | 8.2 |
| Yi-1.5-6B-Chat | 64.0 | 70.9 | | 7.5 |
| LLaMA3-8B-Instruct | 61.6 | 61.4 | | 8.0 |
| DeepSeek-V1 (Dense-67B) | 45.1 | 57.4 | | |
| Qwen1.5-32B-Chat | 51.2 | 66.9 | | 8.3 |
| Qwen1.5-14B-Chat | | | | 7.91 |
| Mixtral-8x7B-Instruct-v0.1 | 45.1 | 59.5 | | 8.3 |
| Mixtral-8x22B-Instruct-v0.1 | 76.2 | 73.8 | | 8.6 |
| Qwen1.5-7B-Chat | 36.0 | 46.1 | | 7.60 |
| Yi-1.5-9B | 41.4 | 61.1 | | |
| Yi-1.5-6B | 36.5 | 56.8 | | |
| DeepSeek-V2-Lite-Chat | 57.3 | 45.8 | | |
| DeepSeekMoE-16B-Chat | 45.7 | 46.2 | | |
| DeepSeek-7B-Chat | 45.1 | 39.0 | | |
| HumanEval | Pass@1 | Pass@10 | 0-shot | 5-shot |
| --- | --- | --- | --- | --- |
| Claude-3-Opus | | | 84.9 | |
| StarCoder2-15B | | | | |
| StarCoder2-7B | | | | |
| StarCoder2-3B | | | | |
| LLaMA3-70B | | | 81.7 | |
| LLaMA3-8B | | | 62.2 | |
| Yi-Chat-34B | 7.9 | | | |
| Qwen-14B-Chat | 11.1 | | | |
| DeepSeek-Coder-33B-Instruct | 31.7 | | | |
| GPT-4-Turbo | 48.4 | | | |
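
The Pass@1/Pass@10 columns refer to the standard pass@k metric for code generation. For reference, a minimal sketch of the unbiased estimator introduced in the Codex paper (Chen et al., 2021), in its numerically stable product form:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples drawn per problem, c: samples passing the unit tests.
    pass@k = 1 - C(n-c, k) / C(n, k), expanded as a product to avoid
    overflow in the binomial coefficients.
    """
    if n - c < k:
        return 1.0  # fewer than k failures: any k-subset contains a pass
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 200 samples per problem, 20 of them pass the tests.
print(round(pass_at_k(200, 20, 1), 3))   # 0.1  -> equals c/n for k=1
print(round(pass_at_k(200, 20, 10), 3))  # ~0.66
```

In practice this estimate is computed per problem and averaged over the whole benchmark.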
| Math Domain | GSM8K | MATH | CMath |
| --- | --- | --- | --- |
| Claude-3-Opus | 95.0 (0-shot) | 60.1 (0-shot) | |
| DeepSeek-V2-Chat (RL) | 92.2 | 53.9 | |
| DeepSeek-V2-Chat (SFT) | 90.8 | 52.7 | |
| LLaMA3-70B-Instruct | 93.2 | 48.5 | |
| Mixtral-8x22B | 87.9 | 49.8 | |
| LLaMA3-70B | 83.0 | 42.2 | |
| DeepSeek-V2 (MoE-236B) | 79.2 | 43.6 | |
| Qwen1.5-72B-Chat | 86.0 | 44.4 | |
| DeepSeek-V1-Chat (SFT) | 84.1 | 32.6 | |
| DeepSeek-V1 (Dense-67B) | 63.4 | 18.7 | |
| Yi-1.5-34B-Chat | 90.2 | 50.1 | |
| Qwen1.5-32B-Chat | 83.9 | 43.3 | |
| Mixtral-8x7B-Instruct-v0.1 | 65.7 | 28.4 | |
| Mixtral-8x22B-Instruct-v0.1 | 84.0 | 41.1 | |
| Qwen1.5-7B-Chat | 70.1 | 20.3 | |
| LLaMA3-8B | 54.7 | 21.16 | |
| Yi-1.5-9B | 73.7 | 32.6 | |
| Yi-1.5-6B | 62.2 | 28.42 | |
| DeepSeek-7B-Chat | 62.6 | 14.7 | 66.4 |
| DeepSeekMoE-16B-Chat | 62.2 | 15.2 | 67.9 |
| DeepSeek-V2-Lite-Chat | 72.0 | 27.9 | 71.7 |
| Arctic-128×3.66B (MoE-480B) | 74.2 | | |