nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

pass@1 on mbpp

chenkehua opened this issue · comments

The reproduced pass@1 result of StarCoder on the MBPP dataset is 43.6, which differs from the 52.7 reported in the paper. Can you explain this discrepancy?

The 43.6 score is evaluated on Google's MBPP with 500 problems; our WizardCoder is evaluated on the same data.
The 52.7 is evaluated on MultiPL-E's MBPP (397 problems).
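For readers comparing numbers: pass@1 is typically computed per problem with the unbiased estimator from the Codex paper, then averaged over the benchmark. A minimal sketch (the per-problem sample counts below are hypothetical, not from this evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples generated, c = samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the mean over all problems, so a 43.6 vs 52.7
# gap can arise purely from scoring different problem subsets
# (500 problems in Google's MBPP vs 397 in MultiPL-E's).
results = [(20, 9), (20, 12), (20, 0)]  # hypothetical (n, c) per problem
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```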

Thanks for the clarification here. Very helpful!
@ChiYeungLaw, could you explain a bit more about how you got the 43.6 for StarCoder? Is that based on the Eval Harness or `mbpp_gen.py` in your repo? Could you provide the command line for reproduction purposes?

We follow the same prompt as Eval Harness to evaluate StarCoder on MBPP.

Is it `prompt = f'"""\n{description}\n{test_example}\n"""\n'` or do you include the code_solution?
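For context, the prompt format quoted above would be built roughly like this (a sketch only; the field names and example task are assumptions, and the actual harness code may differ):

```python
def build_mbpp_prompt(description: str, test_example: str) -> str:
    """Wrap the MBPP task description and one test case in a
    docstring, as in the format quoted above. Note the reference
    code_solution is NOT included, so the model must write it."""
    return f'"""\n{description}\n{test_example}\n"""\n'

# Hypothetical MBPP-style problem for illustration:
prompt = build_mbpp_prompt(
    "Write a function to add two numbers.",
    "assert add(1, 2) == 3",
)
```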

@ChiYeungLaw May I ask why you replace 4 space characters with a tab before generation?
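For context, the preprocessing step being asked about presumably amounts to something like the sketch below (an assumption about the repo's code; a common motivation is that a tab tokenizes to fewer tokens than four spaces, but that rationale is also an assumption here):

```python
def spaces_to_tabs(code: str) -> str:
    """Replace each run of 4 spaces with a tab character.

    Applied to prompts/completions before generation; this is a
    hypothetical reconstruction, not the repo's actual function.
    """
    return code.replace("    ", "\t")

example = "def add(a, b):\n    return a + b\n"
converted = spaces_to_tabs(example)
```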