nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

pass@1 on mbpp

chenkehua opened this issue · comments

The reproduced pass@1 result of StarCoder on the MBPP dataset is 43.6, which differs from the 52.7 reported in the paper. Can you explain this discrepancy?

The 43.6 score is evaluated on Google's MBPP with 500 problems; our WizardCoder is evaluated on the same data.
The 52.7 is evaluated on MultiPL-E's MBPP (397 problems).
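For readers comparing numbers: pass@1 is typically computed per problem with the unbiased estimator from the Codex paper, then averaged over the benchmark. A minimal sketch (the per-problem sample counts below are hypothetical, not from this evaluation):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n = samples generated, c = samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# The benchmark score is the mean over all problems, so a 43.6 vs 52.7
# gap can arise purely from scoring different problem subsets
# (500 problems in Google's MBPP vs 397 in MultiPL-E's).
results = [(20, 9), (20, 12), (20, 0)]  # hypothetical (n, c) per problem
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```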

Thanks for the clarification here. Very helpful!
@ChiYeungLaw, could you explain a bit more about how you got the 43.6 for StarCoder? Is that based on the Eval Harness or `mbpp_gen.py` in your repo? Could you provide the command line for reproduction purposes?

We follow the same prompt as Eval Harness to evaluate StarCoder on MBPP.

Is it `prompt = f'"""\n{description}\n{test_example}\n"""\n'` or do you include the code_solution?
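For context, the prompt format quoted above would be built roughly like this (a sketch only; the field names and example task are assumptions, and the actual harness code may differ):

```python
def build_mbpp_prompt(description: str, test_example: str) -> str:
    """Wrap the MBPP task description and one test case in a
    docstring, as in the format quoted above. Note the reference
    code_solution is NOT included, so the model must write it."""
    return f'"""\n{description}\n{test_example}\n"""\n'

# Hypothetical MBPP-style problem for illustration:
prompt = build_mbpp_prompt(
    "Write a function to add two numbers.",
    "assert add(1, 2) == 3",
)
```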

@ChiYeungLaw May I ask why you replace 4 space characters with a tab before generation?
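For context, the preprocessing step being asked about presumably amounts to something like the sketch below (an assumption about the repo's code; a common motivation is that a tab tokenizes to fewer tokens than four spaces, but that rationale is also an assumption here):

```python
def spaces_to_tabs(code: str) -> str:
    """Replace each run of 4 spaces with a tab character.

    Applied to prompts/completions before generation; this is a
    hypothetical reconstruction, not the repo's actual function.
    """
    return code.replace("    ", "\t")

example = "def add(a, b):\n    return a + b\n"
converted = spaces_to_tabs(example)
```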