How to evaluate on Human-eval

Question


Lifeasarain opened this issue a year ago

I would like to evaluate the CodeGen model on the HumanEval dataset, but I don't know how to generate 200 samples for each problem in order to calculate pass@k. Could you point me to any documentation on this?
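A minimal sketch of the two steps involved, assuming you already have a function that samples one completion from CodeGen for a given prompt. `generate_fn`, `build_samples`, `pass_at_k` and `mean_pass_at_k` are hypothetical names used only for illustration. The generation side draws n completions per task. Sampling must use a temperature above 0 so the completions differ. The scoring side uses the unbiased pass@k estimator from the Codex paper (Chen et al., 2021), 1 - C(n-c, k) / C(n, k), where n is the number of samples per task and c is how many of them passed the unit tests.

```python
from math import comb
from typing import Callable, Dict, List


def build_samples(
    prompts: Dict[str, str],
    generate_fn: Callable[[str], str],
    num_samples_per_task: int = 200,
) -> List[dict]:
    """Draw num_samples_per_task completions for every task.

    generate_fn is a hypothetical stand-in for your CodeGen sampling call;
    it should sample (temperature > 0) so repeated calls give different code.
    Each row uses the task_id/completion fields of a HumanEval samples file.
    """
    return [
        {"task_id": task_id, "completion": generate_fn(prompt)}
        for task_id, prompt in prompts.items()
        for _ in range(num_samples_per_task)
    ]


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: 1 - C(n-c, k) / C(n, k).

    n: samples generated, c: samples that passed the tests, k: attempt budget.
    Python ints are exact, so the binomials do not overflow.
    """
    if n - c < k:
        # Every size-k subset of the n samples contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Average per-task pass@k; results maps task_id -> pass/fail per sample."""
    scores = [pass_at_k(len(r), sum(r), k) for r in results.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy numbers: 200 samples for one task, 30 of which passed.
    print(round(pass_at_k(200, 30, 1), 4))  # prints 0.15
```

Generating n = 200 samples lets you report pass@1, pass@10 and pass@100 from the same run, because the estimator works for any k <= n. If you use the openai/human-eval package, you can write the sample rows with `write_jsonl` from `human_eval.data` (prompts come from `read_problems()`). Then run `evaluate_functional_correctness samples.jsonl`, which executes the tests and applies this same estimator for you.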