How to evaluate on HumanEval
Lifeasarain opened this issue
I would like to evaluate the CodeGen model on the HumanEval dataset, but I don't know how to generate 200 samples for each problem in order to calculate pass@k. Could you provide any documentation on this?
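For context, here is a minimal sketch of the pass@k part as I understand it, assuming the unbiased estimator from the paper that introduced HumanEval: with n samples per problem (200 here) of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The function names `pass_at_k` and `mean_pass_at_k` are my own, not from any repository, and the sampling step itself (drawing n completions per prompt from the model) is not shown.

```python
def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k) as a running product so that large
    binomial coefficients are never formed explicitly.
    """
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    prob_all_wrong = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_wrong *= 1.0 - k / i
    return 1.0 - prob_all_wrong


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three problems, 200 samples each.
    counts = [(200, 20), (200, 0), (200, 200)]
    for k in (1, 10, 100):
        print(f"pass@{k}: {mean_pass_at_k(counts, k):.4f}")
```

If this is right, what I am still missing is the generation side: how to sample the 200 completions per problem and which sampling settings to use.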