remove duplicate

Question

Ski-ing opened this issue a year ago

Was any strict deduplication performed between the training data and the HumanEval test set before training?

PoseidomWong · Answer 1 · Fri Aug 11 2023 15:53:16 GMT+0800 (China Standard Time)

I guess they didn't.

ChiYeung Law · Answer 2 · Sat Aug 12 2023 11:10:07 GMT+0800 (China Standard Time)

We have checked the SFT training set. The HumanEval test set has not leaked into it.
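For anyone who wants to run a similar check on their own data, here is a minimal sketch of one common approach: flagging training samples that share a long token n-gram with any test sample. This is not necessarily the check described above; the function names (`normalize`, `ngrams`, `find_leaks`), the tokenization rule, and the default window size `n=10` are all assumptions made for illustration.

```python
import re


def normalize(text):
    """Lowercase and split into word and punctuation tokens, dropping whitespace."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def find_leaks(train_samples, test_samples, n=10):
    """Return sorted (train_idx, test_idx) pairs that share at least one n-gram.

    Hypothetical helper: `n` is an assumed threshold, not a value from this thread.
    """
    # Index every test n-gram so each training sample is scanned only once.
    test_index = {}
    for test_idx, sample in enumerate(test_samples):
        for gram in ngrams(normalize(sample), n):
            test_index.setdefault(gram, set()).add(test_idx)

    leaks = set()
    for train_idx, sample in enumerate(train_samples):
        for gram in ngrams(normalize(sample), n):
            for test_idx in test_index.get(gram, ()):
                leaks.add((train_idx, test_idx))
    return sorted(leaks)


if __name__ == "__main__":
    # Toy data for illustration only; replace with the real SFT set and HumanEval prompts.
    train = [
        "def add(a, b):\n    return a + b  # sum two numbers and give the result back",
        "print('hello world')",
    ]
    test = [
        "def add(a, b):\n    return a + b  # sum two numbers and give the result back",
    ]
    print(find_leaks(train, test, n=5))
```

An exact n-gram match only catches verbatim or near-verbatim copies. Paraphrased or renamed solutions would need a fuzzier comparison, so an empty result from a check like this is evidence against leakage rather than proof.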

PoseidomWong · Answer 3 · Mon Aug 14 2023 16:05:30 GMT+0800 (China Standard Time)

Are there any plans to open-source the training data?