nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

remove duplicate

Ski-ing opened this issue

Is there a strict deduplication step that removes any overlap between the training data and the HumanEval test set before training?

I guess they didn't

We have checked the SFT training set. The HumanEval test set has not leaked into it.
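
For readers wondering what such a check could look like, here is a minimal sketch of an n-gram overlap filter. It is not necessarily the procedure used for the WizardLM or WizardCoder data; the function names (`normalize`, `ngrams`, `decontaminate`) and the default window of 10 tokens are assumptions made for illustration only.

```python
import re


def normalize(text):
    """Lowercase and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ngrams(text, n):
    """Return the set of n-token windows in the normalized text."""
    tokens = normalize(text).split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_samples, test_samples, n=10):
    """Split train_samples into (kept, dropped).

    A training sample is dropped if it shares at least one n-gram
    with any test sample. The window size n is an assumed threshold.
    """
    test_grams = set()
    for sample in test_samples:
        test_grams |= ngrams(sample, n)

    kept, dropped = [], []
    for sample in train_samples:
        if ngrams(sample, n) & test_grams:
            dropped.append(sample)
        else:
            kept.append(sample)
    return kept, dropped
```

Calling `decontaminate(train_texts, humaneval_prompts)` would return the samples to keep and the ones flagged as overlapping. Samples shorter than `n` tokens produce no n-grams and are always kept, and an exact-window match like this will miss paraphrased or reformatted problems, so it is a lower bound on leakage rather than proof that none exists.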

I would like to ask if there are any plans to open source the training data?