microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

Home Page: https://aka.ms/GeneralAI


[MiniLLM] Why does the Dolly training set only have 12435 samples?

yumath opened this issue

The processed Dolly training set contains only 12435 samples, but Section 3.1 of your paper says:

Training
We construct the training data from databricks-dolly-15k consisting of 15K human-written instruction-response pairs. We randomly split 14K samples as the training set D and left 500 samples for validation and testing, respectively.
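To make the size mismatch concrete, here is a minimal sketch of the split the quoted passage describes: shuffle the data, hold out 500 samples each for validation and testing, and keep the rest for training. It assumes a nominal total of exactly 15,000 records (the "15K" in the paper); the helper `split_dataset`, the placeholder records, and the seed are hypothetical and are not taken from the MiniLLM code.

```python
import random

def split_dataset(samples, n_valid=500, n_test=500, seed=42):
    """Hypothetical helper: randomly hold out n_valid and n_test samples;
    everything left over becomes the training set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test

# Assumption: 15,000 placeholder record ids stand in for databricks-dolly-15k.
records = range(15_000)
train, valid, test = split_dataset(records)
print(len(train), len(valid), len(test))  # 14000 500 500

observed_train = 12435  # training-set size reported in this issue
print(len(train) - observed_train)        # 1565
```

Under this assumption, the stated split yields about 14,000 training samples, so roughly 1,565 samples are unaccounted for in the 12435 figure.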

Thanks very much!