[MiniLLM] Why does dolly only have 12435 training samples?

yumath opened this issue 4 months ago

But your paper says in Section 3.1:

Training
We construct the training data from databricks-dolly-15k consisting of 15K human-written instruction-response pairs. We randomly split 14K samples as the training set D and left 500 samples for validation and testing, respectively.
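The split described in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the MiniLLM preprocessing code: the helper name `split_dolly`, the seed value, and the 15,000 placeholder records standing in for "15K" are all hypothetical. It only shows the arithmetic of the described split (15K minus 500 validation minus 500 test leaves 14K training samples), which is the figure the question contrasts with 12435.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def split_dolly(
    records: List[T], n_valid: int = 500, n_test: int = 500, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: randomly split records into train/valid/test."""
    if n_valid + n_test > len(records):
        raise ValueError("not enough records for the requested valid/test sizes")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]  # everything left over is training data
    return train, valid, test


if __name__ == "__main__":
    # Placeholder integers standing in for the ~15K dolly records (assumption).
    records = list(range(15_000))
    train, valid, test = split_dolly(records)
    print(len(train), len(valid), len(test))  # 14000 500 500
```

A pure random split like this yields 14,000 training samples; it does not by itself explain the 12435 count asked about here.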

Yuxian Gu commented 4 months ago

See #167

南雍山野猪骑士 commented on Mon Feb 26 2024 20:55:37 GMT+0800 (China Standard Time)

Thanks very much.