DachengLi1 / LongChat

Official repository for LongChat and LongEval


How to prepare the training data

ycsun1972 opened this issue

Hi,
"We fine-tune the 7B and 13B models with 80k and 18k conversations, respectively."
Could you provide more details about the training data? How were the 80k conversations prepared? Do they all have a length of 16k?
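For context, here is a minimal sketch of the kind of length filtering the question is asking about, assuming ShareGPT-style conversation records. The field names (`conversations`, `value`), the input file, the tokenizer checkpoint, and the 16k cutoff are all assumptions for illustration, not confirmed details of the LongChat data pipeline:

```python
# Hypothetical sketch: keep only conversations whose total token count
# fits a 16k context window. Field names and the filtering rule are
# assumptions, not the confirmed LongChat recipe.
import json
from transformers import AutoTokenizer

MAX_TOKENS = 16 * 1024  # assumed 16k context target

# Assumed tokenizer; any LLaMA-family tokenizer would work here.
tokenizer = AutoTokenizer.from_pretrained("lmsys/longchat-7b-16k")

def conversation_length(sample: dict) -> int:
    """Total token count across all turns of a ShareGPT-style record."""
    return sum(
        len(tokenizer.encode(turn["value"], add_special_tokens=False))
        for turn in sample["conversations"]
    )

with open("sharegpt.json") as f:  # assumed input file
    data = json.load(f)

kept = [s for s in data if conversation_length(s) <= MAX_TOKENS]
print(f"kept {len(kept)} / {len(data)} conversations under {MAX_TOKENS} tokens")
```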

Is the data used for training longchat-v1.5 the same as for the previous version?

Same question about longchat-v1.5: I cannot find any details about it.

@Mooler0410 Oh, it is the same: we use the same data, just based on Llama 2.