train.txt format
AAnirudh07 opened this issue
Hey @chrisociepa, awesome repo! Could you please shed some light on what the train.txt file should look like?
Thank you!
I have no personal experience with this, but it's likely just a text file containing whatever data you want to train or fine-tune on. The exact format is up to you; just keep in mind that the data will get tokenized.
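As a rough illustration of what "the data will get tokenized" means, here is a minimal sketch that treats train.txt as ordinary UTF-8 text and maps it to integer ids with a character-level tokenizer. The character-level tokenizer is an assumption swapped in for illustration (nanoGPT's `shakespeare_char` example works roughly this way); the repo itself may use a different tokenizer, and the helper names and sample text below are hypothetical.

```python
import os
import tempfile


def build_char_vocab(text):
    # Assign every distinct character a stable integer id (sorted for determinism).
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    # Map each character to its id.
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    # Map ids back to characters and join them into a string.
    return "".join(itos[i] for i in ids)


def load_train_txt(path):
    # Assumption: train.txt is plain UTF-8 text with no special structure.
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    sample = "First Citizen:\nBefore we proceed any further, hear me speak.\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(sample)
        text = load_train_txt(path)
    stoi, itos = build_char_vocab(text)
    ids = encode(text, stoi)
    print(len(stoi), ids[:10])
    # Encoding followed by decoding should reproduce the original text exactly.
    assert decode(ids, itos) == text
```

Any text works as input here, which is the point: the file's "format" is just the raw text the model should learn from.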
In Andrej Karpathy's nanoGPT, which by the looks of it is a major inspiration for this repo, the works of Shakespeare are used as a toy dataset:
https://github.com/karpathy/nanoGPT/blob/0d8fbd11aed59617f65d2bbd14842b4050516128/data/shakespeare/prepare.py#L9
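As far as I can tell, the linked `prepare.py` downloads the tiny Shakespeare text, splits it 90/10 into training and validation portions, encodes both with GPT-2 BPE via `tiktoken`, and writes the ids as `uint16` arrays to `train.bin` and `val.bin`. Below is a stdlib-only sketch of that pipeline. To stay dependency-free it substitutes raw UTF-8 bytes (ids 0 to 255) for the BPE tokenizer; the function names and output directory are hypothetical, and this repo's own loader may not expect `.bin` files at all.

```python
import os
import tempfile
from array import array


def encode_bytes(text):
    # Stand-in tokenizer (NOT GPT-2 BPE): each UTF-8 byte becomes one token id (0-255).
    return list(text.encode("utf-8"))


def prepare(text, out_dir, train_frac=0.9):
    # Split the raw text by character position: first 90% train, last 10% val.
    split = int(len(text) * train_frac)
    parts = {"train": text[:split], "val": text[split:]}
    counts = {}
    for name, part in parts.items():
        # 'H' is an unsigned 16-bit integer on common platforms, like np.uint16.
        ids = array("H", encode_bytes(part))
        with open(os.path.join(out_dir, f"{name}.bin"), "wb") as f:
            ids.tofile(f)
        counts[name] = len(ids)
    return counts


def load_bin(path):
    # Read the ids back in the same native byte order they were written.
    ids = array("H")
    with open(path, "rb") as f:
        ids.frombytes(f.read())
    return ids.tolist()


if __name__ == "__main__":
    text = "To be, or not to be, that is the question.\n" * 10
    with tempfile.TemporaryDirectory() as tmp:
        counts = prepare(text, tmp)
        print(counts)
        restored = bytes(load_bin(os.path.join(tmp, "train.bin"))).decode("utf-8")
        assert restored == text[: int(len(text) * 0.9)]
```

Storing ids as 16-bit integers works in nanoGPT because GPT-2's vocabulary (50,257 tokens) fits below 65,536; a tokenizer with a larger vocabulary would need a wider integer type.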
This was super helpful, thank you!