hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI

Personal Dataset Preprocessing

Lobskodax opened this issue

I want to train the GPT-2 model on my own dataset. The data is a TXT file with one sentence per line. How should I modify the data preprocessing code so that it accepts this format and runs normally?

Hi, did you figure out how?

Hi, I will close this issue for now. If you have difficulty building your own dataset, feel free to re-open this issue. Thanks~
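
For anyone landing here with the same question: one possible approach is to convert the TXT file into the format the loader expects instead of changing the preprocessing code. The snippet below is only a minimal sketch. It assumes the GPT-2 example's dataset class reads a JSON Lines file in which every line is an object with a `text` field. That format, the helper name `txt_to_jsonl`, and the file names in the usage note are assumptions and are not confirmed anywhere in this thread, so please check the dataset class in the example you are running first.

```python
import json


def txt_to_jsonl(txt_path, jsonl_path, key="text"):
    """Convert a one-sentence-per-line TXT file into JSON Lines.

    Hypothetical helper: the output layout ({"text": "..."} per line)
    is an assumption about what the training script's loader expects.
    Returns the number of sentences written.
    """
    count = 0
    with open(txt_path, encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for line in src:
            sentence = line.strip()
            if not sentence:
                # Skip blank lines so the loader never sees empty samples.
                continue
            # ensure_ascii=False keeps non-English text readable in the output.
            dst.write(json.dumps({key: sentence}, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling `txt_to_jsonl("my_corpus.txt", "my_corpus.json")` (both file names are placeholders) would write one JSON object per sentence. If the loader uses a different field name, pass it through the `key` argument. If it expects longer documents rather than single sentences, you may want to join several lines into one sample before writing.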