qingkongzhiqian / GPT2-Summary

A Chinese summary generation model based on GPT2

train_with_summary.txt train_tokenized.txt

world4jason opened this issue

Is there a link to these two files?
I'd like to use them as a reference for the format.
Thanks

commented

You may want to refer to: https://zhuanlan.zhihu.com/p/113869509

The data format is similar to the following example:
{"summarization": "xxxxxxxxx", "article": "aaaaaaaaa"}

You can use json.dumps() to convert each record to a string and save it, one record per line, using '\n' as the separator between records. (This is because the source code uses json.loads() to load the training data.)
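
For reference, here is a minimal sketch of writing and reading data in that one-JSON-object-per-line layout. The function names and the file path are hypothetical and not taken from the repo; only the "summarization" and "article" keys come from the example above. Passing ensure_ascii=False is my assumption, so that Chinese text stays readable in the file; json.loads() reads the records back either way.

```python
import json


def write_records(records, path):
    """Write one JSON object per line (hypothetical helper, not from the repo)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps() escapes any newline inside the text as \n,
            # so every record stays on a single physical line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Read the file back line by line with json.loads(), skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Placeholder data and a placeholder file name, for illustration only.
    samples = [
        {"summarization": "xxxxxxxxx", "article": "aaaaaaaaa"},
        {"summarization": "yyyyyyyyy", "article": "bbbbbbbbb"},
    ]
    write_records(samples, "my_train_data.txt")
    print(read_records("my_train_data.txt"))
```

Check which file name the training script actually expects before using this; the path above is only a placeholder.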