OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

Home Page: https://txsun1997.github.io/blogs/moss.html

How to train a custom tokenizer for Chinese from scratch

SparkJiao opened this issue

Hi, wonderful work!

May I know how to train a custom tokenizer for Chinese from scratch? Is there any public reference or code you can share?

Thank you very much for your help!

Best,
Fangkai