CHN-ChenYi / ChineseTokenizers

ChineseTokenizers

A Chinese-oriented tokenizer library built on Hugging Face Tokenizers.

Additional features

  • Jieba Pre-tokenizer
  • ChineseWordPiece Model (based on Yuan-1.0)
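For readers unfamiliar with WordPiece, the following is a minimal sketch of the generic greedy longest-match-first lookup that WordPiece models typically perform on each pre-tokenized word. It is not this repository's ChineseWordPiece implementation: the function name `wordpiece`, the toy vocabulary, the `##` continuation prefix, and the `[UNK]` token are all assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Greedy longest-match-first WordPiece over one pre-tokenized word.
/// `vocab`, `unk`, and the "##" continuation prefix are illustrative
/// assumptions, not taken from this repository.
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str) -> Vec<String> {
    // Work on chars so multi-byte CJK characters are never split.
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let mut end = chars.len();
        let mut found: Option<String> = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while end > start {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                // Non-initial pieces carry the continuation prefix.
                candidate = format!("##{}", candidate);
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            // Nothing matched at this position: the whole word becomes `unk`.
            None => return vec![unk.to_string()],
        }
    }
    pieces
}

fn main() {
    // Toy vocabulary, purely hypothetical.
    let vocab: HashSet<&str> = ["分", "##词", "分词", "##器"].iter().copied().collect();
    println!("{:?}", wordpiece("分词器", &vocab, "[UNK]"));
}
```

In a pipeline like the one listed above, a pre-tokenizer such as Jieba would first split the text into words, and a lookup of this kind would then run on each word separately.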

Examples

Yuan Preprocessor

RAYON_NUM_THREADS=48 TOKENIZERS_PARALLELISM=1 cargo run --release --example yuan

About

License: Apache License 2.0


Languages

  • Rust: 69.6%
  • Python: 19.2%
  • TypeScript: 4.9%
  • Jupyter Notebook: 4.8%
  • JavaScript: 0.9%
  • CSS: 0.3%
  • Makefile: 0.3%
  • Shell: 0.1%