messense / jieba-rs

The Jieba Chinese Word Segmentation Implemented in Rust

Fix cut_all mixed Chinese & English issue

messense opened this issue · comments

Apply the same fix as the Python version: fxsjy/jieba@97c3246

@messense: Code mixing is a hard problem; it comes down to where you draw the boundary of Chinese vocabulary. Product names can contain not only the English alphabet but Japanese hiragana as well. I would argue this is beyond the scope of a Chinese segmenter, but we can certainly apply a work-around like the one in the Python implementation for practical reasons.
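
To make the kind of work-around under discussion concrete, here is a minimal Rust sketch. It assumes the symptom is that full-mode output emits Latin letters and digits as single-character tokens, and it glues consecutive such tokens back together. The helper name `merge_ascii_runs` is hypothetical and is not part of jieba-rs; this is a post-processing illustration, not the actual change made in fxsjy/jieba@97c3246 or in this crate.

```rust
/// Hypothetical helper (not part of jieba-rs): merge runs of
/// single-character ASCII alphanumeric tokens into one token,
/// leaving every other token untouched.
fn merge_ascii_runs(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new(); // pending run of ASCII letters/digits

    for &tok in tokens {
        // A token qualifies only if it is exactly one ASCII alphanumeric char.
        let mut chars = tok.chars();
        let is_single_ascii = matches!(
            (chars.next(), chars.next()),
            (Some(c), None) if c.is_ascii_alphanumeric()
        );

        if is_single_ascii {
            buf.push_str(tok);
        } else {
            // Any other token ends the current run, so flush it first.
            if !buf.is_empty() {
                out.push(std::mem::take(&mut buf));
            }
            out.push(tok.to_string());
        }
    }

    // Flush a run that reaches the end of the input.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

Under these assumptions, a token list such as `["用", "i", "P", "h", "o", "n", "e", "打", "电话"]` would come back with `iPhone` as a single token. Hiragana and other scripts are deliberately left alone here, which reflects the point above: this is a practical patch for one script, not a general answer to code mixing.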