There are 8 repositories under the chinese-text-segmentation topic.
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
HanLP Chinese word segmentation plugin for Lucene, supporting Lucene-based systems including Solr
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese text, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
Tokenizer supporting Lucene 5/6/7/8/9+ versions, LTS
Chinese word segmentation implemented with deep learning
Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)
Chinese Word Segmentation Based on Deep Learning and an LSTM Neural Network
Chinese-adapted version of ChatterBot, with support for Chinese word segmentation search and Chinese stop words
For input use with the Chinese Text Project (中國哲學書電子化計劃)
Chinese word segmentation plugin based on jieba-rs
PostgreSQL with zhparser
Exposes jieba, SnowNLP, and pkuseg as an HTTP API web service using Flask.
Wrapper for BosonNLP online API
A copy-cat implementation of jieba as a learning exercise.
Python code for text mining test
Some sentences that word segmentation tools tend to segment incorrectly.