There are 34 repositories under the chinese-word-segmentation topic.
100+ Chinese Word Vectors (over a hundred sets of pre-trained Chinese word vectors)
pkuseg: a toolkit for multi-domain Chinese word segmentation
Datasets and SOTA results for every field of Chinese NLP
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Its fully modular implementation can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
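MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition.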
A natural language processing library based on deep learning
A micro tokenizer for Chinese with comprehensive algorithm coverage
Some experiments about Machine Learning
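A manually curated corpus of medical industry vocabulary, terminology, and related material. It can be used to train various NLP models, such as speech recognition and dialogue systems.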
Source code for the paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
Lucene/Solr Analyzer Plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which allows you to change the Lucene/Solr version for compatibility with the corresponding release.
Chinese word segmentation implemented with deep learning
A natural language processing library based on deep learning
A convenient Chinese word segmentation tool
Dictionary for Cantonese word segmentation
Sub-Character Representation Learning
Jieba Chinese word segmentation implemented in PHP
Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019
A Chinese-adapted version of ChatterBot, supporting Chinese word segmentation search and Chinese stop words
Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]