There are 29 repositories under the chinese-word-segmentation topic.
100+ Chinese Word Vectors: more than a hundred kinds of pre-trained Chinese word embeddings
pkuseg: a toolkit for multi-domain Chinese word segmentation
Jiagu: a deep-learning NLP toolkit for Chinese, covering knowledge-graph relation extraction, word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new-word discovery, keyword extraction, text summarization, and text clustering
SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
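The Symmetric Delete idea behind SymSpell can be illustrated in a few lines of pure Python. This is a minimal sketch, not SymSpell's actual implementation: the real library adds frequency ranking, edit-distance verification of candidates, and a far more compact index.

```python
from itertools import combinations

def deletes(word, max_dist=1):
    """All strings obtainable from `word` by deleting up to max_dist characters."""
    results = {word}
    for d in range(1, max_dist + 1):
        for idxs in combinations(range(len(word)), d):
            results.add("".join(c for i, c in enumerate(word) if i not in idxs))
    return results

def build_index(dictionary, max_dist=1):
    """Map every delete-variant of every dictionary word back to the word."""
    index = {}
    for w in dictionary:
        for v in deletes(w, max_dist):
            index.setdefault(v, set()).add(w)
    return index

def lookup(index, query, max_dist=1):
    """Candidate corrections: dictionary words sharing a delete-variant with the query."""
    hits = set()
    for v in deletes(query, max_dist):
        hits |= index.get(v, set())
    return hits

index = build_index(["hello", "help", "world"], max_dist=1)
candidates = lookup(index, "helo")  # "hello" and "help" each share a delete-variant
```

Precomputing only deletions (never insertions, replacements, or transpositions) is what makes lookup so fast: a query's delete-variants are matched against the precomputed index by hashing alone.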
Datasets and SOTA results for every field of Chinese NLP
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key-sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
The Jieba Chinese Word Segmentation Implemented in Rust
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Fully modular implementation that can be easily embedded in other programs, such as MySQL, PostgreSQL, and PHP.
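MMSEG builds on dictionary matching plus four chunk-disambiguation rules. As a hedged illustration of just the matching core (not MMSEG's disambiguation), here is a plain forward maximum-matching tokenizer with a made-up toy dictionary:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown character, emit as-is
            i += 1
    return words

# Toy dictionary for illustration only
dic = {"研究", "研究生", "生命", "命", "起源"}
print(forward_max_match("研究生命起源", dic))
# → ['研究生', '命', '起源']
```

The greedy pass wrongly prefers the longer "研究生" over the intended 研究/生命/起源 split; this is exactly the class of ambiguity MMSEG's chunk rules exist to resolve.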
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
A micro Chinese word segmentation engine with a comprehensive set of algorithms
A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese word segmentation.
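Neural taggers like these typically frame segmentation as per-character BMES tagging (Begin / Middle / End of a multi-character word, or Single-character word). A minimal sketch of decoding such tags back into words, assuming a well-formed tag sequence:

```python
def bmes_to_words(chars, tags):
    """Convert a per-character BMES tag sequence into a list of words.
    B = begin, M = middle, E = end of a multi-char word; S = single-char word."""
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            words.append(ch)
        elif tag == "B":
            buf = ch
        elif tag == "M":
            buf += ch
        else:  # "E": close the current word
            words.append(buf + ch)
            buf = ""
    return words

print(bmes_to_words("我爱北京天安门", "SSBEBME"))
# → ['我', '爱', '北京', '天安门']
```

The CRF layer mentioned above constrains the tag sequence so that only well-formed transitions (e.g. B followed by M or E, never by S) are emitted, which is what makes this simple decoder safe to use.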
Some experiments with machine learning
Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
Source code for an ACL 2017 paper on Chinese word segmentation
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It's a Maven project, which lets you change the Lucene/Solr version for compatibility.
Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816
Sub-Character Representation Learning
Dictionary for Cantonese word segmentation
A convenient Chinese word segmentation tool
Berserker - BERt chineSE woRd toKenizER
Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019
Code for IJCAI 2018 paper "Neural Networks Incorporating Unlabeled and Partially-labeled Data for Cross-domain Chinese Word Segmentation"
The Jieba Chinese Word Segmentation Implemented in PHP
Code for "Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition" [JBI]
Python cffi binding to CppJieba
A Chinese word segmentation model based on BERT, implemented in PyTorch