chinese-word-segmentation

There are 35 repositories under the chinese-word-segmentation topic.

Embedding / Chinese-Word-Vectors
100+ Chinese Word Vectors (more than one hundred sets of pre-trained Chinese word vectors)
chinese chinese-word-segmentation embedding embeddings vectors-trained word-embeddings
Language:Python 11625
lancopku / pkuseg-python
The pkuseg toolkit for multi-domain Chinese word segmentation
chinese-word-segmentation
Language:Python 6437
baidu / lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
word-segmentation part-of-speech-tagger named-entity-recognition chinese-word-segmentation chinese-nlp lexical-analysis python java
Language:C++ 3769
ownthink / Jiagu
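Jiagu: a deep learning natural language processing tool covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering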
nlp cws pos ner chinese-word-segmentation
Language:Python 3221
hankcs / pyhanlp
Chinese word segmentation
hanlp natural-language-processing chinese-word-segmentation part-of-speech-tagger named-entity-recognition dependency-parser
Language:Python 3082
wolfgarbe / SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
levenshtein fuzzy-search approximate-string-matching edit-distance spellcheck spell-check levenshtein-distance damerau-levenshtein spelling fuzzy-matching word-segmentation chinese-text-segmentation chinese-word-segmentation text-segmentation spelling-correction symspell
Language:C# 3051
didi / ChineseNLP
Datasets and SOTA results for every field of Chinese NLP
nlp chinese-nlp machine-translation chinese-word-segmentation entity-linking question-answering nlp-tasks
Language:HTML 1775
lionsoul2014 / jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
java jcseg mmseg chinese-word-segmentation natural-language-processing pos-tagging nlp nlp-keywords-extraction lucene-analyzer lucene-tokenizer solr-plugin elasticsearch-analyzer chinese-text-segmentation chinese-nlp keywords-extraction jcseg-analyzer opensearch-analyzer opensearch-tokenizer elasticsearch-tokenizer
Language:Java 906
mammothb / symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
python spellcheck spell-check fuzzy-matching fuzzy-search spelling-correction damerau-levenshtein approximate-string-matching levenshtein edit-distance levenshtein-distance spelling word-segmentation chinese-text-segmentation chinese-word-segmentation text-segmentation symspell
Language:Python 771
messense / jieba-rs
The Jieba Chinese Word Segmentation Implemented in Rust
chinese-word-segmentation jieba-chinese jieba nlp wasm
Language:Rust 695
lionsoul2014 / friso
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Its fully modular implementation can be easily embedded in other programs such as MySQL, PostgreSQL, and PHP.
c tokenizer chinese-tokenizer php-tokenizer full-text-search chinese-word-segmentation japanese-tokenizer korean-tokenizer cjk-tokenizer
Language:C 472
monpa-team / monpa
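MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition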
nlp ner pos named-entity-recognition word-segmentation chinese-word-segmentation pos-tagging bert albert
Language:Python 244
Kyubyong / g2pC
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
g2p pinyin crf crfsuite chinese-nlp chinese-word-segmentation
Language:Python 231
hemingkx / WordSeg
A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese word segmentation.
bert pytorch roberta chinese-word-segmentation bilstm-crf bert-crf
Language:Python 191
supercoderhawk / DeepLearning_NLP
A natural language processing library based on deep learning
deep-learning natural-language-processing chinese-word-segmentation relation-extraction named-entity-recognition tensorflow chinese-tokenizer
Language:Python 153
howl-anderson / MicroTokenizer
A micro tokenizer for Chinese: a small Chinese word segmentation engine with comprehensive algorithm coverage
chinese-nlp tokenizer chinese-tokenizer dag-network nlp-machine-learning chinese-word-segmentation
Language:Python 143
llhthinker / MachineLearningLab
Some experiments about Machine Learning
machine-learning frequent-itemset-mining classification regression-models gradient-descent chinese-word-segmentation
Language:Python 113
xtea / chinese_medical_words
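A manually curated corpus of medical industry vocabulary and terminology. It can be used to train various NLP models, such as those for speech recognition and dialogue systems.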
nlp chinese-nlp nlp-datasets medical nlp-data-to-text chinese-word-segmentation
100
fudannlp16 / CWS_Dict
Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
cws chinese-word-segmentation deep-learning word-segmentation tensorflow
Language:Python 91
jcyk / greedyCWS
Source code for an ACL2017 paper on Chinese word segmentation
acl chinese-word-segmentation dynet cws
Language:Python 90
yizhiru / thulac4j
Chinese word segmentation tool; a Java implementation of THULAC.
chinese-word-segmentation thulac
Language:Java 85
NLPIR-team / nlpir-analysis-cn-ictclas
Lucene/Solr analyzer plugin. Supports macOS, Linux x86/64, and Windows x86/64. It is a Maven project, which allows you to change the Lucene/Solr version to stay compatible with the corresponding release.
lucene lucene-analyzer nlpir solr chinese-word-segmentation ictclas
Language:Java 72
supercoderhawk / DNN_CWS
Chinese word segmentation implemented with deep learning
chinese-word-segmentation deep-learning tensorflow chinese-text-segmentation
Language:Python 58
voidism / pywordseg
Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo. https://arxiv.org/abs/1901.05816
chinese-word-segmentation character-level-elmo segmentation natural-language-processing
Language:Python 39
supercoderhawk / DeepNLP
A natural language processing library based on deep learning
tensorflow natural-language-processing deep-learning chinese-word-segmentation named-entity-recognition
Language:Python 37
dongrixinyu / jiojio
A convenient Chinese word segmentation tool
chinese-nlp chinese-word-segmentation crf partofspeech-tagger python wordsegmentation
Language:Python 35
wchan757 / Cantonese_Word_Segmentation
Dictionary for Cantonese word segmentation
cantonese cantonese-dictionary cantonese-language chinese-word-segmentation nlp word-segmentation
33
hankcs / sub-character-cws
Sub-Character Representation Learning
representation-learning chinese-word-segmentation cws simplified-chinese traditional-chinese natural-language-processing nlp
Language:Python 25
binaryoung / jieba-php
The Jieba Chinese Word Segmentation Implemented in PHP
jieba chinese-word-segmentation ffi
Language:PHP 19
bububa / jiagu
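Jiagu: a deep learning natural language processing tool covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering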
nlp ner cws pos chinese-word-segmentation chinese-nlp segmentation clustering classification
Language:Go 19
Hoiy / berserker
Berserker - BERt chineSE woRd toKenizER
bert sequence-to-sequence tokenizer chinese-nlp nlp tensorflow tpu chinese-word-segmentation bert-chinese state-of-the-art
Language:Python 17
NLPIR-team / NLPIR-ICTCLAS
The Java Package of NLPIR-ICTCLAS.
nlpir chinese-word-segmentation ictclas
Language:Java 17
wangjksjtu / multi-embedding-cws
Multiple Character Embeddings for Chinese Word Segmentation, ACL 2019
chinese-word-segmentation embeddings pinyin wubi radical
Language:Python 16
messense / cjieba-py
Python cffi binding to CppJieba
cffi python-bindings chinese-word-segmentation word-segmentation jieba jieba-chinese
Language:Python 15
fg607 / ChatterBot
A Chinese-adapted version of ChatterBot that supports Chinese word segmentation search and Chinese stop words
chatterbot chatbot chinese-word-segmentation chinese-text-segmentation chinese-language chinese-stop-words
Language:Python 14
GanjinZero / GTS
Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]
chinese-word-segmentation graph-cut unsupervised
Language:Python 14
