chinese-text-segmentation

There are 36 repositories under the chinese-text-segmentation topic.

wolfgarbe / SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
approximate-string-matching chinese-text-segmentation chinese-word-segmentation damerau-levenshtein edit-distance fuzzy-matching fuzzy-search levenshtein levenshtein-distance spell-check spellcheck spelling spelling-correction symspell text-segmentation word-segmentation
Language:C# 3034
koth / kcws
Deep learning Chinese word segmentation
nlp deep-learning chinese-text-segmentation tensorflow pos-tagger
Language:C++ 2080
fukuball / jieba-php
"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best PHP Chinese word segmentation module.
nlp natural-language-processing chinese-text-segmentation machine-learning
Language:PHP 1302
lionsoul2014 / jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.
java jcseg mmseg chinese-word-segmentation natural-language-processing pos-tagging nlp nlp-keywords-extraction lucene-analyzer lucene-tokenizer solr-plugin elasticsearch-analyzer chinese-text-segmentation chinese-nlp keywords-extraction jcseg-analyzer opensearch-analyzer opensearch-tokenizer elasticsearch-tokenizer
Language:Java 905
mammothb / symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
python spellcheck spell-check fuzzy-matching fuzzy-search spelling-correction damerau-levenshtein approximate-string-matching levenshtein edit-distance levenshtein-distance spelling word-segmentation chinese-text-segmentation chinese-word-segmentation text-segmentation symspell
Language:Python 765
amutu / zhparser
zhparser is a PostgreSQL extension for full-text search of Chinese-language text
chinese chinese-nlp chinese-text-segmentation extension postgresql scws zhparser
Language:C 653
qinwf / jiebaR
Chinese text segmentation with R. (Documentation updated 🎉: https://qinwenfeng.com/jiebaR/)
chinese chinese-text-segmentation cppjieba jieba lexical-analysis nlp
Language:C++ 338
hankcs / hanlp-lucene-plugin
HanLP Chinese word segmentation plugin for Lucene; supports Lucene-based systems, including Solr.
hanlp lucene solr nlp chinese-text-segmentation traditional-chinese
Language:Java 294
yongzhuo / Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese text, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
python3 pytorch text-classification sequence-labeling named-entity-recognition word-segmentation pos-tagging chinese-text-segmentation chinese-text-classification transformers bert pretrained-models
Language:Python 289
blueshen / ik-analyzer
Tokenizer supporting Lucene 5/6/7/8/9+ versions, LTS
lucene ik-analyzer search-engine chinese-text-segmentation solr solrcloud elasticsearch java lucene9
Language:Java 194
supercoderhawk / DNN_CWS
Chinese word segmentation implemented with deep learning
chinese-text-segmentation chinese-word-segmentation deep-learning tensorflow
Language:Python 58
yingrui / mahjong
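Open-source Chinese word segmentation toolkit: Chinese word segmentation Web API, Lucene Chinese word segmentation, and mixed Chinese-English segmentation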
scala deep-learning chinese-text-segmentation pinyin crf hmm viterbi dijkstra go
Language:Scala 42
ReubenBond / HanBaoBao
Mandarin Chinese text segmentation and mobile dictionary Android app (Chinese word segmentation)
text-segmentation transliteration android pinyin dictionary-data chinese-text-segmentation chinese
Language:Java 29
qiaofei32 / dnn-lstm-word-segment
Chinese word segmentation based on deep learning and an LSTM neural network
dnn word2vec word-segmentation chinese-text-segmentation keras lstm
Language:Python 23
blueshen / ik-rs
ik-analyzer for Rust; Chinese tokenizer for tantivy
chinese segmentation tantivy chinese-text-segmentation ik-analyzer
Language:Rust 14
fg607 / ChatterBot
ChatterBot adapted for Chinese; supports Chinese word-segmentation search and Chinese stop words
chatterbot chatbot chinese-word-segmentation chinese-text-segmentation chinese-language chinese-stop-words
Language:Python 14
fumiama / jieba
A performance-optimized version of Jiebago; supports loading dictionaries from an io.Reader
chinese chinese-characters chinese-language chinese-text-segmentation chinese-word-segmentation golang golang-library golang-package jieba jieba-analysis jieba-chinese
Language:Go 13
jason2506 / esapp
An unsupervised Chinese word segmentation tool.
computational-linguistics chinese-nlp chinese-text-segmentation word-segmentation unsupervised-learning nlp
Language:C++ 13
oscarsun72 / TextForCtext
For entering text into the Chinese Text Project (《中國哲學書電子化計劃》)
characters chinese ctext chinese-characters chinese-language chinese-text-segmentation chinese-traditional chinese-word-segmentation chrome chromedriver selenium selenium-webdriver text text-content text-editor ocr sinology
Language:C# 10
stephanoskomnenos / vscode-jieba
Chinese word segmentation extension based on jieba-rs
chinese-text-segmentation vscode-extension vscode chinese-language
Language:TypeScript 10
ChiChou / zhparser-docker
PostgreSQL with zhparser
postgresql chinese-text-segmentation docker docker-image
9
Colearo / HuhuSeg
Simple Chinese segmenter, keyword extractor, and other examples
segmentation chinese-text-segmentation mmseg keywords-extraction extraction
Language:Python 8
wycm / xuexin-ocr
Content recognition for CHSI (学信网) student-status and academic-qualification images
ocr xuexin-ocr chinese-text-segmentation chinese-ocr
Language:Python 7
hshrimp / HMM_Chinese_seg
HMM (hidden Markov model) Chinese word segmentation
hmm chinese-nlp chinese-text-segmentation
Language:Python 6
numb3r3 / text_utils
Text Pre-processing toolkit
nlp tokeniz text-processing chinese-text-segmentation
Language:Python 6
zhangsoledad / solr-ik
solr-ik
solr java tokenizer chinese chinese-text-segmentation
Language:Java 6
jk195417 / chinese-segmentation-as-service
Uses Flask to expose jieba, SnowNLP, and pkuseg as an HTTP API web service.
chinese-text-segmentation chinese-word-segmentation flask jieba pkuseg snownlp
Language:Python 4
ssb22 / CedPane
Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)
cantonese-language chinese-text-segmentation dictionary mandarin-chinese romanization speech-synthesis
4
deminy / jieba-php
"Jieba Chinese word segmentation", PHP version
nlp natural-language-processing chinese-text-segmentation machine-learning
Language:PHP 2
FlyingOE / q_BosonNLP
Wrapper for BosonNLP online API
bosonnlp sdk kdb-library kdb q-language text-processing chinese-text-segmentation chinese-nlp nlp-machine-learning
2
ssb22 / adjuster
Web Adjuster + Annotator Generator
chinese-text-segmentation mediator proxy transcoding webdriver low-vision romanization web-accessibility
Language:Python 1
ericlingit / jieba-go
A copy-cat implementation of jieba as a learning exercise.
chinese-text-segmentation hmm-viterbi-algorithm
Language:Go 0
smart-lands-com / smla-cut
Chinese text segmentation
chinese-text-segmentation
Language:Python 0
davidlorente78 / Recogzi
machine-learning chinese-simplified chinese-text-segmentation net
Language:F#
JherezTaylor / f360-textmining-test
Python code for text mining test
chinese-nlp chinese-text-segmentation python text-mining
Language:Jupyter Notebook
secsilm / text-segmentation-trap
Some sentences that word segmentation tools tend to segment incorrectly.
chinese-nlp chinese-text-segmentation chinese-word-segmentation natural-language-processing segmentation text-analysis text-segmentation
Language:Jupyter Notebook