The following repositories are listed under the word-segmentation topic.
Unsupervised text tokenizer for Neural Network-based text generation.
Unsupervised text tokenizer focused on computational efficiency
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through the Symmetric Delete spelling correction algorithm
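The core trick behind SymSpell is that deletions alone can stand in for all edit operations: precompute the delete-variants of every dictionary word, then match them against the delete-variants of the query. Below is a toy, self-contained sketch of that idea (function names are my own, not the port's API; the real library adds frequency ranking, verified edit distances, and prefix indexing):

```python
# Toy sketch of the Symmetric Delete idea (not the SymSpell API).

def deletes(word, max_distance=1):
    """All strings reachable from `word` by deleting up to max_distance characters."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results

def build_index(dictionary, max_distance=1):
    """Map each delete-variant to the dictionary words that produce it."""
    index = {}
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index.setdefault(variant, set()).add(word)
    return index

def lookup(query, index, max_distance=1):
    """Candidate corrections: dictionary words sharing a delete-variant with the query."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    return candidates

index = build_index(["hello", "world", "help"])
print(lookup("helo", index))  # {'hello', 'help'}
```

Because deletions from both the query and the dictionary word meet in the middle, this single index covers insertions, deletions, and substitutions; the real implementation then verifies candidates with an exact edit-distance check.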
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two big corpora: English Wikipedia and Twitter (330 million English tweets).
CKIP Transformers
AdaSeq: An All-in-One Library for Developing State-of-the-Art Sequence Understanding Models
Cantonese Linguistics and NLP
Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, word segmentation, and extractive text summarization.
MONPA (罔拍) is a multi-task model providing word segmentation, part-of-speech tagging, and named entity recognition for Traditional Chinese.
This repository is for building Windows 64-bit MeCab binary and improving MeCab Python binding.
A PyTorch implementation of the BI-LSTM-CRF model.
Converts from Chinese characters to pinyin, between simplified and traditional, and does word segmentation.
Source codes for paper "Neural Networks Incorporating Dictionaries for Chinese Word Segmentation", AAAI 2018
🗺️ A learning roadmap for natural language processing.
A Fast and Accurate Vietnamese Word Segmenter (LREC 2018)
Fast Word Segmentation with Triangular Matrix
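Several of these projects (the Triangular Matrix segmenter above and the hashtag segmenters) rest on the same dynamic-programming idea: split a string into known words by reusing the best segmentation of each prefix. A minimal sketch, assuming a plain vocabulary set rather than the frequency-ranked dictionaries the real tools use:

```python
# Toy word segmentation by dynamic programming over prefixes
# (a simplified sketch, not the Triangular Matrix implementation,
# which also handles misspellings and bounds memory usage).

def segment(text, vocabulary):
    """Return one segmentation of `text` into vocabulary words, or None."""
    n = len(text)
    best = [None] * (n + 1)  # best[i] = a segmentation of text[:i], if any
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            # text[:i] is segmentable if text[:j] is and text[j:i] is a word
            if best[j] is not None and text[j:i] in vocabulary:
                best[i] = best[j] + [text[j:i]]
                break
    return best[n]

vocab = {"this", "is", "a", "test"}
print(segment("thisisatest", vocab))  # ['this', 'is', 'a', 'test']
```

Production segmenters replace the membership test with corpus word probabilities and pick the highest-scoring split instead of the first one found, which is what makes hashtag splitting like "smallandinsignificant" come out correctly.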
A toolkit for Vietnamese word segmentation
Hashformers is a framework for hashtag segmentation with Transformers and Large Language Models (LLMs).
Java port of SymSpell: 1 million times faster spelling correction through the Symmetric Delete spelling correction algorithm
A Python wrapper for VnCoreNLP using a bidirectional communication channel.
Syllable segmentation tool for Myanmar language (Burmese) by Ye.
Vietnamese Word Tokenize
A toolkit for pre-processing large source code corpora