text-segmentation

The following 36 repositories are listed under the text-segmentation topic.

catalyst-team / catalyst
Accelerated deep learning R&D
deep-learning reinforcement-learning machine-learning computer-vision pytorch python distributed-computing infrastructure research reproducibility image-processing image-classification image-segmentation object-detection natural-language-processing text-classification text-segmentation information-retrieval recommender-system metric-learning
Language:Python 3363
wolfgarbe / SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
approximate-string-matching chinese-text-segmentation chinese-word-segmentation damerau-levenshtein edit-distance fuzzy-matching fuzzy-search levenshtein levenshtein-distance spell-check spellcheck spelling spelling-correction symspell text-segmentation word-segmentation
Language:C# 3327
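The Symmetric Delete idea named in the SymSpell description works in three steps. First, precompute every deletion variant of each dictionary word. Then generate the deletion variants of the query and look them up. Finally, verify the surviving candidates with a real edit distance.

The block below is a minimal Python sketch of that idea, not SymSpell's implementation. The helper names (`deletes`, `build_index`, `lookup`, `levenshtein`) are hypothetical. Verification uses plain Levenshtein distance, whereas the project's tags indicate Damerau-Levenshtein.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """All strings obtained by deleting up to max_distance characters from word."""
    results = {word}
    for d in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), d):
            results.add("".join(c for i, c in enumerate(word) if i not in positions))
    return results


def levenshtein(a, b):
    """Plain Levenshtein distance (insert/delete/substitute), used to verify candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(dictionary, max_distance):
    """Precompute: map every delete-variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(query, index, max_distance):
    """Collect candidates via the query's delete-variants, keep those within max_distance."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    scored = ((levenshtein(query, w), w) for w in candidates)
    return sorted((dist, w) for dist, w in scored if dist <= max_distance)


index = build_index(["hello", "help", "world"], max_distance=2)
print(lookup("helo", index, max_distance=2))  # [(1, 'hello'), (1, 'help')]
```

Only deletions are generated, never insertions or substitutions. This keeps the index small. The final distance check is still required, because two independent deletions can make words match that are further apart than `max_distance`.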
blmoistawinde / HarvestText
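Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods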
nlp sentiment-analysis new-word-discovery unsupervised text-summarization named-entity-recognition dependency-parser text-segmentation text-cleaning pyhanlp harvesttext keyword-extraction gitee
Language:Python 2572
ogkalu2 / comic-translate
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
comics computer-vision deep-learning gui machine-translation manga manhwa neural-network ocr python pytorch text-detection translation webtoons inpainting anime manhua segmentation text-segmentation pyside6
Language:Python 2095
mammothb / symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
approximate-string-matching chinese-text-segmentation chinese-word-segmentation damerau-levenshtein edit-distance fuzzy-matching fuzzy-search levenshtein levenshtein-distance python spell-check spellcheck spelling spelling-correction symspell text-segmentation word-segmentation
Language:Python 848
cbaziotis / ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
nlp nlp-library semeval spell-corrector spelling-correction text-processing text-segmentation tokenization tokenizer word-normalization word-segmentation
Language:Python 672
ZumingHuang / awesome-ocr-resources
A collection of OCR (Optical Character Recognition) resources, including papers and datasets.
ocr text-detection text-recognition text-segmentation end-to-end-ocr machine-learning deep-learning computer-vision video-ocr awesome
427
notAI-tech / deepsegment
A sentence segmenter that actually works!
nlp segmentation text text-segmentation deep-learning sentence-segmenter punctuation
Language:Python 304
koomri / text-segmentation
Implementation of the paper: Text Segmentation as a Supervised Learning Task
dataset deep-learning machine-learning neural-network nlp text-segmentation
Language:Python 264
sedflix / awesome-topic-segmentation
(yet another not really) awesome topic/text segmentation list
topic-segmentation text-segmentation coherence papers awesome-list
109
wolfgarbe / WordSegmentationTM
Fast Word Segmentation with Triangular Matrix
word-segmentation text-segmentation symspell spelling-correction spelling-corrector spellcheck spellchecker spell-check spell-checker spell-corrector spelling-checker
Language:C# 83
DCY1117 / MangaQuick
Automatic Manga Translator
image-inpainting manga manga-translator ocr text-segmentation
Language:Jupyter Notebook 64
google / emoji-segmenter
Emoji Segmenter
text-segmentation emoji unicode fonts
Language:C 64
viig99 / SymSpellCppPy
Fast SymSpell written in C++ and exposed to Python via pybind11
symspell python pybind11 spelling-correction spelling-corrector spellcheck spell-check fuzzy-matching fuzzy-search spelling word-segmentation compound-words text-segmentation
Language:C++ 44
Jumpst3r / printed-hw-segmentation
Printed and handwritten text segmentation using fully convolutional networks and CRF post-processing
segmentation fully-convolutional-networks conditional-random-fields machine-learning text-segmentation printed-handwritten-text
Language:Python 40
ReemHal / Semantic-Text-Segmentation-with-Embeddings
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document into k segments, for any chosen k.
semantic-segmentation embeddings sequence-segmentation text-segmentation
Language:Jupyter Notebook 33
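The entry above mentions greedy sequence segmentation over word embeddings. The block below is a minimal Python sketch under two stated assumptions:

- The objective is the sum, over segments, of the norm of each segment's summed word vectors. This objective appears in embedding-based segmentation work, but the repository may use a different one.
- Toy 2-D vectors stand in for GloVe embeddings.

The function names are hypothetical.

```python
import math


def segment_score(vectors, start, end):
    """Cohesion of vectors[start:end]: Euclidean norm of their component-wise sum."""
    summed = [sum(col) for col in zip(*vectors[start:end])]
    return math.sqrt(sum(x * x for x in summed))


def total_score(vectors, splits):
    """Sum of segment cohesion scores for the segmentation defined by splits."""
    bounds = [0] + sorted(splits) + [len(vectors)]
    return sum(segment_score(vectors, a, b) for a, b in zip(bounds, bounds[1:]))


def greedy_segment(vectors, k):
    """Add one boundary at a time, always the one that raises total_score most, until k segments."""
    splits = set()
    while len(splits) < k - 1:
        candidates = [i for i in range(1, len(vectors)) if i not in splits]
        if not candidates:
            break
        best = max(candidates, key=lambda i: total_score(vectors, splits | {i}))
        splits.add(best)
    return sorted(splits)


# Toy 2-D "embeddings": three words about one topic, then three about another.
vectors = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]
print(greedy_segment(vectors, k=2))  # [3]
```

Because the norm of a sum never exceeds the sum of norms, adding a boundary never lowers the score. The greedy step therefore favors boundaries between runs of vectors pointing in different directions.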
ReubenBond / HanBaoBao
Mandarin Chinese text segmentation (中文分词, Chinese word segmentation) and mobile dictionary Android app
text-segmentation transliteration android pinyin dictionary-data chinese-text-segmentation chinese
Language:Java 30
eskriett / spell
Spelling correction and string segmentation written in Go
golang spelling spelling-correction spell-check spellcheck word-segmentation text-segmentation string-segmentation symspell
Language:Go 27
rlayers / pawpaw
Text Processing & Segmentation Framework
nlp text-processing information-extraction text-segmentation hierarchical-text-segmentation extract-text knowledge-graph python xmlparser natural-language-processing parser query-engine query-language tree xml-parser lexer
Language:Python 25
wolfgarbe / WordSegmentationDP
Word Segmentation with Dynamic Programming
word-segmentation text-segmentation symspell spell-check spellcheck spellchecker spelling-correction spell-checker spelling-corrector spell-corrector
Language:C# 20
hyunbool / Text-Segmentation
A summary of papers related to text segmentation
text-segmentation topic-segmentation topic-modeling
19
Feoramund / ucg
UTF-8 grapheme counting library written in C99.
grapheme text-segmentation unicode utf-8 wcwidth text-alignment
Language:C 18
smart-models / Normalized-Semantic-Chunker
Cutting-edge tool that unlocks the full potential of semantic chunking
document-processing embeddings gpu-acceleration llm rag rest-api semantic semantic-search text-segmentation
Language:Python 18
nitely / nim-segmentation
Unicode text segmentation (tr29)
nim unicode word-break text-segmentation
Language:Nim 10
npillmayer / uax
Unicode Text Segmentation Algorithms
unicode text-segmentation text-processing
Language:Go 9
zamgi / lingvo--TextSegmenter
Text segmentation into separate words using a simple unigram model and the Viterbi algorithm
text-segmentation viterbi-algorithm lingvo linguistics natural-language-processing nlp
Language:C# 9
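Segmenting text into words with a unigram model and the Viterbi algorithm means finding the split that maximizes the summed log-probabilities of the words. This can be done by dynamic programming over string positions.

The block below is a minimal Python sketch, not the lingvo implementation. The word counts are toy data, and the unknown-word penalty (probability shrinking tenfold per character) is an assumption.

```python
import math


def segment(text, counts, max_word_len=20):
    """Most probable split of text under a unigram model, found by Viterbi-style DP."""
    total = sum(counts.values())

    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Assumed penalty for unknown words: probability shrinks 10x per character.
        return -math.log(total) - len(word) * math.log(10)

    # best[i] = (log-probability of best split of text[:i], start index of its last word)
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start][0] + log_prob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, start)

    # Trace back the stored word boundaries from the end of the string.
    words, end = [], len(text)
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


counts = {"the": 50, "cat": 10, "sat": 8, "on": 30, "mat": 5, "he": 20, "at": 15, "a": 40}
print(segment("thecatsatonthemat", counts))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

Working in log space avoids underflow on long strings. Capping candidate words at `max_word_len` keeps the cost linear in the text length for a fixed cap.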
shayneobrien / text-segmentation
Neural and nonneural text segmentation methods.
choi machine-learning nlp text-segmentation wikipedia
Language:Jupyter Notebook 8
Yannael / automatic-video-chaptering
Automate video chaptering with LLMs and TF-IDF: Transform raw transcripts into well-structured documents
chapterization segmentation text-segmentation video-chapters
Language:Jupyter Notebook 7
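The chaptering entry above combines LLMs with TF-IDF. One plausible TF-IDF step, assumed here rather than taken from the repository, works as follows:

- Vectorize consecutive transcript blocks.
- Compare each block with its neighbor by cosine similarity.
- Start a new chapter where similarity drops below a threshold.

The block below is a minimal Python sketch of that step. The tokenizer, the smoothed IDF formula, the 0.1 threshold and the function names are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(blocks):
    """One sparse TF-IDF vector (dict term -> weight) per transcript block."""
    docs = [Counter(tokenize(b)) for b in blocks]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms that occur in every block still keep a small weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def chapter_breaks(blocks, threshold=0.1):
    """Indices i where a new chapter is assumed to start before blocks[i]."""
    vecs = tfidf_vectors(blocks)
    return [i for i in range(1, len(blocks)) if cosine(vecs[i - 1], vecs[i]) < threshold]


blocks = [
    "install python and set up the virtual environment",
    "activate the virtual environment and install packages",
    "now we train the neural network model on images",
    "the neural network model reaches good accuracy",
]
print(chapter_breaks(blocks))  # [2]
```

A fixed threshold is the simplest rule. TextTiling-style methods instead look for local dips in the similarity curve, which adapts better to transcripts whose overall vocabulary overlap varies.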
athamana / ImgAnalysisToolkit
Image Analysis Toolkit for text document Binarization & Segmentation written in TypeScript.
angular angular-material web-workers ostu-threshold sauvola-threshold gpp-threshold text-segmentation arlsa-segmentation binarization image-processing image-analysis typescript
Language:TypeScript 6
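The toolkit's tags list several thresholding methods, including `ostu-threshold`, presumably Otsu's method. Otsu's method chooses a global binarization threshold, a common first step before segmenting text in document images.

The toolkit itself is written in TypeScript. The block below is a minimal Python sketch of Otsu's method on 8-bit grayscale values, not the toolkit's code. The helper names and the toy pixel data are hypothetical.

```python
def otsu_threshold(pixels):
    """Otsu's method: pick t in 0..255 maximizing between-class variance of {<= t} vs {> t}."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_bg, sum_bg = 0, 0
    best_t, best_between = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, t):
    """1 for pixels brighter than the threshold, 0 otherwise."""
    return [1 if p > t else 0 for p in pixels]


pixels = [12, 15, 10, 200, 210, 205, 14, 198]
t = otsu_threshold(pixels)
print(binarize(pixels, t))  # [0, 0, 0, 1, 1, 1, 0, 1]
```

Otsu's method assumes a roughly bimodal histogram. Locally adaptive methods such as Sauvola (also among the toolkit's tags) are usually preferred for unevenly lit scans.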
kushalchauhan98 / ticket-segmentation
Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems
ibm-research-ai acl2020 text-segmentation natural-language-processing nlp-machine-learning nlp-datasets
6
DhavalTaunk08 / Text-Segmentation-in-Images
This project aimed to perform text segmentation in images using autoencoders.
autoencoders deep-learning text-segmentation python3 ipython-notebook
Language:Jupyter Notebook 5
Dobatymo / graphseg-python
natural-language-processing python text-segmentation graphseq
Language:Python 4
QuantumWizard888 / How-to-add-user-dictionary-to-MeCab
How to add user dictionary to MeCab
mecab natural-language-processing text-segmentation guide japanese japanese-language
4
Chayan-halder / WBSUBNdb_text---Bangla-handwritten-text-document-dataset
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1383 offline handwritten text documents contributed by 190 writers. The dataset is composed of both simple and compound characters.
bangla-handwritten-text-dataset computer-vision image-processing text-segmentation bangla-dataset bangla-ocr
3
christophsk / segment-string
Demonstration of dynamic programming for segmenting strings into words. Just for fun!
dynamic-programming text-segmentation text-split
Language:Python 3
Sec-ant / segmentor
text-segmentation
Language:TypeScript 2
