There are 3 repositories under the text-segmentation topic.
Accelerated deep learning R&D
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from 2 big corpora (English Wikipedia and Twitter, 330 million English tweets).
A collection of resources for OCR (Optical Character Recognition), including papers and datasets.
A sentence segmenter that actually works!
Implementation of the paper: Text Segmentation as a Supervised Learning Task
(yet another not really) awesome topic/text segmentation list
Fast Word Segmentation with Triangular Matrix
Fast SymSpell written in C++ and exposed to Python via pybind11
Mandarin Chinese text segmentation and mobile dictionary Android app (中文分词)
Printed and handwritten text segmentation using fully convolutional networks and CRF post-processing
Spelling correction and string segmentation written in Go
Uses GloVe embeddings and greedy sequence segmentation to semantically segment a text document into a chosen number k of segments.
Word Segmentation with Dynamic Programming
Text segmentation into separate words using a simple unigram model and the Viterbi algorithm
A collection of summaries of papers on text segmentation
Data for the ACL 2020 paper - Improving Segmentation for Technical Support Problems
This project aimed to perform text segmentation in images using AutoEncoders.
Transcript segmentation using the average semantic encodings of cue sentences.
Neural and nonneural text segmentation methods.
"WBSUBNdb_text: Bangla handwritten text document dataset" is a Bangla text dataset containing 1352 offline handwritten text documents contributed by 188 writers. The dataset is composed of both simple and compound characters.
Language processing interface: some tools to process different natural languages
Analyzing argumentative writing elements in essays by students in grades 6-12.
Text segmentation solution using natural language processing.
Image Analysis Toolkit for text document Binarization & Segmentation written in TypeScript.
Perl wrapper for CppJieba (Chinese text segmentation)
Demonstration of dynamic programming for segmenting strings into words. Just for fun!
How to add user dictionary to MeCab
Word segmentation for Bengali text using a CRF.
Word segmentation for Sanskrit text using a CRF.
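
Several of the repositories above describe word segmentation with a unigram model and dynamic programming (the Viterbi algorithm). The following is a minimal sketch of that general technique, not the code of any listed project. The function name `segment`, the parameters `max_word_len` and `unknown_prob`, and the toy probability table are all assumptions made for illustration; a real system would estimate word probabilities from a large corpus.

```python
import math


def segment(text, word_probs, max_word_len=20, unknown_prob=1e-10):
    """Split `text` into the most probable word sequence under a unigram model.

    `word_probs` maps known words to probabilities (hypothetical toy data here).
    """
    n = len(text)
    # best[i] = highest log-probability of any segmentation of text[:i]
    best = [0.0] + [-math.inf] * n
    # back[i] = start index of the last word in that best segmentation
    back = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                log_prob = math.log(word_probs[word])
            else:
                # Assumed penalty: unknown words cost more the longer they are
                log_prob = len(word) * math.log(unknown_prob)
            score = best[start] + log_prob
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy unigram probabilities, invented for this example
toy_probs = {"the": 0.3, "cat": 0.2, "sat": 0.2, "a": 0.1, "at": 0.1, "he": 0.1}
print(segment("thecatsat", toy_probs))  # prints ['the', 'cat', 'sat']
```

Working in log space avoids numeric underflow on long inputs, and capping candidate words at `max_word_len` characters keeps the running time roughly linear in the length of the text.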