There are 24 repositories under the text-processing topic.
:zap: From finding text to search and replace, from sorting to beautifying text and more :art:
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Text Classification Algorithms: A Survey
Program to convert lines of text into a tree structure.
Persian NLP Toolkit
A fast implementation of Aho-Corasick in Rust.
A fast and convenient fuzzy matcher library for Rust
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, 330 million English tweets).
A simple Python module for parsing human names into their individual components
Open Korean Text Processor - An Open-source Korean Text Processor
All-in-one text de-duplication
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Text Normalization & Inverse Text Normalization
Simple SQL-like syntax on top of Perl text processing.
Automatic Korean word spacing with Python
🗣️ Tool to generate adversarial text examples and test machine learning models against them
A low level regular expression library that uses deterministic finite automata.
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Pandrator aspires to be a user-friendly app with a graphical interface and a one-click installer that creates high-quality speech from text in multiple languages (audiobooks, speech synchronised with subtitles and more) using local models (XTTS, Silero or VoiceCraft), plus voice cloning, LLM pre-processing, RVC enhancement, and automatic evaluation
Preprocessing module for short-text clustering (Short text cluster)
Recreated sources for the book "UNIX Text Processing," published in 1987.
A Golang library for processing AsciiDoc files.
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.