There are 12 repositories under the text-processing topic.
⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Intuitive find & replace CLI (sed alternative)
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Text Classification Algorithms: A Survey
Python library for creating PEG parsers
Program to convert lines of text into a tree structure.
👄 The most accurate natural language detection library for Go, suitable for long and short text alike
A fast implementation of Aho-Corasick in Rust.
Ekphrasis is a text processing tool geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags), and spell correction, using word statistics from two large corpora (English Wikipedia and Twitter, the latter comprising 330 million English tweets).
A simple Python module for parsing human names into their individual components
Open Korean Text Processor - An Open-source Korean Text Processor
PyNLPl, pronounced 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
Simple SQL-like syntax on top of Perl text processing.
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Textpipe: clean and extract metadata from text
A low-level regular expression library that uses deterministic finite automata.
Automatic Korean word spacing with Python
THE String Processing Package for R (with ICU)
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
UNIC: Unicode and Internationalization Crates for Rust
Text vectorization tool to outperform TFIDF for classification tasks
Preprocessing module for short-text clustering
Python library for Natural Language Preprocessing (NLPre)
A tool that lets you detect and translate text.
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Recreated sources for the book "UNIX Text Processing," published in 1987.
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Util collection for Japanese text processing. Hiraganize, Katakanize, and Romanize.
Extract indicators of compromise from text, including "escaped" ones.
A Golang library for processing Asciidoc files.
My solutions to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
A web app to create and browse text visualizations for automated customer listening.