There are 10 repositories under the text-normalization topic.
🧹 Python package for text cleaning
Chinese text normalization for speech processing
Japanese text normalizer for mecab-neologd
Russian text normalization pipeline for speech-to-text and other applications based on tagging s2s networks
Myanmar Language Script Library
Demonstration of the results in "Text Normalization using Memory Augmented Neural Networks" by Subhojeet Pramanik and Aman Hussain
Code and model files for the paper: I. Lourentzou et al., "Adapting Sequence to Sequence Models for Text Normalization in Social Media", ICWSM'19
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Convert English text from written expressions into spoken forms
Translation engine behind the Irish Standardiser (Caighdeánaitheoir na Gaeilge), plus Scottish Gaelic/Manx→Irish (Gàidhlig/Gaelg→Gaeilge) translators
A Python library for text normalization, specifically designed for Vietnamese and English text processing. This library provides comprehensive text normalization capabilities including handling of special characters, numbers, dates, and various text formats.
Proper categorization of e-commerce products improves the user experience and yields better results with external search engines. The project's objective is to classify a product into one of four given categories, based on its description on an e-commerce platform.
JS / Python 3 / PHP library for working with UTF-8 polytonic Greek and Latin
PyTorch implementation for the Text Normalization Challenge
Useful String extensions to save you time in production.
An online text normalization tool for Chinese-English mixed text-to-speech system
Repository for text normalization research.
Fast, precise normalization of Unix and DOS newline formats in Rust.
Soe Vinorm: An Effective Text Normalization Toolkit for converting Vietnamese text to its spoken form.
A JavaScript library for accent-insensitive text processing, including accent folding and search term highlighting
An ASR recipe and speech corpus of Icelandic parliamentary speeches
A utility that cleans up text by removing or translating common 'slop' patterns from AI-generated text
Training Tacotron 2 Text-to-Speech (TTS)
My work during internship at FPT.AI 2020
Library for converting numbers to Vietnamese text in .NET (C#)
Command-line interface (CLI) and library to normalize English texts.
Our source code for the paper "Transformer-based Joint Learning Approach for Text Normalization in Vietnamese ASR"
Modern .NET 9 / C# 13 library to normalize text (emojis, currency, numbers, abbreviations, chat slang) for consistent and natural Text-to-Speech (TTS) synthesis, ideal for stream chat/donations.
This repository provides a complete workflow for text processing using Hugging Face Transformers and NLTK. It includes modules for sentence normalization, spelling correction, word embedding generation, positional encoding computation, and English-to-French translation.
A 🇰🇭 utility library for number formatting, currency display, date localization, text normalization, and script transliteration, built for Cambodian developers.
A PHP library for Persian text conversion, including number translation, diacritics removal, and normalization with a fluent API.