Jim O’Regan’s repositories
g2p_correction
Web application for grapheme-to-phoneme correction using user feedback
tesseract-gle-uncial
Automatically exported from code.google.com/p/tesseract-gle-uncial
UD_Irish
Irish data
uninum
A database of number names for 186 languages, locales, and scripts
Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
NeMo-text-processing
NeMo text processing for ASR and TTS
NeMo
NeMo: a toolkit for conversational AI
cmudict
CMU US English Dictionary
epitran
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
corpuscrawler
Crawler for linguistic corpora
langdata
Source training data for Tesseract for lots of languages
wordnet-gaeilge
Automatically exported from code.google.com/p/wordnet-gaeilge
language-resources
Datasets and tools for basic natural language processing
Neural-HMM
Neural HMMs are all you need (for high-quality attention-free TTS)
rbg2p
Utilities for rule-based, manually written grapheme-to-phoneme rules
spaCy
💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython