Jiamin Xie's starred repositories
English-words-pronunciation-mp3-audio-download
Download the pronunciation mp3 audio for 119,376 unique English words/terms
english-words
A text file containing 479k English words for all your dictionary/word-based projects, e.g. auto-completion/autosuggestion
english-fisher-annotations
A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of the English Fisher dataset
awesome-disfluency-detection
A curated list of awesome disfluency detection publications along with the released code and bibliographical information
relative_phoneme_analysis
Repository for phoneme analysis on word-level Kaldi/ESPnet ASR transcripts
distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
Child-ASR-Paper
A list of papers for child ASR
Disentanglement-of-Emotional-Style-and-Speaker-Identity-for-Expressive-Voice-Conversion
This is the implementation of our Interspeech 2022 paper "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion".
KnowledgeEditingPapers
Must-read Papers on Knowledge Editing for Large Language Models.
speech-model-compression
A collection of papers related to speech model compression
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
adjustText
A small library for automatic adjustment of text positions in matplotlib plots to minimize overlaps.
wer_are_we
An attempt at tracking the state of the art and recent results (bibliography) in speech recognition.
codec2-dev
Open source speech codec designed for communications quality speech between 450 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.
speechbrain
A PyTorch-based Speech Toolkit