Jiamin Xie's starred repositories

Language: Python | Stargazers: 11 | Issues: 0

English-words-pronunciation-mp3-audio-download

Download the pronunciation mp3 audio for 119,376 unique English words/terms

Language: Python | License: Apache-2.0 | Stargazers: 174 | Issues: 0

misspell

Correct commonly misspelled English words in source files

Language: Go | License: MIT | Stargazers: 1343 | Issues: 0

english-words

A text file containing 479k English words for all your dictionary/word-based projects, e.g. auto-completion / autosuggestion

Language: Python | License: Unlicense | Stargazers: 10385 | Issues: 0

english-fisher-annotations

A recipe for constituency parsing, disfluency tagging and obtaining the fluent transcripts of English Fisher dataset

Language: Python | Stargazers: 12 | Issues: 0

awesome-disfluency-detection

A curated list of awesome disfluency detection publications along with the released code and bibliographical information

Stargazers: 70 | Issues: 0

relative_phoneme_analysis

Repository for phoneme analysis on word-level Kaldi/ESPNet ASR transcripts

Language: Python | Stargazers: 8 | Issues: 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 15475 | Issues: 0

distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

Language: Python | License: MIT | Stargazers: 3407 | Issues: 0

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

Stargazers: 544 | Issues: 0

g2p

g2p: English Grapheme To Phoneme Conversion

Language: Python | License: Apache-2.0 | Stargazers: 781 | Issues: 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 66074 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 863 | Issues: 0
Language: Python | Stargazers: 158 | Issues: 0

Child-ASR-Paper

A list of papers for child ASR

License: MIT | Stargazers: 25 | Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 11253 | Issues: 0

Disentanglement-of-Emotional-Style-and-Speaker-Identity-for-Expressive-Voice-Conversion

This is the implementation of our Interspeech 2022 paper "Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion".

Language: Python | Stargazers: 14 | Issues: 0

KnowledgeEditingPapers

Must-read Papers on Knowledge Editing for Large Language Models.

License: MIT | Stargazers: 794 | Issues: 0

speech-model-compression

A collection of papers related to speech model compression

Stargazers: 23 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20432 | Issues: 0

awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Stargazers: 502 | Issues: 0
Language: Jupyter Notebook | Stargazers: 163 | Issues: 0

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | Stargazers: 26145 | Issues: 0

adjustText

A small library for automatic adjustment of text position in matplotlib plots to minimize overlaps.

Language: Jupyter Notebook | License: MIT | Stargazers: 1467 | Issues: 0

wer_are_we

An attempt at tracking the state of the art and recent results (bibliography) on speech recognition.

Stargazers: 1866 | Issues: 0

VQMIVC

Official implementation of VQMIVC: One-shot (any-to-any) Voice Conversion @ Interspeech 2021 + Online playing demo!

Language: Jupyter Notebook | License: MIT | Stargazers: 329 | Issues: 0

codec2-dev

Open source speech codec designed for communications quality speech between 450 and 3200 bit/s. The main application is low bandwidth HF/VHF digital radio.

Language: C | License: LGPL-2.1 | Stargazers: 598 | Issues: 0

tvdcn

Torchvision-like Deformable Convolution with 1D, 2D, and 3D operators, and their transposed versions.

Language: C++ | License: MIT | Stargazers: 19 | Issues: 0

espnet

End-to-End Speech Processing Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8214 | Issues: 0

speechbrain

A PyTorch-based Speech Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 8417 | Issues: 0