chenchen's starred repositories

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34520 | Issues: 343 | Issues: 2692

kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.

Language: Shell | License: NOASSERTION | Stargazers: 14060 | Issues: 696 | Issues: 1641

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 11382 | Issues: 200 | Issues: 2212

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Language: Python | License: NOASSERTION | Stargazers: 9948 | Issues: 131 | Issues: 48
Language: Python | License: Apache-2.0 | Stargazers: 7061 | Issues: 66 | Issues: 70

mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox

Language: Python | License: Apache-2.0 | Stargazers: 4256 | Issues: 58 | Issues: 896

wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 4045 | Issues: 90 | Issues: 1019

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | License: NOASSERTION | Stargazers: 3955 | Issues: 48 | Issues: 841

EasyLM

Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, fine-tuning, evaluating, and serving LLMs in JAX/Flax.

Language: Python | License: Apache-2.0 | Stargazers: 2353 | Issues: 42 | Issues: 88

OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

Language: Python | License: Apache-2.0 | Stargazers: 1763 | Issues: 21 | Issues: 179

WhisperFusion

WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.

YoutubePlaylistDownloader

A tool to download whole playlists, channels, or single videos from YouTube, and optionally convert them to almost any format you like.

Language: C# | License: Apache-2.0 | Stargazers: 1440 | Issues: 27 | Issues: 218

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | License: Apache-2.0 | Stargazers: 1413 | Issues: 23 | Issues: 60

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1142 | Issues: 26 | Issues: 80

k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.

Language: Cuda | License: Apache-2.0 | Stargazers: 1105 | Issues: 77 | Issues: 377

INTERSPEECH-2023-Papers

INTERSPEECH 2023 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Language: Python | License: MIT | Stargazers: 341 | Issues: 28 | Issues: 3

cyrillic-transliteration

Transliterate Cyrillic script to Latin script and vice versa.

Language: Python | License: MIT | Stargazers: 97 | Issues: 6 | Issues: 16

DPHuBERT

INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"

Language: Python | License: MIT | Stargazers: 97 | Issues: 6 | Issues: 4

transfusion-asr

Transcribing Speech with Multinomial Diffusion, training code and models.

Language: Python | License: NOASSERTION | Stargazers: 74 | Issues: 8 | Issues: 3

awesome-asr-contextualization

A curated list of awesome papers on contextualizing E2E ASR outputs

clairaudience

Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023)

Language: Python | License: Apache-2.0 | Stargazers: 25 | Issues: 4 | Issues: 2

xlm_to_xlsr

Official implementation of the paper "Distilling a Pretrained Language Model to a Multilingual ASR Model" (Interspeech 2022)

Language: Python | License: MIT | Stargazers: 10 | Issues: 3 | Issues: 2

Contextual-Biasing-Dataset

Open-source Mandarin biased-word dataset

dual_cross_modality-AVSR

An audio-visual speech recognition model with dual cross-modality attention, based on the sigmedia-AVSR code

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 1 | Issues: 0