vBaiCai

Jingdong Li's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | MIT | 62766 5270

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | MIT | 33318 308 418

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

NOASSERTION | 25488 281 36

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | MIT | 19950 189 356

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | MIT | 4071 54 116

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | MIT | 3255 57 70

audiolm-pytorch

Implementation of AudioLM, a SOTA language modeling approach to audio generation out of Google Research, in PyTorch

Language: Python | MIT | 2298 61 166

cccl

CUDA C++ Core Libraries

Language: C++ | NOASSERTION | 872 30 1038

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language: Python | 667 870

Meta-voicebox

Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.

MIT | 542 86 4

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python | MIT | 471 11 57

UniAudio

The Open Source Code of UniAudio

Language: Python | 460 39 27

NeuralSVB

Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Language: Python | GPL-3.0 | 414 13 19

AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Language: Python | NOASSERTION | 350 30 25

uss

Language: Python | NOASSERTION | 311 12 11

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.

Language: Python | MIT | 292 16 42

causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Language: Cuda | BSD-3-Clause | 198 3 15

FRA-RIR

Language: Python | Apache-2.0 | 163 8 7

Dict-TTS

Language: Python | 130 6 5

faster-rwkv

Language: C++ | 121 4 7

torch-pesq

PyTorch implementation of the Perceptual Evaluation of Speech Quality for wideband audio

Language: Python | MIT | 119 6 5

gss

A simple package for Guided source separation (GSS)

Language: Python | MIT | 99 5 8

torchiva

Blind source separation with the independent vector analysis family of algorithms in torch

Language: Python | MIT | 84 5 3

UniAudio

The official source code of UniAudio

Language: Python | 73 8 1

meeteval

MeetEval - A meeting transcription evaluation toolkit

Language: Python | MIT | 63 7 8

SpatialCodec

Language: Python | 4200

CausalityCheck

Causality Check in Frame-online Speech Separation

Language: Python | 40 20

mvae-ss

Language: Python | 1000

interspeech2023-moving-iva-samples

Repository containing samples produced by the method proposed in "Multi-channel separation of dynamic speech and sound events" and presented at Interspeech 2023.

Language: HTML | 9 30

SpeakerVerSim

Python-based simulation framework for different version control strategies of speaker recognition systems.

Language: Python | Apache-2.0 | 3 20