andyye1999

redust's starred repositories

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python | NOASSERTION | 550900

ChatTTS

A generative speech model for daily dialogue.

Language: Python | AGPL-3.0 | 2930600

awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

50200

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | Apache-2.0 | 706700

Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | 86900

SemantiCodec-inference

Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.

Language: Python | MIT | 10300

SenseVoice

Multilingual Voice Understanding Model

Language: Python | NOASSERTION | 211400

CosyVoice

Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Language: Python | Apache-2.0 | 391700

encodecmae

Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'

Language: Python | 7900

gtcrn

The official implementation of GTCRN, an ultra-lite speech enhancement model.

Language: Python | MIT | 16000

TRT-SE

An example of a speech enhancement model deployed with TensorRT.

Language: Python | 2700

Sixty-years-of-frequency-domain-monaural-speech-enhancement

Language: Python | 10600

BAE-Net

BAE-NET: A LOW COMPLEXITY AND HIGH FIDELITY BANDWIDTH-ADAPTIVE NEURAL NETWORK FOR SPEECH SUPER-RESOLUTION

Language: Python | 5100

speechmetrics

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Language: Python | MIT | 88600

emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | 55100

voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

164400

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python | NOASSERTION | 134800

dual-path-RNNs-DPRNNs-based-speech-separation

A PyTorch implementation of dual-path RNNs (DPRNNs) based speech separation described in "Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation".

Language: Python | 16500

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python | Apache-2.0 | 1301900

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | NOASSERTION | 1066400

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Language: Python | MIT | 437800

FunASR

ChatTTS

awesome-large-audio-models

EmotiVoice

Supercodec

Qwen2-Audio

SemantiCodec-inference

SenseVoice

CosyVoice

encodecmae

gtcrn

TRT-SE

Sixty-years-of-frequency-domain-monaural-speech-enhancement

BAE-Net

speechmetrics

emotion2vec

voice_datasets

Qwen-Audio

dual-path-RNNs-DPRNNs-based-speech-separation

Qwen

seamless_communication

Amphion

UltraDualPathCompression

SALMONN

FunCodec

AudioCodec-Hub

awesome-speech-recognition-speech-synthesis-papers

coder2gwy

SFANC-Window

awesome-python-scientific-audio