AndongLi (Andong-Li-speech)

Location: Beijing, China

Home Page: https://andong-li-speech.github.io

AndongLi's starred repositories

langchain

🦜🔗 Build context-aware reasoning applications

Language: Python, License: MIT, Stargazers: 86818, Issues: 662, Issues: 6916

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Language: Python, License: Apache-2.0, Stargazers: 11950, Issues: 96, Issues: 1018

flash-attention

Fast and memory-efficient exact attention

Language: Python, License: BSD-3-Clause, Stargazers: 11487, Issues: 106, Issues: 823

VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io

Language: Python, License: MIT, Stargazers: 7346, Issues: 82, Issues: 148

text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Language: Python, License: Apache-2.0, Stargazers: 5963, Issues: 104, Issues: 405

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

Language: Python, License: NOASSERTION, Stargazers: 2294, Issues: 41, Issues: 98

CLAP

Contrastive Language-Audio Pretraining

Language: Python, License: CC0-1.0, Stargazers: 1213, Issues: 28, Issues: 78

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python, License: MIT, Stargazers: 963, Issues: 26, Issues: 56

RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Meta-voicebox

Implementation of Meta-Voicebox : The first generative AI model for speech to generalize across tasks with state-of-the-art performance.

UniAudio

The Open Source Code of UniAudio

Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

Language: Jupyter Notebook, License: MIT, Stargazers: 419, Issues: 13, Issues: 41

pytorch_ema

Tiny PyTorch library for maintaining a moving average of a collection of parameters.

Language: Python, License: MIT, Stargazers: 391, Issues: 4, Issues: 8

Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

DiffiT

Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation

FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Language: Python, License: MIT, Stargazers: 292, Issues: 16, Issues: 42

SoundStorm

The reproduced code for Google's SoundStorm

VoiceLDM

VoiceLDM: Text-to-Speech with Environmental Context

Language: Python, License: Apache-2.0, Stargazers: 132, Issues: 7, Issues: 3

EDiffSR

[IEEE TGRS 2024] EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution

HGRN

[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Robust-E2E-ASR

This repository contains the code for our upcoming paper "An Investigation of End-to-End Models for Robust Speech Recognition" at ICASSP 2021.

Language: Python, License: MIT, Stargazers: 44, Issues: 3, Issues: 6

DOSE

DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement, Conference on Neural Information Processing Systems (NeurIPS), 2023

Language: Python, Stargazers: 32, Issues: 0, Issues: 1

DTLN-aec

DTLN net for acoustic echo cancellation

DPMTSE

A Diffusion Probabilistic Model for Target Sound Extraction

Language: Python, Stargazers: 25, Issues: 0, Issues: 0

pymcd

Package pymcd

Language: Python, License: MIT, Stargazers: 20, Issues: 2, Issues: 0
Language: Python, Stargazers: 7, Issues: 0, Issues: 0
Language: Python, Stargazers: 5, Issues: 2, Issues: 0

Neural-Gradient-Regularizer

This repository contains the official implementation of Neural Gradient Regularizer (NGR).

Language: Python, Stargazers: 4, Issues: 0, Issues: 0