aaronchen's repositories
audioseal
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
Auffusion
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
CLAP
Contrastive Language-Audio Pretraining
CoMoSpeech
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
CoMoSVC
CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone
DDPM-Midi2Performance-Model
Music generation
DTTNet-Pytorch
An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation
emotion2vec
Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
FineDance
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation (ICCV 2023)
goct_ismir2023
Code for "Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts" (ISMIR 2023)
GPT-SoVITS
Even 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)
HierSpeechpp
The official implementation of HierSpeech++
JEN-1-pytorch
Unofficial implementation of JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models (https://arxiv.org/abs/2308.04729)
languagecodec
Official code repository of Language-Codec
LODGE
The code for the CVPR 2024 paper Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
M2UGen
This is the official repository for M2UGen
MP-SENet
MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
PAM
PAM is a no-reference audio quality metric for audio generation tasks
PhotoMaker
PhotoMaker
pinyin-to-ipa
Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
song-describer-dataset
The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.
VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild