aaronchen's repositories

audioseal

Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector

License: MIT | Stargazers: 0 | Issues: 0

Auffusion

Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0

CLAP

Contrastive Language-Audio Pretraining

License: CC0-1.0 | Stargazers: 0 | Issues: 0

CoMoSpeech

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

CoMoSVC

CoMoSVC: One-Step Consistency Model Based Singing Voice Conversion & Singing Voice Clone

Language: Python | Stargazers: 0 | Issues: 0

DTTNet-Pytorch

An official implementation of the ICASSP 2024 paper: Dual-Path TFC-TDF UNet for Music Source Separation

License: Apache-2.0 | Stargazers: 0 | Issues: 0

emotion2vec

Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Language: Python | Stargazers: 0 | Issues: 0

FineDance

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation. (ICCV2023)

License: NOASSERTION | Stargazers: 0 | Issues: 0

goct_ismir2023

code for "BEAT-ALIGNED SPECTROGRAM-TO-SEQUENCE GENERATION OF RHYTHM-GAME CHARTS" (ISMIR 2023)

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

HierSpeechpp

The official implementation of HierSpeech++

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

JEN-1-pytorch

Unofficial implementation of JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models (https://arxiv.org/abs/2308.04729)

Language: Python | Stargazers: 0 | Issues: 0

languagecodec

Official code repository of Language-Codec

License: MIT | Stargazers: 0 | Issues: 0

LODGE

The code for the CVPR 2024 paper Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

Stargazers: 0 | Issues: 0

M2UGen

This is the official repository for M2UGen

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

License: MIT | Stargazers: 0 | Issues: 0

PAM

PAM is a no-reference audio quality metric for audio generation tasks

License: MIT | Stargazers: 0 | Issues: 0

PhotoMaker

PhotoMaker

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 0 | Issues: 0

pinyin-to-ipa

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.

License: MIT | Stargazers: 0 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: C | License: NOASSERTION | Stargazers: 0 | Issues: 0

snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

License: MIT | Stargazers: 0 | Issues: 0

song-describer-dataset

The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0

VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0