aaronchen's repositories

10997_mwmae

Repository for MW-MAE paper submitted to NeurIPS 2023

Language: Python | Stargazers: 1 | Issues: 0

BABE

Zero-Shot Blind Audio Bandwidth Extension

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1 kHz mono/stereo audio.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

DisCo

DisCo: Referring Human Dance Generation in Real World

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

eben

Repo for source code of EBEN: Extreme Bandwidth Extension Network

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

EfficientAT_HEAR

Evaluate EfficientAT models on the Holistic Evaluation of Audio Representations Benchmark.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

enhancr

Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT & Tencent's NCNN inference, beautifully crafted and packaged into a single app

Language: JavaScript | License: GPL-3.0 | Stargazers: 0 | Issues: 0

KAIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

llark

Code for the paper "LLark: A Multimodal Foundation Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

lp-music-caps

LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]

Language: Python | Stargazers: 0 | Issues: 0

MakeDiffSinger

Pipelines and tools to build your own DiffSinger dataset.

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0

MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model

Language: Python | Stargazers: 0 | Issues: 0

peft-ser

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models (Accepted to 2023 ACII)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

pesto

Self-supervised learning for fast pitch estimation

Language: Python | License: LGPL-3.0 | Stargazers: 0 | Issues: 0

PyMusicLooper

A python program for creating seamless music loops, with play/export support.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

RemFx

General Purpose Audio Effect Removal

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

SC_VALL-E

Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

SongDriver-Real-time-Music-Accompaniment-Generation-without-Logical-Latency-nor-Exposure-Bias

SongDriver uses a parallel mechanism of prediction and arrangement phases to achieve zero logical latency in real-time accompaniment generation, significantly reducing exposure bias.

Language: C | License: MIT | Stargazers: 0 | Issues: 0

SongDriver2-Real-time-Emotion-based-Music-Arrangement-with-Soft-Transition

We first recognize the last timestep's music emotion and then fuse it with the current timestep's target input emotion. The fused emotion then serves as the guidance for SongDriver2 to generate the upcoming music based on the input melody data.

Language: C | Stargazers: 0 | Issues: 0

SpeechPrompt

**Interspeech 2022** 《SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks》Speech processing with prompting paradigm

Language: Python | Stargazers: 0 | Issues: 0

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT | Stargazers: 0 | Issues: 0

TDANet

An efficient speech separation method

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0

UniCATS-CTX-vec2wav

Code for CTX-vec2wav in UniCATS

Stargazers: 0 | Issues: 0
Language: Python | License: MIT | Stargazers: 0 | Issues: 0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Language: Python | License: BSD-2-Clause | Stargazers: 0 | Issues: 0

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0