aaronchen's repositories
10997_mwmae
Repository for MW-MAE paper submitted to NeurIPS 2023
BABE
Zero-Shot Blind Audio Bandwidth Extension
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1 kHz mono/stereo audio.
DisCo
DisCo: Referring Human Dance Generation in Real World
eben
Repo for source code of EBEN: Extreme Bandwidth Extension Network
EfficientAT_HEAR
Evaluate EfficientAT models on the Holistic Evaluation of Audio Representations Benchmark.
enhancr
Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT & Tencent's NCNN inference, beautifully crafted and packaged into a single app
KAIR
Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR
llark
Code for the paper "LLark: A Multimodal Foundation Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.
lp-music-caps
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
MakeDiffSinger
Pipelines and tools to build your own DiffSinger dataset.
MU-LLaMA
MU-LLaMA: Music Understanding Large Language Model
peft-ser
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models (accepted to ACII 2023)
pesto
Self-supervised learning for fast pitch estimation
PyMusicLooper
A Python program for creating seamless music loops, with play/export support.
RemFx
General Purpose Audio Effect Removal
SC_VALL-E
Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E
SongDriver-Real-time-Music-Accompaniment-Generation-without-Logical-Latency-nor-Exposure-Bias
SongDriver uses a parallel mechanism of prediction and arrangement phases to achieve zero logical latency in real-time accompaniment generation, significantly reducing exposure bias.
SongDriver2-Real-time-Emotion-based-Music-Arrangement-with-Soft-Transition
We first recognize the last timestep's music emotion and then fuse it with the current timestep's target input emotion. The fused emotion then serves as the guidance for SongDriver2 to generate the upcoming music based on the input melody data.
SpeechPrompt
Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm.
StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
TDANet
An efficient speech separation method
UniCATS-CTX-vec2wav
Code for CTX-vec2wav in UniCATS
vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
whisper-at
Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"
XPhoneBERT
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)