chenchy

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models (Accepted to 2023 ACII)

Language: Python | License: Apache-2.0

pesto

Self-supervised learning for fast pitch estimation

Language: Python | License: LGPL-3.0

PyMusicLooper

A Python program for creating seamless music loops, with play/export support.

Language: Python | License: MIT

RemFx

General Purpose Audio Effect Removal

Language: Python | License: Apache-2.0

SC_VALL-E

Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E

Language: Python | License: MIT

SongDriver-Real-time-Music-Accompaniment-Generation-without-Logical-Latency-nor-Exposure-Bias

SongDriver uses a parallel mechanism of prediction and arrangement phases to achieve zero logical latency in real-time accompaniment generation, significantly reducing exposure bias.

Language: C | License: MIT

SongDriver2-Real-time-Emotion-based-Music-Arrangement-with-Soft-Transition

We first recognize the last timestep's music emotion and then fuse it with the current timestep's target input emotion. The fused emotion then serves as the guidance for SongDriver2 to generate the upcoming music based on the input melody data.

Language: C

SpeechPrompt

Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks". Speech processing with the prompting paradigm.

Language: Python

StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

License: MIT

TDANet

An efficient speech separation method

Language: Python | License: Apache-2.0

UniCATS-CTX-vec2wav

Code for CTX-vec2wav in UniCATS

vampnet

Language: Python | License: MIT

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Language: Python | License: BSD-2-Clause

XPhoneBERT

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)

Language: Python | License: MIT
