macroustc's repositories
visual-chatgpt
Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
ChatPaper
Use ChatGPT to summarize the arXiv papers.
SadTalker
(CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
awesome-chatgpt-prompts-zh
A Chinese-language guide to prompting ChatGPT. Usage guides for a variety of scenarios. Learn how to get it to do what you ask.
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder archi
so-vits-svc
SoftVC VITS Singing Voice Conversion
AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
vits_chinese
Best TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft
audiolm-pytorch
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
naturalspeech
A fully working pytorch implementation of NaturalSpeech (Tan et al., 2022)
audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
UniSpeech
UniSpeech - Large Scale Self-Supervised Learning for Speech
voxceleb_trainer
In defence of metric learning for speaker recognition
nnsvs
Neural network-based singing voice synthesis library for research
MB-iSTFT-VITS
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
whisper
Robust Speech Recognition via Large-Scale Weak Supervision
noisereduce
Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
LIA
[ICLR 22] Latent Image Animator: Learning to Animate Images via Latent Space Navigation
FastASR
An efficient C++ implementation of model inference for the conformer model used by PaddleSpeech; it runs smoothly even on ARM platforms such as the Raspberry Pi 4B.
chinese_speech_pretrain
Chinese speech pretrained models
LIHQ
Long-Inference, High Quality Synthetic Speaker
DeepFaceLive
Real-time face swap for PC streaming or video calls
iSTFTNet-pytorch
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
tortoise-tts
A multi-voice TTS system trained with an emphasis on quality