macroustc

0

followers

following

stars

macroustc's repositories

visual-chatgpt

Official repo for the paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

MIT 0 0 0

ChatPaper

Use ChatGPT to summarize the arXiv papers.

NOASSERTION 0 0 0

SadTalker

(CVPR 2023) SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

MIT 0 0 0

awesome-chatgpt-prompts-zh

A Chinese-language guide to prompting ChatGPT. Usage guides for a variety of scenarios. Learn how to make it do what you ask.

MIT 0 0 0

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder archi

NOASSERTION 0 0 0

so-vits-svc

SoftVC VITS Singing Voice Conversion

MIT 0 0 0

AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.

NOASSERTION 0 0 0

vits_chinese

Best TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft

0 0 0

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch

MIT 0 0 0

naturalspeech

A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)

0 0 0

audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.

MIT 0 0 0

UniSpeech

UniSpeech - Large Scale Self-Supervised Learning for Speech

NOASSERTION 0 0 0

voxceleb_trainer

In defence of metric learning for speaker recognition

MIT 0 0 0

nnsvs

Neural network-based singing voice synthesis library for research

MIT 0 0 0

MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform

Apache-2.0 0 0 0

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

MIT 0 0 0

music_source_separation

NOASSERTION 0 0 0

noisereduce

Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)

MIT 0 0 0

wetts

Production First and Production Ready End-to-End Text-to-Speech Toolkit

Apache-2.0 0 0 0

KAN-TTS

MIT 0 0 0

LIA

[ICLR 22] Latent Image Animator: Learning to Animate Images via Latent Space Navigation

NOASSERTION 0 0 0

FastASR

An efficient C++ implementation of inference for the conformer model used by PaddleSpeech; it runs smoothly even on ARM platforms such as the Raspberry Pi 4B.

Apache-2.0 0 0 0

Fay

A virtual digital human for voice interaction and automated live-stream selling.

GPL-3.0 1 0 0

chinese_speech_pretrain

chinese speech pretrained models

0 0 0

LIHQ

Long-Inference, High Quality Synthetic Speaker

0 0 0

DeepFaceLive

Real-time face swap for PC streaming or video calls

GPL-3.0 0 0 0

awesome-talking-head-generation

0 0 0

iSTFTNet-pytorch

iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

Apache-2.0 0 0 0

StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

MIT 0 0 0

tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Apache-2.0 0 0 0