macroustc's repositories
faceswap
Deepfakes Software For All
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector, Language Classifier and Spoken Number Detector
natural-speech-pytorch
Implementation of the neural network proposed in Natural Speech, a text-to-speech generator from Microsoft Research presented as the first to be indistinguishable from human recordings
PaddleNLP
Easy-to-use and powerful NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including end-to-end systems for Neural Search, Question Answering, Information Extraction and Sentiment Analysis.
DeepFaceLab
DeepFaceLab is the leading software for creating deepfakes.
ERNIE
Official implementations of various pre-training models in the ERNIE family, covering Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba
muzic
Muzic: Music Understanding and Generation with Artificial Intelligence
ymir
YMIR, a streamlined model development product.
annotated_deep_learning_paper_implementations
🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
FACIAL
FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.
UTMOS22
UT-Sarulab MOS prediction system using SSL models
FlatTN
Chinese Text Normalization and Dataset
Muskits
An open-source music processing toolkit
FaceFormer
[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers
FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
book-text-to-speech
A book about Text-to-Speech (TTS) in Chinese.
DeepXi
Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.
NATSpeech
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including the official PyTorch implementations of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
wekws
Production First and Production Ready End-to-End Keyword Spotting Toolkit
recasepunc
Model for recasing and repunctuating ASR transcripts
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for Hugging Face transformer models 🚀
speech-synthesis-paper
List of speech synthesis papers.