Beast code in Giters

macroustc's repositories

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector, Language Classifier and Spoken Number Detector

MIT

natural-speech-pytorch

Implementation of the neural network proposed in Natural Speech, a text-to-speech generator that is indistinguishable from human recordings for the first time, from Microsoft Research

MIT

PaddleNLP

Easy-to-use and powerful NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including Neural Search, Question Answering, Information Extraction and Sentiment Analysis end-to-end systems.

Apache-2.0

DeepFaceLab

DeepFaceLab is the leading software for creating deepfakes.

GPL-3.0

ERNIE

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

NeuralSpeech

muzic

Muzic: Music Understanding and Generation with Artificial Intelligence

MIT

ymir

YMIR, a streamlined model development product.

Apache-2.0

K-Adapter

MIT

annotated_deep_learning_paper_implementations

🧑‍🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠

MIT

FACIAL

FACIAL: Synthesizing Dynamic Talking Face With Implicit Attribute Learning. ICCV, 2021.

AGPL-3.0

Automatic-Prosody-Annotation

UTMOS22

UT-Sarulab MOS prediction system using SSL models

MIT

FlatTN

Chinese Text Normalization and Dataset

SpanPSP

Muskits

An open-source music processing toolkit

Apache-2.0

FaceFormer

[CVPR 2022] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

MIT

FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

MIT

book-text-to-speech

A book about Text-to-Speech (TTS) in Chinese.

NOASSERTION

CUCVAE-TTS

MIT

DeepXi

Deep Xi: A deep learning approach to a priori SNR estimation implemented in TensorFlow 2/Keras. For speech enhancement and robust ASR.

MPL-2.0

NATSpeech

A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

MIT

DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

MIT

wekws

Production First and Production Ready End-to-End Keyword Spotting Toolkit

Apache-2.0

recasepunc

Model for recasing and repunctuating ASR transcripts

BSD-3-Clause

transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for Hugging Face transformer models 🚀

Apache-2.0

speech-synthesis-paper

List of speech synthesis papers.

MIT

faceswap

VTuberTalk