yhzhouowo's repositories
Awesome-Transformer-Attention
A comprehensive paper list of Vision Transformer/Attention, including papers, code, and related websites
Speech-Resources
Speech-related labs, companies, resources, internships, etc.; recommendations and self-recommendations are welcome
AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
awesome-embodied-vision
Reading list for research topics in embodied vision
Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
DYGANVC
Source code for "DYGAN-VC: Improving Speech Content Preservation for GAN Voice Conversion Using Dynamic Convolution"
speech-synthesis-paper
List of speech synthesis papers.
Awesome-Cloud-Edge-AI
A curated list of research in Systems for Edge Intelligence and Computing (Edge MLSys), including frameworks, tools, repositories, etc. Paper notes are also provided.
CMU-MultimodalSDK
CMU MultimodalSDK is a machine learning platform for development of advanced multimodal models as well as easily accessing and processing multimodal datasets.
crank
A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder
dialog_evaluation_paper_list
Dialog Evaluation Paper List: covers multiple dialog tasks
diffwave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
espnet_model_zoo
ESPnet Model Zoo
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
FastVocoder
Includes Basis-MelGAN, MelGAN, HifiGAN, and Multiband-HifiGAN; NHV may be added in the future.
gdown
Download a large file from Google Drive (curl/wget fails because of the security notice).
HiSD
Official PyTorch implementation of the paper "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).
Pytorch-MBNet
A PyTorch implementation of "MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network"
s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit.
speechmetrics
A wrapper around the speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, and SISDR
SpeechTransProgress
Tracking the progress in end-to-end speech translation
StarGANv2-VC
StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion
Talking-Face_PC-AVS
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
TalkNet-ASD
ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'
tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
transformers
🤗Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
VQMIVC
Official implementation of VQMIVC: One-shot Voice Conversion @ Interspeech 2021