Mortyzhou-Shef-BIT

A curated list of research in Systems for Edge Intelligence and Computing (Edge MLSys), including frameworks, tools, and repositories. Paper notes are also provided.

License: MIT

CMU-MultimodalSDK

CMU MultimodalSDK is a machine learning platform for development of advanced multimodal models as well as easily accessing and processing multimodal datasets.

Language: Python | License: NOASSERTION

crank

A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder

Language: Python | License: MIT

dialog_evaluation_paper_list

Dialog Evaluation Paper List: covers multiple dialog tasks


diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Language: Python | License: Apache-2.0

espnet_model_zoo

ESPnet Model Zoo

Language: Python | License: Apache-2.0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python | License: MIT

FastVocoder

Includes Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN; NHV may be added in the future.

Language: Python | License: MIT

gdown

Download a large file from Google Drive (curl/wget fails because of the security notice).

Language: Python | License: MIT

HiSD

Official PyTorch implementation of the paper "Image-to-image Translation via Hierarchical Style Disentanglement" (CVPR 2021 Oral).

Language: Python | License: NOASSERTION

Pytorch-MBNet

A PyTorch implementation of MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network

Language: Python

reentry

Language: Python

s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit.

Language: Python | License: MIT

speechmetrics

A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR

Language: Python

SpeechTransProgress

Tracking the progress in end-to-end speech translation

License: CC0-1.0

StarGANv2-VC

StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

Language: Python | License: MIT

Talking-Face_PC-AVS

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Language: Python | License: CC-BY-4.0

TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Language: Python | License: MIT

tango

Code and models for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"

Language: Python | License: NOASSERTION

transformers

🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.

Language: Python | License: Apache-2.0

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python | License: MPL-2.0

VQMIVC

Official implementation of VQMIVC: One-shot Voice Conversion @ Interspeech 2021

Language: Python | License: MIT