zexupan

Pan Zexu's starred repositories

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Language: Python | MPL-2.0 | 32235 273 1067

TTS

:robot: :speech_balloon: Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)

Language: Jupyter Notebook | MPL-2.0 | 9079 186 559

espnet

End-to-End Speech Processing Toolkit

Language: Python | Apache-2.0 | 8162 179 2335

Conference-Acceptance-Rate

Acceptance rates for the major AI conferences

Language: Jupyter Notebook | MIT | 4016 126 28

CVPR-2021-Papers

2540 67 21

FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Language: Python | MIT | 1715 27 211

ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch

Language: Jupyter Notebook | MIT | 1529 45 252

ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Language: Jupyter Notebook | BSD-3-Clause | 1077 18 131

Awesome-CLIP

Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

1077 20 12

FastSpeech

The Implementation of FastSpeech based on pytorch.

Language: Python | MIT | 849 34 96

awesome-audio-visual

A curated list of different papers and datasets in various areas of audio-visual processing

636 17 2

Contrastive-Predictive-Coding-PyTorch

Contrastive Predictive Coding for Automatic Speaker Verification

Language: Python | MIT | 474 5 21

nara_wpe

Different implementations of "Weighted Prediction Error" for speech dereverberation

Language: Python | MIT | 469 18 37

pystoi

Python implementation of the Short Term Objective Intelligibility measure

Language: MATLAB | MIT | 316 12 19

PaSST

Efficient Training of Audio Transformers with Patchout

Language: Python | Apache-2.0 | 288 4 46

TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

Language: Python | MIT | 282 7 66

Waveformer

A deep neural network architecture for low-latency audio processing

Language: Python | MIT | 277 6 4

dscore

Diarization scoring tools.

Language: Python | BSD-2-Clause | 205 8 4

speaker_extraction

target speaker extraction and verification for multi-talker speech

Language: Python | GPL-3.0 | 153 8 5

youtube-gesture-dataset

This repository contains scripts to build Youtube Gesture Dataset.

Language: Python | BSD-3-Clause | 112 4 9

cocktail-fork-separation

Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset

Language: Python | MIT | 71 4 2

AVA-AVD

Language: Python | 42 2 6

MuSE

Language: Python | 31 1 5

FlatTrajectoryDistillation_FTD

The code of the paper "Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation" (CVPR2023)

Language: Python | 17 0 0

avse_hybrid_loss

Language: Python | 14 1 0

reentry

Language: Python | 14 1 1

USEV

Language: Python | 13 3 2

ImagineNET

Language: Python | 4 0 0

seg

Language: Python | 3 1 0

EE4208ComputerVision

Face Detection

Language: Python | 2 0 0