Jing-Xuan Zhang's starred repositories
ctc_segmentation
Segment a given audio recording into utterances using a trained end-to-end ASR model.
AV-RelScore
Audio-visual corruption modeling from the paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023)
Visual_Speech_Recognition_for_Multiple_Languages
Visual Speech Recognition for Multiple Languages
Semi-supervised-learning
A Unified Semi-Supervised Learning Codebase (NeurIPS'22)
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning models, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Winner of the NAACL 2022 Best Demo Award.
nonparaSeq2seqVC_code
Implementation of non-parallel sequence-to-sequence voice conversion (VC)
Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real time
ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch
ultrasuite-tools
Tools to process the UltraSuite data
LipNet-PyTorch
The state-of-the-art PyTorch implementation of the method described in the paper "LipNet: End-to-End Sentence-level Lipreading" (https://arxiv.org/abs/1611.01599)
cluster-scripts
A collection of useful scripts, templates, and examples for clusters using SLURM (https://slurm.schedmd.com/)