KiAlexander's starred repositories
google-research
Google Research
speechbrain
A PyTorch-based Speech Toolkit
lip-reading-deeplearning
Lip Reading - Cross Audio-Visual Recognition using 3D Architectures
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
packnet-sfm
TRI-ML Monocular Depth Estimation Repository
Res2Net-PretrainedModels
(ImageNet pretrained models) The official PyTorch implementation of the TPAMI paper "Res2Net: A New Multi-scale Backbone Architecture"
transformer
Implementation of the Transformer model (originally from Attention is All You Need) applied to time series.
Speech-Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
transformer
A PyTorch implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"
Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
torchsummaryX
torchsummaryX: Improved visualization tool of torchsummary
Multi-Scale-1D-ResNet
PyTorch code for a multi-scale 1D ResNet. We hope it will help your research.
ConferencingSpeech2021
Conferencing Speech Challenge
active-speakers-context
Code for the Active Speakers in Context Paper (CVPR2020)
pytorch_complex
A temporal module for PyTorch-ComplexTensor
biased_separation
Code for the paper: Unified Gradient Reweighting for Model Biasing with Applications to Source Separation