gdy1201's repositories
ACNet
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks
annotated_deep_learning_paper_implementations
🧑🏫 50! Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
awesome-asr-contextualization
A curated list of awesome papers on contextualizing E2E ASR outputs
awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
Awesome-Visual-Transformer
A collection of papers about transformers for vision. Awesome Transformer with Computer Vision (CV)
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
end2end-asr-pytorch
End-to-End Automatic Speech Recognition on PyTorch
External-Attention-pytorch
🍀 PyTorch implementations of various attention mechanisms, MLP, re-parameterization, and convolution modules, helpful for understanding the papers further. ⭐⭐⭐
LAS_Mandarin_PyTorch
Listen, Attend and Spell (LAS) model and a pretrained Chinese Mandarin ASR model
MQRNN
Multi-Quantile Recurrent Neural Network for Quantile Regression
nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN) with PyTorch
pointer_summarizer
PyTorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding
pytorch_xvectors
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
segan_pytorch
Speech Enhancement Generative Adversarial Network in PyTorch
seq2seq
Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch
slot_filling_and_intent_detection_of_SLU
Slot filling, intent detection, joint training, ATIS & SNIPS datasets, Facebook’s multilingual dataset, MIT corpus, E-commerce Shopping Assistant (ECSA) dataset, CoNLL2003 NER, ELMo, BERT, XLNet
SpecAugment
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow & PyTorch
SpectralCluster
Python re-implementation of the spectral clustering algorithm in the paper "Speaker Diarization with LSTM"
Time-Series-Library
A Library for Advanced Deep Time Series Models.
VGG-Speaker-Recognition
Utterance-level Aggregation For Speaker Recognition In The Wild
voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
wav2vec
A simplified version of wav2vec (1.0, vq, 2.0) in fairseq