thangdepzai's repositories
Awesome-AutoDL
A curated list of automated deep learning (including neural architecture search and hyper-parameter optimization) resources.
optimized_transducer
Memory efficient transducer loss computation
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more
3m-asr
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition
ASR-proto
Implementation of linear attention conformer - LAC
awesome-AutoML
Curating a list of AutoML-related research, tools, projects and other resources
ConferencingSpeech2022
Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge in Online Conferencing Applications
conformer
PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
conformer_ocr
Transformer OCR is an Optical Character Recognition toolkit built for researchers working on OCR for both Vietnamese and English. This project focuses only on variants of the vanilla Transformer (Conformer) and feature extraction (CNN-based approach).
Cream
This is a collection of our NAS and Vision Transformer work.
dont-stop-pretraining
Code associated with the Don't Stop Pretraining ACL 2020 paper
ECAPA-TDNN-1
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 for Vox1_O when trained only on Vox2)
FT-w2v2-ser
Official implementation for the paper Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
hi_kia
wake-up word emotion recognition [APSIPA 2022]
Loss-Gated-Learning
ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'
SASVC
Spoofing-Aware Speaker Verification
Speaker-VGG-CCT
Official implementation of the paper "SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers, 2022"
SpeakerProfiling
Estimating the age, height, and gender of a speaker from their speech signal.
sugar
Efficient Speech Processing Toolkit for Automatic Speaker Recognition
UHV-OTS-Speech
A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.
wespeaker
Production First and Production Ready Speaker Recognition Toolkit
Wrapper-Filter-Speech-Emotion-Recognition
Implementation of our paper "A Hybrid Deep Feature Selection Framework for Emotion Recognition from Human Speeches" [Multimedia Tools and Applications, Springer]