Tanel Alumäe's starred repositories
pytorch-image-models
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts and pretrained weights for ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
speechbrain
A PyTorch-based Speech Toolkit
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
asv-subtools
Open-source tools for speaker recognition
whisper-finetuning
[WIP] Scripts for fine-tuning Whisper
sepia-stt-server
SEPIA server to support open-source speech recognition via a WebSocket connection.
kaldi-model-server
Simple Kaldi model server for chain (nnet3) models in online recognition mode directly from a local microphone
bbb-live-subtitles
BBB plugin for automatic subtitles in conference calls
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
tts_preprocess_et
Estonian text-to-speech transliteration pipeline
build-pynini-wheels
Build `manylinux2014_x86_64` Python wheels for `pynini`, wrapping all its dependencies. This is a ServiceNow Research project that was started at Element AI.