Steven Wang's starred repositories
torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
torch-stft
An STFT/iSTFT for PyTorch.
ffmpeg-python
Python bindings for FFmpeg - with complex filtering support
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
python_kaldi_features
Python code to extract MFCC and FBANK speech features for Kaldi
python_speech_features
This library provides common speech features for ASR including MFCCs and filterbank energies.
RAM-multiprocess-dataloader
Demystify RAM Usage in Multi-Process Data Loaders
Neovim-from-scratch
📚 A Neovim config designed from scratch to be understandable
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
nvim-lua-guide-zh
Chinese version of https://github.com/nanotee/nvim-lua-guide
learn-neovim-lua
Neovim configuration in practice: building your own IDE from 0 to 1
Lipreading_using_Temporal_Convolutional_Networks
ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks
pyroomacoustics
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
MISP2021-AVSR
Repository for the paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"
ThreadPool
A simple C++11 Thread Pool implementation
cs-video-courses
List of Computer Science courses with video lectures.
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.