Steven Wang's starred repositories
SenseVoice
Multilingual Voice Understanding Model
silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
speech-synthesis-paper
List of speech synthesis papers.
Whisper-Finetune
Fine-tune the Whisper speech recognition model. Supports training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports Web, Windows desktop, and Android deployment.
sanitizers
AddressSanitizer, ThreadSanitizer, MemorySanitizer
numpy_exercises
NumPy exercises.
RIR-Generator
Generating room impulse responses
faster-whisper
Faster Whisper transcription with CTranslate2
jsalt2020_simulate
Training data simulation
Beamforming-for-speech-enhancement
Simple delay-and-sum, MVDR, and CGMM-MVDR beamforming
flash-attention
Fast and memory-efficient exact attention
machine-learning-roadmap
A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.
Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
GPT-SoVITS
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
BeamformIt
BeamformIt acoustic beamforming software
NotepadNext
A cross-platform reimplementation of Notepad++