Steven Wang's starred repositories

torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Language: Python | License: MIT | Stargazers: 909 | Issues: 0

torch-stft

An STFT/iSTFT for PyTorch.

Language: Python | License: BSD-3-Clause | Stargazers: 340 | Issues: 0

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | Stargazers: 34091 | Issues: 0

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python | License: MIT | Stargazers: 1121 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34132 | Issues: 0

CPM-Live

Live Training for Open-source Big Models

Language: Python | Stargazers: 511 | Issues: 0

nash-mtl

Official implementation of "Multi-Task Learning as a Bargaining Game" [ICML 2022]

Language: Python | Stargazers: 200 | Issues: 0

ffmpeg-python

Python bindings for FFmpeg - with complex filtering support

Language: Python | License: Apache-2.0 | Stargazers: 9751 | Issues: 0

s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python | License: Apache-2.0 | Stargazers: 2183 | Issues: 0

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python | License: BSD-3-Clause | Stargazers: 2141 | Issues: 0

python_kaldi_features

Python code to extract MFCC and FBANK speech features for Kaldi

Language: Python | License: MIT | Stargazers: 62 | Issues: 0

python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.

Language: Python | License: MIT | Stargazers: 2352 | Issues: 0

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python | License: Apache-2.0 | Stargazers: 169 | Issues: 0

Unciv

Open-source Android/Desktop remake of Civ V

Language: Kotlin | License: MPL-2.0 | Stargazers: 8161 | Issues: 0

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python | License: Apache-2.0 | Stargazers: 31388 | Issues: 0

vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Language: Python | License: MIT | Stargazers: 6554 | Issues: 0

LazyVim

Neovim config for the lazy

Language: Lua | License: Apache-2.0 | Stargazers: 15996 | Issues: 0

Neovim-from-scratch

📚 A Neovim config designed from scratch to be understandable

Language: Lua | License: GPL-3.0 | Stargazers: 5334 | Issues: 0

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION | Stargazers: 26068 | Issues: 0

nvim-lua-guide-zh

Chinese version of https://github.com/nanotee/nvim-lua-guide

License: MIT | Stargazers: 1178 | Issues: 0

learn-neovim-lua

Neovim configuration in practice: building your own IDE from 0 to 1

Language: Lua | License: MIT | Stargazers: 1184 | Issues: 0

Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Language: Python | License: NOASSERTION | Stargazers: 374 | Issues: 0

pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Language: Python | License: MIT | Stargazers: 1389 | Issues: 0

MISP2021-AVSR

Repository for the paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"

Language: Shell | License: Apache-2.0 | Stargazers: 15 | Issues: 0

aps

A personal toolkit for single- and multi-channel speech recognition, enhancement, and separation.

Language: Python | License: Apache-2.0 | Stargazers: 136 | Issues: 0

pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python | License: MIT | Stargazers: 265 | Issues: 0

ThreadPool

A simple C++11 Thread Pool implementation

Language: C++ | License: Zlib | Stargazers: 7738 | Issues: 0

cs-video-courses

List of Computer Science courses with video lectures.

Stargazers: 66167 | Issues: 0

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ | License: Apache-2.0 | Stargazers: 9914 | Issues: 0

prml

Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 2028 | Issues: 0