wxy1988

Steven Wang's starred repositories

torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Language: Python · License: MIT · Stars: 909

torch-stft

An STFT/iSTFT for PyTorch.

Language: Python · License: BSD-3-Clause · Stars: 340

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook · License: MIT · Stars: 34,091

SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python · License: MIT · Stars: 1,121

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python · License: Apache-2.0 · Stars: 34,132

CPM-Live

Live Training for Open-source Big Models

Language: Python · Stars: 511

nash-mtl

Official implementation of "Multi-Task Learning as a Bargaining Game" [ICML 2022]

Language: Python · Stars: 200

ffmpeg-python

Python bindings for FFmpeg - with complex filtering support

Language: Python · License: Apache-2.0 · Stars: 9,751

s3prl

Self-Supervised Speech Pre-training and Representation Learning Toolkit

Language: Python · License: Apache-2.0 · Stars: 2,183

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python · License: BSD-3-Clause · Stars: 2,141

python_kaldi_features

Python code to extract MFCC and FBANK speech features for Kaldi.

Language: Python · License: MIT · Stars: 62

python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.

Language: Python · License: MIT · Stars: 2,352

RAM-multiprocess-dataloader

Demystify RAM Usage in Multi-Process Data Loaders

Language: Python · License: Apache-2.0 · Stars: 169

Unciv

Open-source Android/Desktop remake of Civ V

Language: Kotlin · License: MPL-2.0 · Stars: 8,161

gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

Language: Python · License: Apache-2.0 · Stars: 31,388

vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Language: Python · License: MIT · Stars: 6,554

LazyVim

Neovim config for the lazy

Language: Lua · License: Apache-2.0 · Stars: 15,996

Neovim-from-scratch

📚 A Neovim config designed from scratch to be understandable

Language: Lua · License: GPL-3.0 · Stars: 5,334

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

License: NOASSERTION · Stars: 26,068

nvim-lua-guide-zh

Chinese version of https://github.com/nanotee/nvim-lua-guide

License: MIT · Stars: 1,178

learn-neovim-lua

Neovim configuration in practice: building your own IDE from 0 to 1

Language: Lua · License: MIT · Stars: 1,184

Lipreading_using_Temporal_Convolutional_Networks

ICASSP'22 Training Strategies for Improved Lip-Reading; ICASSP'21 Towards Practical Lipreading with Distilled and Efficient Models; ICASSP'20 Lipreading using Temporal Convolutional Networks

Language: Python · License: NOASSERTION · Stars: 374

pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Language: Python · License: MIT · Stars: 1,389

MISP2021-AVSR

Repository for the paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"

Language: Shell · License: Apache-2.0 · Stars: 15

aps

A personal toolkit for single/multi-channel speech recognition & enhancement & separation.

Language: Python · License: Apache-2.0 · Stars: 136

pb_bss

Collection of EM algorithms for blind source separation of audio signals

Language: Python · License: MIT · Stars: 265

ThreadPool

A simple C++11 Thread Pool implementation

Language: C++ · License: Zlib · Stars: 7,738

cs-video-courses

List of Computer Science courses with video lectures.

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Language: C++ · License: Apache-2.0 · Stars: 9,914

prml

Repository of notes, code and notebooks in Python for the book Pattern Recognition and Machine Learning by Christopher Bishop

Language: Jupyter Notebook · License: AGPL-3.0 · Stars: 2,028