Steven Wang's starred repositories

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | Stargazers: 988 | Issues: 0

CosyVoice

Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.

Language: Python | License: Apache-2.0 | Stargazers: 1392 | Issues: 0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | Stargazers: 402 | Issues: 0

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Language: Python | License: MIT | Stargazers: 3378 | Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION | Stargazers: 80265 | Issues: 0

speech-synthesis-paper

List of speech synthesis papers.

License: MIT | Stargazers: 969 | Issues: 0
Language: Python | Stargazers: 393 | Issues: 0

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python | License: MIT | Stargazers: 9755 | Issues: 0

Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment

Language: C | License: Apache-2.0 | Stargazers: 742 | Issues: 0

sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

Language: C | License: NOASSERTION | Stargazers: 11014 | Issues: 0

MMCSG

This repository contains the baseline system for CHiME-8 MMCSG challenge focusing on transcribing both sides of a conversation where one participant is wearing smart glasses equipped with a microphone array and camera.

Language: Python | License: NOASSERTION | Stargazers: 22 | Issues: 0

numpy_exercises

Numpy exercises.

Language: Python | License: MIT | Stargazers: 1695 | Issues: 0

RIR-Generator

Generating room impulse responses

Language: C++ | License: MIT | Stargazers: 409 | Issues: 0

faster-whisper

Faster Whisper transcription with CTranslate2

Language: Python | License: MIT | Stargazers: 10210 | Issues: 0

stable-ts

Transcription, forced alignment, and audio indexing with OpenAI's Whisper

Language: Python | License: MIT | Stargazers: 1404 | Issues: 0

jsalt2020_simulate

Training data simulation

Language: Python | License: Apache-2.0 | Stargazers: 34 | Issues: 0

Beamforming-for-speech-enhancement

Simple delay-and-sum, MVDR, and CGMM-MVDR beamforming

Language: Python | Stargazers: 218 | Issues: 0

makeMoE

From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)

Language: Jupyter Notebook | License: MIT | Stargazers: 553 | Issues: 0

gss

A simple package for Guided source separation (GSS)

Language: Python | License: MIT | Stargazers: 98 | Issues: 0

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python | License: MIT | Stargazers: 19141 | Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | Stargazers: 11958 | Issues: 0

machine-learning-roadmap

A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

License: MIT | Stargazers: 7387 | Issues: 0

Modern-CPP-Programming

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

Language: HTML | Stargazers: 11435 | Issues: 0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | Stargazers: 28955 | Issues: 0

ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Language: Python | License: MIT | Stargazers: 283 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 20197 | Issues: 0

e2e_lfmmi

E2E system with LF-MMI; word N-gram for Mandarin

Language: Python | Stargazers: 160 | Issues: 0

kaldifst

Python wrapper for OpenFST and its extensions from Kaldi. Also supports reading/writing ark/scp files

Language: C++ | License: NOASSERTION | Stargazers: 47 | Issues: 0

BeamformIt

BeamformIt acoustic beamforming software

Language: C++ | Stargazers: 336 | Issues: 0

NotepadNext

A cross-platform reimplementation of Notepad++

Language: C++ | License: GPL-3.0 | Stargazers: 8704 | Issues: 0