Zhisheng Zheng's starred repositories
flash-attention
Fast and memory-efficient exact attention
GPT-SoVITS
One minute of voice data is enough to train a good TTS model (few-shot voice cloning)
fish-speech
Brand new TTS solution
vector-quantize-pytorch
Vector (and Scalar) Quantization, in PyTorch
Spatial-AST
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
build-nanogpt
Video+code lecture on building nanoGPT from scratch
GigaSpeech2
An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement
ears_dataset
Expressive Anechoic Recordings of Speech (EARS)
AnimateDiff
Official implementation of AnimateDiff.
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
apt-local-install
Tool for installing apt packages into user-local space without root permission (aptli).
stable-audio-tools
Generative models for conditional audio generation