zszheng147

Zhisheng Zheng's starred repositories

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language: Python | License: MIT | 70900

torchtune

A Native-PyTorch Library for LLM Fine-tuning

Language: Python | License: BSD-3-Clause | 358100

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language: Python | License: Apache-2.0 | 151000

EmoBox

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Language: Python | 8500

fast_clip

Language: Python | License: MIT | 1300

open_clip

An open source implementation of CLIP.

Language: Python | License: NOASSERTION | 916300

OpenVoice

Instant voice cloning by MyShell.

Language: Python | License: MIT | 2721000

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | 1196800

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language: Python | License: MIT | 2897400

fish-speech

Brand new TTS solution

Language: Python | License: NOASSERTION | 507400

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language: Python | License: MIT | 216400

Spatial-AST

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Language: Python | License: NOASSERTION | 2300

build-nanogpt

Video+code lecture on building nanoGPT from scratch

Language: Python | 293900

GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Language: Python | License: Apache-2.0 | 7600

ears_dataset

Expressive Anechoic Recordings of Speech (EARS)

Language: Python | License: NOASSERTION | 9600

AnimateDiff

Official implementation of AnimateDiff.

Language: Python | License: Apache-2.0 | 978900

Omost

Your image is almost there!

Language: Python | License: Apache-2.0 | 688300

tango

A family of diffusion models for text-to-audio generation.

Language: Python | License: NOASSERTION | 95300

ChatTTS

A generative speech model for daily dialogue.

Language: Python | License: NOASSERTION | 2745700

icefall

Language: Python | License: Apache-2.0 | 200

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language: Python | License: MIT | 437000

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language: Python | License: MIT | 70100

keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

Language: Python | License: MIT | 102600

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | 40200

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | 1053800

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language: Python | License: MIT | 330300

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language: Python | License: MIT | 101800

apt-local-install

Tool for installing apt packages without root permission in user local space (aptli).

Language: Python | 2200

pykan

Kolmogorov Arnold Networks

Language: Jupyter Notebook | License: MIT | 1372000

stable-audio-tools

Generative models for conditional audio generation

Language: Python | License: MIT | 228300