Zhisheng Zheng (zszheng147)

zszheng147

Geek Repo

Company:The University of Texas at Austin

Location:Austin, TX

Home Page:https://zhishengzheng.com/

Github PK Tool:Github PK Tool

Zhisheng Zheng's starred repositories

BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language:PythonLicense:MITStargazers:709Issues:0Issues:0

torchtune

A Native-PyTorch Library for LLM Fine-tuning

Language:PythonLicense:BSD-3-ClauseStargazers:3581Issues:0Issues:0

CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Language:PythonLicense:Apache-2.0Stargazers:1510Issues:0Issues:0

EmoBox

[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

Language:PythonStargazers:85Issues:0Issues:0
Language:PythonLicense:MITStargazers:13Issues:0Issues:0

open_clip

An open source implementation of CLIP.

Language:PythonLicense:NOASSERTIONStargazers:9163Issues:0Issues:0

OpenVoice

Instant voice cloning by MyShell.

Language:PythonLicense:MITStargazers:27210Issues:0Issues:0

flash-attention

Fast and memory-efficient exact attention

Language:PythonLicense:BSD-3-ClauseStargazers:11968Issues:0Issues:0

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Language:PythonLicense:MITStargazers:28974Issues:0Issues:0

fish-speech

Brand new TTS solution

Language:PythonLicense:NOASSERTIONStargazers:5074Issues:0Issues:0

vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Language:PythonLicense:MITStargazers:2164Issues:0Issues:0

Spatial-AST

🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)

Language:PythonLicense:NOASSERTIONStargazers:23Issues:0Issues:0

build-nanogpt

Video+code lecture on building nanoGPT from scratch

Language:PythonStargazers:2939Issues:0Issues:0

GigaSpeech2

An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement

Language:PythonLicense:Apache-2.0Stargazers:76Issues:0Issues:0

ears_dataset

Expressive Anechoic Recordings of Speech (EARS)

Language:PythonLicense:NOASSERTIONStargazers:96Issues:0Issues:0

AnimateDiff

Official implementation of AnimateDiff.

Language:PythonLicense:Apache-2.0Stargazers:9789Issues:0Issues:0

Omost

Your image is almost there!

Language:PythonLicense:Apache-2.0Stargazers:6883Issues:0Issues:0

tango

A family of diffusion models for text-to-audio generation.

Language:PythonLicense:NOASSERTIONStargazers:953Issues:0Issues:0

ChatTTS

A generative speech model for daily dialogue.

Language:PythonLicense:NOASSERTIONStargazers:27457Issues:0Issues:0
Language:PythonLicense:Apache-2.0Stargazers:2Issues:0Issues:0

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Language:PythonLicense:MITStargazers:4370Issues:0Issues:0

vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Language:PythonLicense:MITStargazers:701Issues:0Issues:0

keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows

Language:PythonLicense:MITStargazers:1026Issues:0Issues:0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language:PythonLicense:MITStargazers:402Issues:0Issues:0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language:Jupyter NotebookLicense:NOASSERTIONStargazers:10538Issues:0Issues:0

encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Language:PythonLicense:MITStargazers:3303Issues:0Issues:0

descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Language:PythonLicense:MITStargazers:1018Issues:0Issues:0

apt-local-install

Tool for installing apt packages without root permission in user local space (aptli).

Language:PythonStargazers:22Issues:0Issues:0

pykan

Kolmogorov Arnold Networks

Language:Jupyter NotebookLicense:MITStargazers:13720Issues:0Issues:0

stable-audio-tools

Generative models for conditional audio generation

Language:PythonLicense:MITStargazers:2283Issues:0Issues:0