zsc

Shuchang Zhou's starred repositories

ChatTTS

A generative speech model for daily dialogue.

Language: Python | AGPL-3.0 | 28343 169 417

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | Apache-2.0 | 14324 115 375

Scrapegraph-ai

Python scraper based on AI

Language: Python | MIT | 13534 91 188

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | Apache-2.0 | 6956 61 145

WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook | MIT | 3625 73 96

OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

Language: Python | Apache-2.0 | 1763 21 179

voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

1634 43 18

WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Language: Python | MIT | 1607 29 158

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | NOASSERTION | 1597 24 46

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | Apache-2.0 | 1444 31 16

Memary

The Memory Layer For Autonomous Agents

Language: Jupyter Notebook | MIT | 1173 13 28

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python | MIT | 1148 51 15

suno-api

Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.

Language: TypeScript | LGPL-3.0 | 1007 30 105

PuLID

Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Language: Python | Apache-2.0 | 1005 38 46

improved-aesthetic-predictor

CLIP+MLP Aesthetic Score Predictor

Language: Python | Apache-2.0 | 812 6 10

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

MIT | 408 12 1

AEC-Challenge

AEC Challenge

MIT | 361 29 23

CraftsMan

CraftsMan: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner

Language: Python | 349 13 19

ScreenAI

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"

Language: Python | MIT | 245 8 3

lightplane

Lightplane implements a highly memory-efficient differentiable radiance field renderer, and a module for unprojecting features from images to 3D grids.

Language: Python | NOASSERTION | 233 25 3