zsc

Shuchang Zhou's starred repositories

libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Language: Python | Apache-2.0 | 16200

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python | NOASSERTION | 159900

AI Audio Datasets (AI-ADS) 🎵, including Speech, Music, and Sound Effects, which can provide training data for Generative AI, AIGC, AI model training, intelligent audio tool development, and audio applications.

MIT | 40900

voice_datasets

🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).

163400

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | Apache-2.0 | 144400

Assemble-Them-All

[SIGGRAPH Asia 2022] Assemble Them All: Physics-Based Planning for Generalizable Assembly by Disassembly

Language: C++ | MIT | 13000

get-haized

A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.

7200

AEC-Challenge

AEC Challenge

MIT | 36100

TokenHMR

[CVPR 2024] TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Language: Python | NOASSERTION | 17900

CraftsMan

CraftsMan: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner

Language: Python | 34900

improved-aesthetic-predictor

CLIP+MLP Aesthetic Score Predictor

Language: Python | Apache-2.0 | 81600

megactor

Language: Python | Apache-2.0 | 63000

ChatTTS

A generative speech model for daily dialogue.

Language: Python | AGPL-3.0 | 2835300

EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python | Apache-2.0 | 695800

suno-api

Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.

Language: TypeScript | LGPL-3.0 | 101700

myo_sim

Musculoskeletal Models in MuJoCo

Language: Python | Apache-2.0 | 7100

ScreenAI

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"

Language: Python | MIT | 24500

lightplane

Lightplane implements a highly memory-efficient differentiable radiance field renderer, and a module for unprojecting features from images to 3D grids.

Language: Python | NOASSERTION | 23500

PuLID

Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Language: Python | Apache-2.0 | 100500

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google DeepMind, in PyTorch

Language: Python | MIT | 114800

Scrapegraph-ai

Python scraper based on AI

Language: Python | MIT | 1353700

Memary

The Memory Layer For Autonomous Agents

Language: Jupyter Notebook | MIT | 117400

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | Apache-2.0 | 1432500

demucs_batch-multigpu

[Batching/MultiGPU/DataLoader Implemented] Code for the paper Hybrid Spectrogram and Waveform Source Separation

Language: Python | MIT | 1800

emo-visual-data

😜 A visual dataset of memes (sticker images), annotated using the image-understanding capabilities of glm-4v and step-1v.

8400

APISR

APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)

Language: Python | GPL-3.0 | 77900

mujoco_menagerie

A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.

Language: Jupyter Notebook | NOASSERTION | 117900

HPSv2

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Language: Jupyter Notebook | Apache-2.0 | 33700