蓋瑞王's repositories
SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
LLM-Agent-Paper-List
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
ESP32-targz
🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
GenAI-Hw5
Repository for Introduction to GenAI, Homework 5
MU-LLaMA
MU-LLaMA: Music Understanding Large Language Model
ChatDev
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
StreamMultiDiffusion
Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
ATLAS
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
distrifuser
[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
SoraReview
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
yolov9
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
ML-Papers-of-the-Week
🔥 Highlighting the top ML papers every week.
agentscope
AgentScope: A Flexible yet Robust Multi-Agent Platform
subobjects
Official repository of the paper "Subobject-level Image Tokenization"
FiT
FiT: Flexible Vision Transformer for Diffusion Model
DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MERT
Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".