Beast code in Giters

蓋瑞王's repositories

SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

Apache-2.0

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

NOASSERTION

ESP32-targz

🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266

NOASSERTION

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Apache-2.0

MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model

GPL-3.0

ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Apache-2.0

StreamMultiDiffusion

Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."

MIT

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

NOASSERTION

ATLAS

A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171

Apache-2.0

distrifuser

[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

MIT

Prompt-Engineering-Guide

🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering

MIT

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

NOASSERTION

yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Language: Python | GPL-3.0

ML-Papers-of-the-Week

🔥Highlighting the top ML papers every week.

agentscope

AgentScope: A Flexible yet Robust Multi-Agent Platform

Apache-2.0

subobjects

Official repository of paper "Subobject-level Image Tokenization"

FiT

FiT: Flexible Vision Transformer for Diffusion Model

Apache-2.0

DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

MIT

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

MIT

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | MIT

MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

Language: Python | Apache-2.0

gary109

SALMONN

LLM-Agent-Paper-List

GenAI_hw6_dataset

AudioGPT

ESP32-targz

AniPortrait

GenAI-Hw5

MU-LLaMA

ChatDev

songcomposer

StreamMultiDiffusion

OOTDiffusion

AnyTool

ATLAS

distrifuser

Prompt-Engineering-Guide

Seeing-and-Hearing

EMO

SoraReview

DiT

yolov9

ML-Papers-of-the-Week

agentscope

subobjects

FiT

DataDreamer

c3po

Amphion

audiocraft

MERT