蓋瑞王's repositories

SALMONN

SALMONN: Speech Audio Language Music Open Neural Network

License: Apache-2.0 | Stargazers: 0 | Issues: 0

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

Stargazers: 0 | Issues: 0

AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

License: NOASSERTION | Stargazers: 0 | Issues: 0

ESP32-targz

🗜️ An Arduino library to unpack/uncompress tar, gz, and tar.gz files on ESP32 and ESP8266

License: NOASSERTION | Stargazers: 0 | Issues: 0

AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

License: Apache-2.0 | Stargazers: 0 | Issues: 0

GenAI-Hw5

Repository for Introduction to GenAI Hw5

Stargazers: 0 | Issues: 0

MU-LLaMA

MU-LLaMA: Music Understanding Large Language Model

License: GPL-3.0 | Stargazers: 0 | Issues: 0

ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

License: Apache-2.0 | Stargazers: 0 | Issues: 0

StreamMultiDiffusion

Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."

License: MIT | Stargazers: 0 | Issues: 0

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

License: NOASSERTION | Stargazers: 0 | Issues: 0

ATLAS

A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171

License: Apache-2.0 | Stargazers: 0 | Issues: 0

distrifuser

[CVPR 2024] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

License: MIT | Stargazers: 0 | Issues: 0

Prompt-Engineering-Guide

🐙 Guides, papers, lecture, notebooks and resources for prompt engineering

License: MIT | Stargazers: 0 | Issues: 0

Seeing-and-Hearing

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Stargazers: 0 | Issues: 0

SoraReview

The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models".

Stargazers: 0 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

License: NOASSERTION | Stargazers: 0 | Issues: 0

yolov9

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0

ML-Papers-of-the-Week

🔥Highlighting the top ML papers every week.

Stargazers: 0 | Issues: 0

agentscope

AgentScope: A Flexible yet Robust Multi-Agent Platform

License: Apache-2.0 | Stargazers: 0 | Issues: 0

subobjects

Official repository of paper "Subobject-level Image Tokenization"

Stargazers: 0 | Issues: 0

FiT

FiT: Flexible Vision Transformer for Diffusion Model

License: Apache-2.0 | Stargazers: 0 | Issues: 0

DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤

License: MIT | Stargazers: 0 | Issues: 0

Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

License: MIT | Stargazers: 0 | Issues: 0

audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0