IrohXu

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · 1041900

GPT4Tools

GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.

Language: Python · License: NOASSERTION · 75100

Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Language: Python · 43500

stable-diffusion-webui

Stable Diffusion web UI

Language: Python · License: AGPL-3.0 · 13912900

PerlDiff

PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

3200

LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)

Language: Python · License: Apache-2.0 · 3025300

EB1A

EB1A Full Application - I-140 and I-485

Language: TeX · 21200

DDPM_inversion

Official PyTorch implementation of the paper "An Edit Friendly DDPM Noise Space: Inversion and Manipulations" (CVPR 2024).

Language: Python · License: MIT · 24300

HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!

Language: Jupyter Notebook · License: Apache-2.0 · 73100

DiLightNet

Official code release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Language: Python · License: MIT · 8800

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook · License: MIT · 164100

Paints-UNDO

Understand Human Behavior to Align True Needs

Language: Python · License: Apache-2.0 · 325200

DriveDreamer

[ECCV 2024] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

27900

Omost

Your image is almost there!

Language: Python · License: Apache-2.0 · 717800

Mora

Mora: More like Sora for Generalist Video Generation

Language: Python · 147100

SEINE

[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Language: Python · License: Apache-2.0 · 88200

SimGen

Simulator-conditioned Driving Scene Generation

4500

LayoutGPT

Official repo for LayoutGPT

Language: Python · License: MIT · 28200

euler-scheduler

My implementation of a Diffusers-like scheduler that applies the Euler method to Conditional Flow Matching models

Language: Python · License: MIT · 700

Visual-Reasoning-Papers

📄 A curated list of visual reasoning papers.

Language: TeX · 2000

MMSI

Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)

Language: Python · License: MIT · 800

DiverGen

DiverGen (CVPR 2024) & BSGAL (ICML 2024)

Language: Python · License: BSD-2-Clause · 3300

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python · License: Apache-2.0 · 2502300