IrohXu

UIUC

Palo Alto

https://www.irohxucao.com/

Organizations

SZCHAI

Xu Cao's starred repositories

stable-diffusion-webui

Stable Diffusion web UI

Language: Python · AGPL-3.0 · 13697800

PerlDiff

PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

2800

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python · Apache-2.0 · 2806900

EB1A

EB1A Full Application - I-140 and I-485

Language: TeX · 20000

DDPM_inversion

Official pytorch implementation of the paper: "An Edit Friendly DDPM Noise Space: Inversion and Manipulations". CVPR 2024.

Language: Python · MIT · 22700

HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!

Language: Jupyter Notebook · Apache-2.0 · 71200

DiLightNet

Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Language: Python · MIT · 6100

RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)

Language: Jupyter Notebook · 161500

Paints-UNDO

Understand Human Behavior to Align True Needs

Language: Python · Apache-2.0 · 304200

DriveDreamer

[ECCV 2024] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving

Omost

Your image is almost there!

Language: Python · Apache-2.0 · 697900

Mora

Mora: More like Sora for Generalist Video Generation

Language: Python · 145400

SEINE

[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Language: Python · Apache-2.0 · 86900

SimGen

Simulator-conditioned Driving Scene Generation

4100

LayoutGPT

Official repo for LayoutGPT

Language: Python · MIT · 27300

euler-scheduler

My implementation of a Diffusers-like scheduler for performing the Euler method on Conditional Flow Matching models

Language: Python · MIT · 700

Visual-Reasoning-Papers

📄 A curated list of visual reasoning papers.

Language: TeX · 1900

MMSI

Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)

Language: Python · MIT · 700

DiverGen

DiverGen (CVPR 2024) & BSGAL (ICML 2024)

Language: Python · MIT · 3300

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python · Apache-2.0 · 2432100

VCog-Bench

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Language: Python · MIT · 300

yolov10

YOLOv10: Real-Time End-to-End Object Detection

Language: Python · AGPL-3.0 · 859900

MapUncertaintyPrediction

[CVPR 2024 Award Candidate] Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

Language: Python · Apache-2.0 · 12100

REDFormer

[ITSC 23] Official codebase for the paper 'Radar Enlighten the Dark: Enhancing Low-Visibility Perception for Automated Vehicles with Camera-Radar Fusion'

Language: Python · MIT · 5200

Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

MIT · 132100

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

BLINK_Benchmark

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]

Language: Python · Apache-2.0 · 8900

Vista

A Generalizable World Model for Autonomous Driving

Language: Python · Apache-2.0 · 41900

chroma

the AI-native open-source embedding database

Language: Rust · Apache-2.0 · 1385000

OmniDrive

Language: Python · 16100