Xu Cao's starred repositories
stable-diffusion-webui
Stable Diffusion web UI
LLaMA-Factory
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
DDPM_inversion
Official PyTorch implementation of the paper: "An Edit Friendly DDPM Noise Space: Inversion and Manipulations". CVPR 2024.
HiDiffusion
[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!
DiLightNet
Official Code Release for [SIGGRAPH 2024] DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation
RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Paints-UNDO
Understand Human Behavior to Align True Needs
DriveDreamer
[ECCV 2024] DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
euler-scheduler
My implementation of a Diffusers-like scheduler for performing the Euler method on Conditional Flow Matching models
Visual-Reasoning-Papers
📄 A curated list of visual reasoning papers.
VCog-Bench
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
MapUncertaintyPrediction
[CVPR 2024 Award Candidate] Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Awesome-LLM-Reasoning
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]