Show Lab (showlab)

Organization data from GitHub: https://github.com/showlab

Home Page: https://sites.google.com/view/showlab

GitHub: @showlab

Show Lab's repositories

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

Show-o

[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python, License: Apache-2.0, Stargazers: 1696, Issues: 17, Issues: 54

computer_use_ootb

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Language: Python, License: Apache-2.0, Stargazers: 1671, Issues: 20, Issues: 59

ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Language: Python, License: Apache-2.0, Stargazers: 1472, Issues: 15, Issues: 67

Show-1

[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Language: Python, License: NOASSERTION, Stargazers: 1132, Issues: 36, Issues: 20

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

VideoSwap

Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

BoxDiff

[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Awesome-Robotics-Diffusion

A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

MakeAnything

Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"

Language: Python, License: MIT, Stargazers: 171, Issues: 4, Issues: 4

VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Language: Python, License: Apache-2.0, Stargazers: 134, Issues: 7, Issues: 10

ROICtrl

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

WorldGUI

Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

Language: Python, Stargazers: 94, Issues: 0, Issues: 0

LOVA3

(NeurIPS 2024) Official PyTorch implementation of LOVA3

Language: Python, Stargazers: 90, Issues: 5, Issues: 0

sparseformer

(ICLR 2024, CVPR 2024) SparseFormer

Language: Python, License: MIT, Stargazers: 73, Issues: 9, Issues: 3

MovieBench

[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation

Language: Python, Stargazers: 52, Issues: 4, Issues: 0
Language: Python, License: Apache-2.0, Stargazers: 49, Issues: 1, Issues: 1

EvolveDirector

[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Language: Python, Stargazers: 47, Issues: 2, Issues: 0

FQGAN

FQGAN: Factorized Visual Tokenization and Generation

Language: Python, License: NOASSERTION, Stargazers: 47, Issues: 4, Issues: 1

LayerTracer

Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"

Language: Python, License: MIT, Stargazers: 45, Issues: 2, Issues: 4

VideoGUI

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Language: JavaScript, Stargazers: 44, Issues: 4, Issues: 0

MovieSeq

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

Language: Jupyter Notebook, Stargazers: 36, Issues: 3, Issues: 2

DiffSim

[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

IDProtector

The code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".

Language: Python, Stargazers: 14, Issues: 2, Issues: 0

VisInContext

Official implementation of "Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning"

Tune-An-Ellipse

[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want

UniMoD

The code repository of UniMoD