Show Lab's repositories

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

5012 152 34

Show-o

[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python | Apache-2.0 | 1696 17 54

computer_use_ootb

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Language: Python | Apache-2.0 | 1671 20 59

ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Language: Python | Apache-2.0 | 1472 15 67

Show-1

[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Language: Python | NOASSERTION | 1132 36 20

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

893 20 4

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

835 8 9

Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

687 220

VideoSwap

Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

Language: Python | 387 30 7

BoxDiff

[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Language: Python | 270 3 19

Awesome-Robotics-Diffusion

A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

246 20

MakeAnything

Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"

Language: Python | MIT | 171 4 4

VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Language: Python | Apache-2.0 | 134 7 10

ROICtrl

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

Language: Python | 105 1 2

WorldGUI

Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

Language: Python | 9400

LOVA3

(NeurIPS 2024) Official PyTorch implementation of LOVA3

Language: Python | 90 50

sparseformer

(ICLR 2024, CVPR 2024) SparseFormer

Language: Python | MIT | 73 9 3

MovieBench

[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation

Language: Python | 52 40

Exo2Ego-V

Language: Python | Apache-2.0 | 49 1 1

EvolveDirector

[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Language: Python | 47 20

FQGAN

FQGAN: Factorized Visual Tokenization and Generation

Language: Python | NOASSERTION | 47 4 1

LayerTracer

Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"

Language: Python | MIT | 45 2 4

VideoGUI

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Language: JavaScript | 44 40

MovieSeq

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

Language: Jupyter Notebook | 36 3 2

DiffSim

[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Language: Python | 18 1 1

IDProtector

The code implementation of IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.

Language: Python | 14 20

VisInContext

Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Language: Python | 14 2 1

Tune-An-Ellipse

[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Language: Python | 10 2 2

UniMoD

The code repository of UniMoD

9 1 1

watermark-steganalysis

Language: Python | 4 10