Show Lab's repositories

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, and various other applications.

5011 152 34

computer_use_ootb

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Language: Python · License: Apache-2.0 · 1671 20 59

ShowUI

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Language: Python · License: Apache-2.0 · 1472 15 67

Show-o

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Language: Python · License: Apache-2.0 · 1333 17 54

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

892 20 4

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

633 8 9

VLog

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

Language: Python · License: MIT · 579 8 13

Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

481 220

PhotoDoodle

Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"

Language: Python · License: MIT · 373 4 14

MakeAnything

Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"

Language: Python · License: MIT · 171 4 4

MovieAgent

MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning

Language: Python · 16600

FAR

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Language: Python · License: MIT · 164 4 2

Awesome-Robotics-Diffusion

(In progress) A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.

116 20

ROICtrl

Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation

Language: Python · 105 1 2

LOVA3

(NeurIPS 2024) Official PyTorch implementation of LOVA3

Language: Python · 90 50

Impossible-Videos

Language: Python · 6400

GUI-Thinker

Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.

Language: Python · 5500

MovieBench

[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation

Language: Python · 52 40

FQGAN

FQGAN: Factorized Visual Tokenization and Generation

Language: Python · License: NOASSERTION · 47 4 1

Exo2Ego-V

Language: Python · License: Apache-2.0 · 37 1 1

MovieSeq

[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences

Language: Jupyter Notebook · 36 3 2

VideoGUI

[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Language: JavaScript · 33 40

SMS

Balanced Image Stylization with Style Matching Score

28 6 1

DoraCycle

[CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

1900

TPDiff

TPDiff: Temporal Pyramid Video Diffusion Model

1900

DiffSim

Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Language: Python · 11 1 1

UniMoD

The code repository of UniMoD

9 1 1

SAM-I2V

License: Apache-2.0 · 200

whisperV

Language: Jupyter Notebook · 200

InterFeedback

000