zhangjiewu

Jay Z. Wu's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language:Jupyter NotebookApache-2.0983000

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

9400

video-language-understanding

[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

2500

Genixer

(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator

Language:Python7300

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language:PythonMIT394100

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language:PythonNOASSERTION209200

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language:Jupyter NotebookMIT87100

DragAnything

[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation

Language:Python39800

MLLMs-Augmented

The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》

Language:Python3000

PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Language:PythonAGPL-3.0156200

fast-DiT

Fast Diffusion Models with Transformers

Language:PythonNOASSERTION65300

MaskDiT

Code for Fast Training of Diffusion Models with Masked Transformers

Language:PythonMIT34200

SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Language:PythonMIT57600

Open-Sora-Plan

This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.

Language:PythonMIT1115500

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language:PythonMIT53700

video2dataset

Easily create large video dataset from video urls

Language:PythonMIT52100

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language:PythonApache-2.0141300

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language:Jupyter NotebookNOASSERTION47900

DynamiCrafter

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Language:PythonApache-2.0228500

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language:PythonGPL-3.0417100

Drive-WM

[CVPR 2024] A world model for autonomous driving.

Language:PythonApache-2.026800

Neural-Network-Parameter-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Language:Python80400