Jay Z. Wu (zhangjiewu)

zhangjiewu

Company: National University of Singapore

Location: Singapore

Home Page: https://zhangjiewu.github.io

Twitter: @jayzhangjiewu

Jay Z. Wu's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 9830 | Issues: 0
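
The SAM 2 entry above mentions inference code and example notebooks; below is a minimal sketch of prompted image segmentation in the style of the repository's README. The checkpoint path, config name, example image, and click coordinates are illustrative assumptions, not values taken from this page.

    # Minimal sketch: single-point prompt segmentation with SAM 2.
    # Checkpoint/config paths and the prompt below are assumptions for illustration.
    import numpy as np
    import torch
    from PIL import Image
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    checkpoint = "./checkpoints/sam2_hiera_large.pt"  # downloaded separately
    model_cfg = "sam2_hiera_l.yaml"                   # config shipped with the repo
    predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

    image = np.array(Image.open("example.jpg").convert("RGB"))

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        predictor.set_image(image)                    # compute image embeddings once
        masks, scores, _ = predictor.predict(
            point_coords=np.array([[500, 375]]),      # one foreground click (x, y)
            point_labels=np.array([1]),               # 1 = positive point
        )
    # `masks` contains candidate binary masks, ranked by `scores`.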

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

Stargazers: 94 | Issues: 0

video-language-understanding

[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Stargazers: 25 | Issues: 0

Genixer

(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator

Language: Python | Stargazers: 73 | Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3941 | Issues: 0

T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 2092 | Issues: 0

U-ViT

A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".

Language: Jupyter Notebook | License: MIT | Stargazers: 871 | Issues: 0

DragAnything

[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation

Language: Python | Stargazers: 398 | Issues: 0

MLLMs-Augmented

The official implementation of "MLLMs-Augmented Visual-Language Representation Learning".

Language: Python | Stargazers: 30 | Issues: 0

PixArt-sigma

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Language: Python | License: AGPL-3.0 | Stargazers: 1562 | Issues: 0

fast-DiT

Fast Diffusion Models with Transformers

Language: Python | License: NOASSERTION | Stargazers: 653 | Issues: 0

MaskDiT

Code for Fast Training of Diffusion Models with Masked Transformers

Language: Python | License: MIT | Stargazers: 342 | Issues: 0

SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Language: Python | License: MIT | Stargazers: 576 | Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: MIT | Stargazers: 11155 | Issues: 0

distrifuser

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Language: Python | License: MIT | Stargazers: 537 | Issues: 0

video2dataset

Easily create large video datasets from video URLs.

Language: Python | License: MIT | Stargazers: 521 | Issues: 0
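
For the video2dataset entry above, a hedged sketch of the library's Python entry point; the import path and keyword arguments follow the project's documented usage as I recall it, and both file names are placeholders.

    # Hedged sketch: turning a list of video URLs into a local dataset with video2dataset.
    # "video_urls.txt" and "video_dataset" are placeholder names, not values from this page.
    from video2dataset import video2dataset

    video2dataset(
        url_list="video_urls.txt",      # text file with one video URL per line
        output_folder="video_dataset",  # directory where clips and metadata are written
    )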

OpenDiT

OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference

Language: Python | License: Apache-2.0 | Stargazers: 1413 | Issues: 0

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 479 | Issues: 0

DynamiCrafter

[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

Language: Python | License: Apache-2.0 | Stargazers: 2285 | Issues: 0

YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Language: Python | License: GPL-3.0 | Stargazers: 4171 | Issues: 0

Drive-WM

[CVPR 2024] A world model for autonomous driving.

Language: Python | License: Apache-2.0 | Stargazers: 268 | Issues: 0

Neural-Network-Parameter-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.

Language: Python | Stargazers: 804 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5889 | Issues: 0

Magic-Me

Code for ID-Specific Video Customized Diffusion

Language: Python | License: Apache-2.0 | Stargazers: 446 | Issues: 0

single-video-curation-svd

Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 80 | Issues: 0

StableCascade

Official Code for Stable Cascade

Language: Jupyter Notebook | License: MIT | Stargazers: 6497 | Issues: 0

Q-Align

③ [ICML 2024] [IQA, IAA, VQA] All-in-one foundation model for visual scoring. Can be efficiently fine-tuned on downstream datasets.

Language: Python | License: NOASSERTION | Stargazers: 236 | Issues: 0

Moore-AnimateAnyone

Character Animation (AnimateAnyone, Face Reenactment)

Language: Python | License: Apache-2.0 | Stargazers: 3030 | Issues: 0

VideoCrafter

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Language: Python | License: NOASSERTION | Stargazers: 4421 | Issues: 0

magvit2-pytorch

Implementation of the MagViT2 tokenizer in PyTorch

Language: Python | License: MIT | Stargazers: 517 | Issues: 0