Jay Z. Wu's starred repositories
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
MLLMs-Augmented
The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》
PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Open-Sora-Plan
This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.
distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
video2dataset
Easily create large video dataset from video urls
DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters
single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
StableCascade
Official Code for Stable Cascade
Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch