pengzhansun

SUN, Pengzhan's starred repositories

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook · Apache-2.0 · 14140 116 373

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Language: Python · NOASSERTION · 12586 110 816

dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO

Language: Python · Apache-2.0 · 6071 68 245

GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python · Apache-2.0 · 5635 36 283

sort

Simple, online, and realtime tracking of multiple objects in a video sequence.

Language: Python · GPL-3.0 · 3822 73 156

Moore-AnimateAnyone

Character Animation (AnimateAnyone, Face Reenactment)

Language: Python · Apache-2.0 · 2928 35 140

co-tracker

CoTracker is a model for tracking any point (pixel) on a video.

Language: Jupyter Notebook · NOASSERTION · 2528 25 75

CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python · MIT · 818 12 109

Grounding-DINO-1.5-API

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Language: Python · Apache-2.0 · 606 11 28

ChatCaptioner

Official Repository of ChatCaptioner

Language: Jupyter Notebook · MIT · 447 4 7

Pandora

Pandora: Towards General World Model with Natural Language Actions and Video States

Language: Python · 430 17 7

Uni3D

[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI

Language: Python · MIT · 429 12 21

unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Language: Python · MIT · 267 14 39

This repo contains the documentation and code needed to use the PACO dataset: data loaders, training and evaluation scripts for object, part, and attribute prediction models, query evaluation scripts, and visualization notebooks.

Language: Python · MIT · 259 19 8

EgoVLP

[NeurIPS2022] Egocentric Video-Language Pretraining

Language: Python · 218 3 27

InternVideo2

MIT · 181 22 2

IART

[CVPR 2024 Highlight] Enhancing Video Super-Resolution via Implicit Resampling-based Alignment.

Language: Python · 135 3 9

VindLU

Language: Python · MIT · 95 4 11

EgoVLPv2

Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]

Language: Python · MIT · 82 5 10

attention-interpolation-diffusion

Interpolation Between Text-to-Image Generation!

Language: Python · 75 3 1

vrb

Language: Python · MIT · 67 4 7

InstructHumans

Editing Animated 3D Human Textures with Instructions

Language: Python · 51 20

hoi-forecast

[CVPR 2022] Joint hand motion and interaction hotspots prediction from egocentric videos

Language: Python · MIT · 49 5 13

MRFA

[NeurIPS 2023] Learning Motion Refinement for Unsupervised Face Animation

Language: Python · 34 5 5

IVG

This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions", which was accepted to ACL 2024 (Findings).

Apache-2.0 · 1500

RIO

Language: Python · 8 1 2

SR-Track

Language: Python · 7 2 1

DAIR

2 20