Hanzhi Chen's starred repositories
segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
MultiDiffusion
Official PyTorch implementation of "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" (ICML 2023)
MOFA-Video
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
TeleVision
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Awesome-Robotics-3D
A curated list of 3D vision papers related to robotics in the era of large models (LLMs/VLMs), inspired by awesome-computer-vision; includes papers, code, and related websites
SceneVerse
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
Track-2-Act
Code for the paper "Predicting Point Tracks from Internet Videos Enables Diverse Zero-Shot Manipulation"
BEVInstructor
[ECCV24] Navigation Instruction Generation with BEV Perception and Large Language Models
neural-isometries
Official JAX implementation of "Neural Isometries: Taming Transformations for Equivariant ML"
HOIDiffusion
Official Code Release for HOIDiffusion (CVPR 2024)
yolo-world-onnx
ONNX models of YOLO-World (an open-vocabulary object detection model)