Hanzhi Chen (HanzhiC)

Company: Technical University of Munich

Location: Munich

Home Page: hanzhic.github.io

Hanzhi Chen's starred repositories

segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 9971 · Issues: 61 · Issues: 194

fiftyone

The open-source tool for building high-quality datasets and computer vision models

Language: Python · License: Apache-2.0 · Stargazers: 8018 · Issues: 55 · Issues: 1496

mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Language: Python · License: MIT · Stargazers: 2186 · Issues: 30 · Issues: 217

RT-DETR

[CVPR 2024] Official RT-DETR (RT-DETR Paddle/PyTorch), Real-Time DEtection TRansformer: DETRs Beat YOLOs on Real-time Object Detection.

Language: Python · License: Apache-2.0 · Stargazers: 2127 · Issues: 25 · Issues: 397

glomap

GLOMAP - Global Structure-from-Motion Revisited

Language: C++ · License: BSD-3-Clause · Stargazers: 1174 · Issues: 22 · Issues: 40

MultiDiffusion

Official PyTorch implementation of "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" (ICML 2023)

Language: Jupyter Notebook · Stargazers: 967 · Issues: 36 · Issues: 25

mast3r

Grounding Image Matching in 3D with MASt3R

Language: Python · License: NOASSERTION · Stargazers: 682 · Issues: 21 · Issues: 32

dobb-e

Dobb·E: An open-source, general framework for learning household robotic manipulation

Language: G-code · License: MIT · Stargazers: 558 · Issues: 15 · Issues: 7

MOFA-Video

[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.

Language: Python · License: NOASSERTION · Stargazers: 555 · Issues: 24 · Issues: 45

TeleVision

Open-TeleVision: Teleoperation with Immersive Active Visual Feedback

Language: Python · License: NOASSERTION · Stargazers: 537 · Issues: 7 · Issues: 20

sjc

Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation (CVPR 2023)

Language: Python · License: NOASSERTION · Stargazers: 502 · Issues: 20 · Issues: 29

Awesome-Robotics-3D

A curated list of 3D vision papers related to the robotics domain in the era of large models (LLMs/VLMs), inspired by awesome-computer-vision; includes papers, code, and related websites

vlmaps

[ICRA 2023] Implementation of Visual Language Maps for Robot Navigation

Language: Python · License: MIT · Stargazers: 338 · Issues: 11 · Issues: 53

3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

transfusion-pytorch

PyTorch implementation of Transfusion ("Predict the Next Token and Diffuse Images with One Multi-Modal Model") from Meta AI

License: MIT · Stargazers: 176 · Issues: 0 · Issues: 0

SceneVerse

Official implementation of the ECCV 2024 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"

Language: Python · License: MIT · Stargazers: 158 · Issues: 11 · Issues: 21

HOV-SG

[RSS 2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"

Language: Python · License: MIT · Stargazers: 138 · Issues: 2 · Issues: 15

FiT3D

[ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Language: Jupyter Notebook · License: MIT · Stargazers: 101 · Issues: 0 · Issues: 0

GAPartNet

[CVPR 2023 Highlight] GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts.

Language: Jupyter Notebook · Stargazers: 92 · Issues: 4 · Issues: 15

RLAfford

RLAfford: End-to-End Affordance Learning for Robotic Manipulation, ICRA 2023

DragAPart

[ECCV 2024] Official Implementation of DragAPart: Learning a Part-Level Motion Prior for Articulated Objects.

Track-2-Act

Code for the paper "Predicting Point Tracks from Internet Videos Enables Diverse Zero-Shot Manipulation"

Language: Python · License: NOASSERTION · Stargazers: 50 · Issues: 1 · Issues: 3

omninocs

A large-scale NOCS dataset.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 45 · Issues: 8 · Issues: 0

osam

Get up and running with SAM, EfficientSAM, YOLO-World, and other promptable vision models locally.

Language: Python · License: MIT · Stargazers: 41 · Issues: 4 · Issues: 0

RAM_code

Official implementation of RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation

Language: Python · License: NOASSERTION · Stargazers: 23 · Issues: 2 · Issues: 2

BEVInstructor

[ECCV 2024] Navigation Instruction Generation with BEV Perception and Large Language Models

Stargazers: 21 · Issues: 0 · Issues: 0

neural-isometries

Official JAX implementation of Neural Isometries: taming transformations for equivariant ML

Language: Python · License: NOASSERTION · Stargazers: 20 · Issues: 0 · Issues: 0

yolo-world-onnx

ONNX models of YOLO-World (an open-vocabulary object detector).

Language: Python · License: GPL-3.0 · Stargazers: 12 · Issues: 2 · Issues: 1