zhangjb416's starred repositories

VQASynth

Compose multimodal datasets 🎹

Language: Python · Stargazers: 124

mobile_manipulation_papers

Papers in Mobile Manipulation (Personal Collection)

Stargazers: 19

ScanReason

[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities

Stargazers: 16

embodied-generalist

[ICML 2024] Official code repository for 3D embodied generalist agent LEO

Language: Python · License: MIT · Stargazers: 285

OmniGibson

OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx

Language: Python · License: MIT · Stargazers: 385
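
A minimal sketch of launching a scene, assuming the quickstart-style API from the project's documentation (og.Environment with a config dict); the scene name, robot entry, and step() return shape are assumptions and vary across versions:

```python
# Hypothetical OmniGibson quickstart sketch; the "Rs_int" scene, the Fetch
# robot entry, and the 5-tuple step() return are assumptions based on
# documented examples and may differ by version.
import omnigibson as og

cfg = {
    "scene": {"type": "InteractiveTraversableScene", "scene_model": "Rs_int"},
    "robots": [{"type": "Fetch", "obs_modalities": ["rgb"]}],
}

env = og.Environment(configs=cfg)
env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random actions for smoke-testing
    obs, reward, terminated, truncated, info = env.step(action)

og.shutdown()  # assumed teardown helper
```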

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1501

robocasa

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Language: Python · License: NOASSERTION · Stargazers: 379

Depth-Anything-V2

Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 2364
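
Inference is lightweight; a minimal sketch following the repository's README-style usage, where the constructor hyperparameters and checkpoint filename are assumptions tied to the small ("vits") encoder variant:

```python
# README-style inference sketch for Depth Anything V2; the encoder config
# and checkpoint path below are assumptions for the small ("vits") variant.
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder="vits", features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vits.pth", map_location="cpu"))
model.eval()

img = cv2.imread("example.jpg")  # BGR image as loaded by OpenCV
depth = model.infer_image(img)   # H x W numpy array of relative depth
```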

MoMa-LLM

Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation. Project website: http://moma-llm.cs.uni-freiburg.de

Language: Python · License: NOASSERTION · Stargazers: 32

Grounded_3D-LLM

Code & data for Grounded 3D-LLM with Referent Tokens

Language: Python · Stargazers: 56

EmbodiedScan

[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language: Python · License: Apache-2.0 · Stargazers: 375

SceneTracker

SceneTracker: Long-term Scene Flow Estimation Network

Language: Python · License: MIT · Stargazers: 92

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python · License: MIT · Stargazers: 6261

mimicgen

This code corresponds to simulation environments used as part of the MimicGen project.

Language: Python · License: NOASSERTION · Stargazers: 218

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 20453

RoboEXP

RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

Language: Python · License: MIT · Stargazers: 63

3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Language: Python · Stargazers: 228

AVDC

Official repository of Learning to Act from Actionless Videos through Dense Correspondences.

Language: Python · License: MIT · Stargazers: 135

ok-robot

An open, modular framework for zero-shot, language-conditioned pick-and-drop tasks in arbitrary homes.

Language: Python · License: MIT · Stargazers: 403

Semantic-Segment-Anything

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

Language: Python · License: Apache-2.0 · Stargazers: 2045

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 6408
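
Depth Anything V1 also has Hugging Face integration; a minimal sketch via the transformers depth-estimation pipeline, where the model id is an assumption (the HF-converted small checkpoint):

```python
# Sketch of Depth Anything inference through the Hugging Face pipeline;
# the model id is an assumption (HF-converted small checkpoint).
from PIL import Image
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
result = pipe(Image.open("example.jpg"))
depth_map = result["depth"]  # PIL image of the predicted relative depth
```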

LIBERO

Benchmarking Knowledge Transfer in Lifelong Robot Learning

Language: Jupyter Notebook · License: MIT · Stargazers: 163
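
A minimal sketch of browsing LIBERO's task suites, assuming the benchmark-registry API shown in the project's README; the suite key and task attributes are assumptions:

```python
# Sketch of enumerating LIBERO tasks; get_benchmark_dict() and the
# "libero_spatial" suite key follow README examples and are assumptions here.
from libero.libero import benchmark

benchmark_dict = benchmark.get_benchmark_dict()
task_suite = benchmark_dict["libero_spatial"]()  # one of the lifelong-learning suites
task = task_suite.get_task(0)
print(task.name, task.language)                  # task id and its language instruction
```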

GROOT

Official implementation of GROOT, CoRL 2023

Language: Python · Stargazers: 43

peract

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Language: Python · License: Apache-2.0 · Stargazers: 308

d3fields

[arXiv] D^3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

Language: Python · License: MIT · Stargazers: 101

arnold

[ICCV 2023] Official code repository for ARNOLD benchmark

Language: Jupyter Notebook · License: MIT · Stargazers: 118

ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Language: Python · License: NOASSERTION · Stargazers: 2754

IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim

Language: Python · License: NOASSERTION · Stargazers: 1552