zhangjb416's starred repositories

VQASynth

Compose multimodal datasets 🎹

Language: Python · Stargazers: 124

mobile_manipulation_papers

Papers in Mobile Manipulation (Personal Collection)

Stargazers: 19

ScanReason

[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities

Stargazers: 16

embodied-generalist

[ICML 2024] Official code repository for 3D embodied generalist agent LEO

Language: Python · License: MIT · Stargazers: 285

OmniGibson

OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx

Language: Python · License: MIT · Stargazers: 385
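
A minimal sketch of launching a scene, assuming the quickstart-style API from the project's documentation (og.Environment with a config dict); the scene name, robot entry, and step() return shape are assumptions and vary across versions:

```python
# Hypothetical OmniGibson quickstart sketch; the "Rs_int" scene, the Fetch
# robot entry, and the 5-tuple step() return are assumptions based on
# documented examples and may differ by version.
import omnigibson as og

cfg = {
    "scene": {"type": "InteractiveTraversableScene", "scene_model": "Rs_int"},
    "robots": [{"type": "Fetch", "obs_modalities": ["rgb"]}],
}

env = og.Environment(configs=cfg)
env.reset()
for _ in range(100):
    action = env.action_space.sample()  # random actions for smoke-testing
    obs, reward, terminated, truncated, info = env.step(action)

og.shutdown()  # assumed teardown helper
```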

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0 · Stargazers: 1501

robocasa

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Language: Python · License: NOASSERTION · Stargazers: 379

Depth-Anything-V2

Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 2364
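
Inference is lightweight; a minimal sketch following the repository's README-style usage, where the constructor hyperparameters and checkpoint filename are assumptions tied to the small ("vits") encoder variant:

```python
# README-style inference sketch for Depth Anything V2; the encoder config
# and checkpoint path below are assumptions for the small ("vits") variant.
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

model = DepthAnythingV2(encoder="vits", features=64, out_channels=[48, 96, 192, 384])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vits.pth", map_location="cpu"))
model.eval()

img = cv2.imread("example.jpg")  # BGR image as loaded by OpenCV
depth = model.infer_image(img)   # H x W numpy array of relative depth
```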

MoMa-LLM

Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation. Project website: http://moma-llm.cs.uni-freiburg.de

Language: Python · License: NOASSERTION · Stargazers: 32

Grounded_3D-LLM

Code & data for Grounded 3D-LLM with Referent Tokens

Language: Python · Stargazers: 56

EmbodiedScan

[CVPR 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Language: Python · License: Apache-2.0 · Stargazers: 375

SceneTracker

SceneTracker: Long-term Scene Flow Estimation Network

Language: Python · License: MIT · Stargazers: 92

Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.

Language: Python · License: MIT · Stargazers: 6261

mimicgen

This code corresponds to simulation environments used as part of the MimicGen project.

Language: Python · License: NOASSERTION · Stargazers: 218

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stargazers: 20453

RoboEXP

RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

Language: Python · License: MIT · Stargazers: 63

3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Language: Python · Stargazers: 228

AVDC

Official repository of Learning to Act from Actionless Videos through Dense Correspondences.

Language: Python · License: MIT · Stargazers: 135

ok-robot

An open, modular framework for zero-shot, language-conditioned pick-and-drop tasks in arbitrary homes.

Language: Python · License: MIT · Stargazers: 403

Semantic-Segment-Anything

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

Language: Python · License: Apache-2.0 · Stargazers: 2045

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · License: Apache-2.0 · Stargazers: 6408
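
Depth Anything V1 also has Hugging Face integration; a minimal sketch via the transformers depth-estimation pipeline, where the model id is an assumption (the HF-converted small checkpoint):

```python
# Sketch of Depth Anything inference through the Hugging Face pipeline;
# the model id is an assumption (HF-converted small checkpoint).
from PIL import Image
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
result = pipe(Image.open("example.jpg"))
depth_map = result["depth"]  # PIL image of the predicted relative depth
```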

LIBERO

Benchmarking Knowledge Transfer in Lifelong Robot Learning

Language: Jupyter Notebook · License: MIT · Stargazers: 163
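
A minimal sketch of browsing LIBERO's task suites, assuming the benchmark-registry API shown in the project's README; the suite key and task attributes are assumptions:

```python
# Sketch of enumerating LIBERO tasks; get_benchmark_dict() and the
# "libero_spatial" suite key follow README examples and are assumptions here.
from libero.libero import benchmark

benchmark_dict = benchmark.get_benchmark_dict()
task_suite = benchmark_dict["libero_spatial"]()  # one of the lifelong-learning suites
task = task_suite.get_task(0)
print(task.name, task.language)                  # task id and its language instruction
```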

GROOT

Official implementation of GROOT, CoRL 2023

Language: Python · Stargazers: 43

peract

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Language: Python · License: Apache-2.0 · Stargazers: 308

d3fields

[arXiv] D^3Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation

Language: Python · License: MIT · Stargazers: 101

arnold

[ICCV 2023] Official code repository for ARNOLD benchmark

Language: Jupyter Notebook · License: MIT · Stargazers: 118

ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Language: Python · License: NOASSERTION · Stargazers: 2754

IsaacLab

Unified framework for robot learning built on NVIDIA Isaac Sim

Language: Python · License: NOASSERTION · Stargazers: 1552