Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds

Under Review

Oliver Lemke1, Zuria Bauer1, René Zurbrügg1, Marc Pollefeys1,2, Francis Engelmann1, Hermann Blum1

1ETH Zurich 2Microsoft Mixed Reality & AI Labs

Spot-Compose is a comprehensive framework for integrating modern machine perception techniques with Spot, demonstrated in experiments on object grasping and dynamic drawer manipulation.

teaser

[Project Webpage] [Paper (coming soon!)]

News πŸ“°

  • Coming soon: release on arXiv.
  • 13 March 2024: Code released.

Code Structure 🎬

spot-compose/
├── source/                            # All source code
│   ├── utils/                         # General utility functions
│   │   ├── coordinates.py             # Coordinate calculations (poses, translations, etc.)
│   │   ├── docker_communication.py    # Communication with Docker servers
│   │   ├── environment.py             # API keys, env variables
│   │   ├── files.py                   # File system handling
│   │   ├── graspnet_interface.py      # Communication with GraspNet server
│   │   ├── importer.py                # Config-based importing
│   │   ├── mask3D_interface.py        # Handling of Mask3D instance segmentation
│   │   ├── point_clouds.py            # Point cloud computations
│   │   ├── recursive_config.py        # Recursive configuration files
│   │   ├── scannet_200_labels.py      # ScanNet200 labels (for Mask3D)
│   │   ├── singletons.py              # Singletons for globally unique access
│   │   ├── user_input.py              # Handle user input
│   │   ├── vis.py                     # Handle visualizations
│   │   ├── vitpose_interface.py       # Handle communication with the ViTPose Docker server
│   │   └── zero_shot_object_detection.py # Object detections from images
│   ├── robot_utils/                   # Utility functions specific to Spot functionality
│   │   ├── base.py                    # Framework and wrapper for all scripts
│   │   ├── basic_movements.py         # Basic robot commands (moving body / arm, stowing, etc.)
│   │   ├── advanced_movements.py      # Advanced robot commands (planning, complex movements)
│   │   ├── frame_transformer.py       # Simplified transformation between frames of reference
│   │   ├── video.py                   # Handle actions that require access to robot cameras
│   │   └── graph_nav.py               # Handle actions that require access to the GraphNav service
│   └── scripts/
│       ├── my_robot_scripts/
│       │   ├── estop_nogui.py         # E-Stop
│       │   └── ...                    # Other action scripts
│       └── point_cloud_scripts/
│           ├── extract_point_cloud.py # Extract point cloud from Boston Dynamics autowalk
│           ├── full_align.py          # Align autowalk and scanned point clouds
│           └── vis_ply_point_clouds_with_coordinates.py # Visualize aligned point cloud
├── data/
│   ├── autowalk/                      # Raw autowalk data
│   ├── point_clouds/                  # Extracted point clouds from autowalks
│   ├── prescans/                      # Raw prescan data
│   ├── aligned_point_clouds/          # Prescan point clouds aligned with extracted autowalk clouds
│   └── masked/                        # Mask3D output given aligned point clouds
├── configs/                           # Configuration files
│   └── config.yaml                    # Top level of the recursive configuration (see the Config section for more info)
├── shells/
│   ├── estop.sh                       # E-Stop script
│   ├── mac_routing.sh                 # Set up networking on a macOS workstation
│   ├── ubuntu_routing.sh              # Set up networking on an Ubuntu workstation
│   ├── robot_routing.sh               # Set up networking on the NUC
│   └── start.sh                       # Convenient script execution
├── README.md                          # Project documentation
├── requirements.txt                   # pip requirements file
├── pyproject.toml                     # Formatter and linter specs
└── LICENSE

Dependencies πŸ“

The main dependencies of the project are the following:

python: 3.8

You can set up a pip environment as follows:

git clone --recurse-submodules git@github.com:oliver-lemke/spot-compose.git
cd spot-compose
virtualenv --python="/usr/bin/python3.8" "venv/"
source venv/bin/activate
pip install -r requirements.txt

Downloads πŸ’§

The pre-trained model weights for the YOLO-based drawer detection are available here.

Docker Containers 🐳

Docker containers are used to run the external neural networks. This keeps the individual methods modular and avoids tedious local setup. Each Docker container functions as a self-contained server that answers requests. Please refer to utils/docker_communication.py for your own custom setup, or to the respective files in utils/ for the existing containers.

To run the respective Docker container, please first pull the desired image via

docker pull [Link]

Once Docker has finished pulling the image, you can start a container via the Run Command. Once inside the container shell, simply run the Start Command to start the server.

| Name | Image | Run Command | Start Command |
| --- | --- | --- | --- |
| AnyGrasp | craiden/graspnet:v1.0 | docker run -p 5000:5000 --gpus all -it craiden/graspnet:v1.0 | python3 app.py |
| OpenMask3D | craiden/openmask:v1.0 | docker run -p 5001:5001 --gpus all -it craiden/openmask:v1.0 | python3 app.py |
| ViTPose | craiden/vitpose:v1.0 | docker run -p 5002:5002 --gpus all -it craiden/vitpose:v1.0 | easy_ViTPose/venv/bin/python app.py |
| DrawerDetection | craiden/yolodrawer:v1.0 | docker run -p 5004:5004 --gpus all -it craiden/yolodrawer:v1.0 | python3 app.py |
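
Once a server is running, it can be queried over HTTP from the workstation. The snippet below is a minimal sketch of such a request against the drawer-detection container on port 5004; the endpoint name and payload field are assumptions for illustration only, and the actual request handling used by the framework lives in utils/docker_communication.py and the per-method interface files in utils/.

# Minimal sketch of querying one of the container servers over HTTP.
# The "/predict" endpoint and the "image" field name are hypothetical;
# see utils/docker_communication.py for the real request handling.
import requests

DRAWER_DETECTION_URL = "http://localhost:5004/predict"  # assumed endpoint

with open("image.jpg", "rb") as f:
    response = requests.post(
        DRAWER_DETECTION_URL,
        files={"image": f},  # assumed payload field
        timeout=30,
    )
response.raise_for_status()
print(response.json())  # e.g. detected drawer bounding boxes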

Detailed Setup Instructions

Point Clouds ☁️

For this project, we require two point clouds: one for navigation (low resolution, captured by Spot) and one for segmentation (high resolution, captured by a commodity scanner). The former is used for initial localization and for setting the origin at the AprilTag fiducial. The latter is used for accurate segmentation.

Low-Resolution Spot Point Cloud

To capture the point cloud, position Spot in front of your AprilTag and start the autowalk. Zip the resulting data and unzip it into the data/autowalk folder. Fill in the name of the unzipped folder in the config file under pre_scanned_graphs/low_res.

High-Resolution Commodity Point Cloud

To capture the point cloud, we use the 3D Scanner App on iOS. Make sure the fiducial is visible during the scan for initialization. Once the scan is complete, click on Share and export two things:

  1. All Data
  2. Point Cloud/PLY with the High Density setting enabled and Z axis up disabled

Unzip the All Data zip file into the data/prescans folder. Rename the exported point cloud to pcd.ply and copy it into the same folder, such that the resulting directory structure looks like the following:

prescans/
├── all_data_folder/
│   ├── annotations.json
│   ├── export.obj
│   ├── export_refined.obj
│   ├── frame_00000.jpg
│   ├── frame_00000.json
│   ├── ...
│   ├── info.json
│   ├── pcd.ply
│   ├── pcd.ply.json
│   ├── textured_output.jpg
│   ├── textured_output.mtl
│   ├── textured_output.obj
│   ├── thumb_00000.jpg
│   └── world_map.arkit

Finally, fill in the name of your all_data_folder in the config file under pre_scanned_graphs/high_res.
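
As a quick sanity check after filling in both entries, you can verify that the configured folder names actually exist on disk. The sketch below assumes the keys are nested as pre_scanned_graphs → low_res / high_res inside configs/config.yaml and that the data lives under data/autowalk and data/prescans as described above; adjust to your actual config layout.

# Sketch of a sanity check for the point cloud folders referenced in the config.
# Key nesting (pre_scanned_graphs -> low_res / high_res) is an assumption.
from pathlib import Path
import yaml  # pip install pyyaml

config = yaml.safe_load(Path("configs/config.yaml").read_text())
graphs = config["pre_scanned_graphs"]

autowalk_dir = Path("data/autowalk") / graphs["low_res"]   # Spot autowalk scan
prescan_dir = Path("data/prescans") / graphs["high_res"]   # 3D Scanner App export

for path in (autowalk_dir, prescan_dir):
    print(f"{path}: {'found' if path.is_dir() else 'MISSING'}")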

Networking 🌐

In our project setup, we connect to the robot via a NUC mounted on Spot's back. The NUC is connected to Spot via cable, and to a router via WiFi.

However, since the robot is not directly reachable from the router, we have to (a) tell the workstation where to send information intended for the robot, and (b) configure the NUC to act as a bridge. You may have to adjust the addresses in the scripts to fit your setup.

Workstation Networking

On the workstation run ./shells/ubuntu_routing.sh (or ./shells/mac_routing.sh depending on your workstation operating system).

NUC Networking

First, ssh into the NUC, then run ./robot_routing.sh to configure it as a network bridge.

Config βš™οΈ

The base config file can be found under configs/config.yaml. However, our config system allows configs to dynamically extend and inherit from one another, which is useful if you have different setups on different workstations. To do this, simply specify the bottom-most file in the inheritance tree when creating the Config() object. Each config file names the file it inherits from in an extends field.

In our example, the overwriting config is specified in configs/template_extension.yaml, meaning the inheritance graph looks like:

template_extension.yaml ---overwrites---> config.yaml

In this example, we would specify Config(file='configs/template_extension.yaml'), which then overwrites all the config files it extends.

However, this functionality is not necessary for the project to work, so simply using config.yaml as usual is supported by default.
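
In code, selecting a config therefore amounts to choosing which file to pass to the constructor. The sketch below illustrates this; the Config(file=...) call is the one shown above, while the import path and the no-argument default are assumptions about source/utils/recursive_config.py.

# Sketch of selecting a config at runtime (import path and default
# constructor are assumptions about recursive_config.py).
from utils.recursive_config import Config

# Default setup: just use configs/config.yaml.
base_config = Config()

# Workstation-specific setup: pass the bottom-most file in the inheritance
# tree; its "extends" chain is resolved and overwritten in order.
extended_config = Config(file="configs/template_extension.yaml")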

Benchmark πŸ“ˆ

We provide detailed results here.

Open-Vocabulary Object Retrieval

experiments_manipulation

Dynamic Drawer Manipulation & Search

experiments_drawers

TODO πŸ”œ

  • Finish Documentation

BibTeX πŸ™

@article{lemke2024spotcompose,
  author    = {Lemke, Oliver and Bauer, Zuria and Zurbr\"{u}gg, Ren\'{e} and Pollefeys, Marc and Engelmann, Francis and Blum, Hermann},
  title     = {Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds},
  year      = {2024},
}
