RobotFetchMeIt

Brown University CSCI 2952-O final project.

Given a depth map and a text description, we use Spot to detect the object in the scene that matches the description and walk to it.

This repo is based on ImVoteNet.

Group members

Poster

Demo

  • Demo using only the 3D object detector

Report

report

Resources

Installation

Clone this repository and submodules:

git clone https://github.com/GuardianWang/imvotenet.git RobotFetchMeIt
cd RobotFetchMeIt
git submodule update --init

We use conda to create virtual environments. Because loading the text model weights requires PyTorch >= 1.8 while compiling PointNet++ requires PyTorch 1.4, we use two conda environments.

conda create -n fetch python=3.7 -y
conda create -n torch18 python=3.7 -y

Overall, the installation is similar to VoteNet's. A GPU is required. The code is tested with Ubuntu 18.04, Python 3.7.13, PyTorch 1.4.0, CUDA 10.0, cuDNN v7.4, and an NVIDIA GeForce RTX 2080.

Before installing PyTorch, make sure the machine has a GPU; check with nvidia-smi. We need the GPU to compile PointNet++. First install PyTorch, for example through Anaconda. Find matching cudatoolkit and cudnn versions here, and find how to let nvcc detect a specific CUDA version here.

conda activate fetch
conda install pytorch==1.4.0 torchvision==0.5.0 cudatoolkit=9.2 cudnn=7.6.5 -c pytorch
pip install matplotlib opencv-python plyfile tqdm networkx==2.2 trimesh==2.35.39 protobuf
pip install open3d
pip install bosdyn-client bosdyn-mission bosdyn-choreography-client

conda activate torch18
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=10.2 cudnn=7.6.5 -c pytorch
pip install ordered-set
pip install tqdm
pip install pandas
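
Either way, a quick sanity check before proceeding (a minimal sketch; the printed versions refer to whichever environment you activated):

# Verify the active environment before compiling or running anything.
import torch

print(torch.__version__)          # expect 1.4.0 in fetch, 1.8.0 in torch18
print(torch.cuda.is_available())  # must be True; PointNet++ compilation needs a GPU
print(torch.version.cuda)         # should match the installed cudatoolkit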

The code depends on PointNet++ as a backbone, which needs compilation. Compile for the architecture of the GPU you want to use (e.g., "7.5" for the RTX 2080).

conda activate fetch
cd pointnet2
TORCH_CUDA_ARCH_LIST="7.5" python setup.py install
cd ..
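
If you are unsure which value to pass in TORCH_CUDA_ARCH_LIST, you can query the compute capability of your GPU first. A minimal sketch, run inside the fetch environment:

# Print the compute capability of GPU 0, e.g. (7, 5) for an RTX 2080.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"{major}.{minor}")  # pass this value via TORCH_CUDA_ARCH_LIST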

Model weights and data

  • VoteNet weights (12 MB): checkpoint.tar
  • Text model weights (1.4 GB)
  • Text features (500 MB)

bash download.sh
mv checkpoint.pth TextCondRobotFetch
unzip subdataset.zip -d <path to unzip>
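
To verify the download before running the demo, you can inspect the VoteNet checkpoint. A minimal sketch; the key names in the comment are typical for PyTorch checkpoints, not confirmed for this file:

# Load the checkpoint on CPU and list its top-level keys.
import torch

ckpt = torch.load("checkpoint.tar", map_location="cpu")
print(list(ckpt.keys()))  # often model_state_dict, optimizer_state_dict, epoch, ...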

Run the demo

In terminal 1, run

conda activate fetch
export ROBOT_IP=<your spot ip>
python estop_nogui.py $ROBOT_IP

You can force-stop Spot by pressing <space>.

In terminal 2, run

conda activate fetch
export ROBOT_IP=<your spot ip>
export BOSDYN_CLIENT_USERNAME=<spot username>
export BOSDYN_CLIENT_PASSWORD=<spot password>
export BOSDYN_DOCK_ID=<spot dock id>
python predict_example.py --checkpoint_path checkpoint.tar --dump_dir pred_votenet --cluster_sampling seed_fps --use_3d_nms --use_cls_nms --per_class_proposal --time_per_move=5 --username $BOSDYN_CLIENT_USERNAME --password $BOSDYN_CLIENT_PASSWORD --dock_id $BOSDYN_DOCK_ID $ROBOT_IP

In terminal 3, run

conda activate torch18
export ROBOT_IP=<your spot ip>
python shape_inference.py --checkpoint TextCondRobotFetch/checkpoint.pth

You can specify the text and the corresponding pre-extracted features by changing the latent_id parameter passed to pred_shape() in shape_inference.py; texts and features are stored in TextCondRobotFetch/embeddings (see the sketch below).

You can specify the path to the subdataset via the folder parameter passed to get_partial_scans() in shape_inference.py.
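
As a rough illustration of how a latent_id could select one stored feature (the directory layout, file naming, and .pt extension here are assumptions, not the repo's actual format; check TextCondRobotFetch/embeddings):

# Hypothetical sketch: look up a pre-extracted text feature by id.
import os
import torch

EMBED_DIR = "TextCondRobotFetch/embeddings"

def load_text_feature(latent_id):
    # Assumes one feature tensor per id; adjust to the real file layout.
    path = os.path.join(EMBED_DIR, f"{latent_id}.pt")
    return torch.load(path, map_location="cpu")

feature = load_text_feature(0)
print(feature.shape)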

Train the detector yourself

See imvotenet for how to train the detector.

LICENSE

The code is released under the MIT license.

TODO

  • [ ] subdataset
  • [ ] zip shared by Rao
