hsp-iit / prosthetic-grasping-experiments

Code to replicate results in our paper. Train/test pipelines for CNN and CNN+RNN on video data.

Grasp Pre-shape Selection by Synthetic Training:
Eye-in-hand Shared Control on the Hannes Prosthesis

We propose a pipeline to select the pre-shape of the Hannes prosthesis using visual input. We collected a real dataset and used it to train and test our models. The test sets are organized into 5 sets of increasing complexity, each representing a condition that does not appear in the real training set.
Our main contribution is a synthetic data generation pipeline designed for vision-based prosthetic grasping. We compare a model trained on real data with the same model trained on the proposed synthetic data. As shown in the table below, the synthetically-trained model achieves a comparable average accuracy with a lower standard deviation, demonstrating the robustness of our method.
Our work has been accepted at IROS 2022.

Test set               | Real training: video acc. (%) | Synthetic training: video acc. (%)
Same person            | 98.9 ± 0.8                    | 80.2 ± 0.9
Different velocity     | 81.7 ± 0.9                    | 79.7 ± 0.8
From ground            | 76.2 ± 1.0                    | 76.0 ± 0.9
Seated                 | 63.9 ± 1.0                    | 68.1 ± 1.0
Different background   | 56.2 ± 1.7                    | 76.4 ± 2.0
Average over test sets | 75.4 ± 14.8                   | 76.1 ± 4.3

Description

This repository contains the PyTorch code to reproduce the results presented in our work.

Install

The code was developed with Python 3.8, PyTorch 1.10.1 and CUDA 10.2.

Clone project and install dependencies:

# clone project   
git clone https://github.com/hsp-iit/prosthetic-grasping-experiments
# create virtual environment and install dependencies
cd prosthetic-grasping-experiments
python3.8 -m venv pge-venv
source pge-venv/bin/activate
pip install -r requirements.txt

# If the torch installation above fails, try the command below:
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
# ...or follow the instructions at: https://pytorch.org/get-started/previous-versions/

Dataset preparation

We provide a script to automatically download the real and synthetic datasets (specify your preferred folder path with --out_dataset_folder; in the example below the datasets are saved in a datasets folder located in the parent folder of this repository). The script also arranges the datasets into the format required by our dataloaders.

python download_dataset.py --out_dataset_folder ../datasets --remove_zips

The datasets folder contains all the datasets: both the real one (i.e., iHannesDataset) and the synthetic one (i.e., ycb_synthetic_dataset). For each dataset, both the frames and the features pre-extracted with mobilenet_v2 (pre-trained on ImageNet) are available.
The datasets folder has the following macro-structure (i.e., the path down to the specific dataset folder):

datasets/
├── real/
│   ├── frames/
│   │   └── iHannesDataset
│   └── features/
│       └── mobilenet_v2/
│           └── iHannesDataset
└── synthetic/
    ├── frames/
    │   └── ycb_synthetic_dataset
    └── features/
        └── mobilenet_v2/
            └── ycb_synthetic_dataset

Each dataset (i.e., iHannesDataset, ycb_synthetic_dataset) has the following path to the frames/features:

DATASET_BASE_FOLDER/CATEGORY_NAME/OBJECT_NAME/PRESHAPE_NAME/Wrist_d435/rgb*

If you want to use our dataloaders, make sure that the above arrangement (both the macro-structure and the path to frames/features) is maintained.
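
To double-check that your copy of a dataset follows this layout, you can use a minimal sketch like the one below (this helper is our own illustration, not part of the repository; adjust the base folder to your setup):

import glob
import os

# Hypothetical check (not part of the repository): count the videos found
# under a dataset root arranged as CATEGORY/OBJECT/PRESHAPE/Wrist_d435/rgb*
dataset_base_folder = 'data/real/frames/iHannesDataset'  # adjust to your path
pattern = os.path.join(dataset_base_folder, '*', '*', '*', 'Wrist_d435', 'rgb*')
videos = sorted(glob.glob(pattern))
print(f'Found {len(videos)} video folders under {dataset_base_folder}')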

Create softlinks:

cd prosthetic-grasping-experiments/data
ln -s /YOUR_PATH_TO_DATASETS_FOLDER/real 
ln -s /YOUR_PATH_TO_DATASETS_FOLDER/synthetic 

and the resulting structure is:

prosthetic-grasping-experiments/
└── data/
    ├── real/
    │   └── ...
    └── synthetic/
        └── ...

Extract features [optional]

Pre-extracted features are already included in the datasets downloaded above. However, to extract the features on your own, you can use:

cd prosthetic-grasping-experiments
python src/tools/cnn/extract_features.py \
--batch_size 1 --source Wrist_d435 \
--input rgb --model cnn --dataset_type SingleSourceImage \
--feature_extractor mobilenet_v2 --pretrain imagenet \
--dataset_name iHannesDataset 

For each video, a features.npy file is generated. The file has shape (num_frames_in_video, feature_vector_dim) and will be located according to the path defined above.
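
As a quick sanity check, you can load one of the generated files with NumPy. The path below is a placeholder: the exact location of features.npy inside each video folder follows the dataset layout described above.

import numpy as np

# Placeholder path: replace CATEGORY_NAME/OBJECT_NAME/PRESHAPE_NAME with an actual video folder
path = ('data/real/features/mobilenet_v2/iHannesDataset/'
        'CATEGORY_NAME/OBJECT_NAME/PRESHAPE_NAME/Wrist_d435/features.npy')
features = np.load(path)
print(features.shape)  # (num_frames_in_video, feature_vector_dim)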

Training

All runnable files are located under the src/tools folder. At the beginning of each file you can find some example run commands with different arguments.

When the training starts, a folder is created at the prosthetic-grasping-experiments/runs path (you can specify the folder name with the --log_dir argument). This folder stores the metrics and the best model checkpoint.
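
After training, you can inspect the saved checkpoint, for example as in the sketch below (the file name best_model.pth matches the one used in the Test section; the exact content of the checkpoint is an assumption and may be a plain state_dict or a dict with extra entries):

import torch

# Checkpoint produced by a run launched with --log_dir train_from_features
ckpt = torch.load('runs/train_from_features/best_model.pth', map_location='cpu')

# Print the layout to see what is stored (state_dict, optimizer state, etc.)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])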

Example 1: train the fully-connected classifier of mobilenet_v2 on the real dataset, starting from pre-extracted features:

cd prosthetic-grasping-experiments
python src/tools/cnn/train.py --epochs 5 \
--batch_size 32 --source Wrist_d435 --dataset_type SingleSourceImage \
--split random --input rgb --output preshape --model cnn \
--feature_extractor mobilenet_v2 --pretrain imagenet --freeze_all_conv_layers \
--from_features --dataset_name iHannesDataset \
--log_dir train_from_features

Example 2: same as above, but training on synthetic data (remember to add the --synthetic argument, otherwise the wrong dataset path is constructed):

cd prosthetic-grasping-experiments
python src/tools/cnn/train.py --epochs 5 \
--batch_size 64 --source Wrist_d435 --dataset_type SingleSourceImage \
--split random --input rgb --output preshape --model cnn \
--feature_extractor mobilenet_v2 --pretrain imagenet --freeze_all_conv_layers \
--from_features --dataset_name ycb_synthetic_dataset --synthetic

Example 3: train the LSTM on the real dataset, starting from pre-extracted features:

cd prosthetic-grasping-experiments
python src/tools/cnn_rnn/train.py --epochs 10 \
--batch_size 32 --source Wrist_d435 --dataset_type SingleSourceVideo \
--split random --input rgb --output preshape --model cnn_rnn --rnn_type lstm \
--rnn_hidden_size 256 --feature_extractor mobilenet_v2 --pretrain imagenet \
--freeze_all_conv_layers --from_features --dataset_name iHannesDataset

Example 4: fine-tune the whole network (i.e., use RGB frames instead of pre-extracted features) starting from the ImageNet weights:

cd prosthetic-grasping-experiments
python src/tools/cnn/train.py --epochs 10 \
--batch_size 64 --source Wrist_d435 --dataset_type SingleSourceImage \
--split random --input rgb --output preshape --model cnn \
--feature_extractor mobilenet_v2 --pretrain imagenet \
--lr 0.0001 --dataset_name ycb_synthetic_dataset --synthetic

Test

To test a model, copy and paste the command used for training and substitute the train.py script with eval.py. Moreover, you have to specify the path to the model checkpoint with the --checkpoint argument and the test set with the --test_type argument.

Example 1: test the model on the Same person test set:

cd prosthetic-grasping-experiments
python src/tools/cnn/eval.py --epochs 5 \
--batch_size 32 --source Wrist_d435 --dataset_type SingleSourceImage \
--split random --input rgb --output preshape --model cnn \
--feature_extractor mobilenet_v2 --pretrain imagenet --freeze_all_conv_layers \
--from_features --dataset_name iHannesDataset \
--log_dir train_from_features \
--checkpoint runs/train_from_features/best_model.pth --test_type test_same_person

Some confusion matrices will be displayed on screen; you can simply close them and visualize them later on TensorBoard. Many different metrics, at both per-frame and per-video granularity, are printed to the shell. In our work, results are reported as video accuracy, obtained from the per-frame predictions through majority voting while excluding the background class. This value is printed to the shell as follows:

.
.
.

=== VIDEO METRICS ===

ACCURACY W BACKGR: xx.xx%

ACCURACY W/O BACKGR: xx.xx%       <==

.
.
.
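
For clarity, here is a minimal sketch of the majority-voting scheme described above (our own illustration, not the repository's implementation; the background label value is an assumption):

from collections import Counter

def video_prediction(per_frame_preds, background_label=0):
    # Majority vote over per-frame predictions, ignoring frames
    # predicted as background (label value assumed to be 0 here).
    votes = [p for p in per_frame_preds if p != background_label]
    if not votes:
        return background_label  # every frame was predicted as background
    return Counter(votes).most_common(1)[0][0]

# Example: per-frame predictions [0, 0, 2, 2, 3, 2] -> video label 2
print(video_prediction([0, 0, 2, 2, 3, 2]))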

You can visualize both the training and evaluation metrics on TensorBoard with:

cd prosthetic-grasping-experiments
tensorboard --logdir runs/train_from_features

Citation

@inproceedings{vasile2022,
    author    = {F. Vasile and E. Maiettini and G. Pasquale and A. Florio and N. Boccardo and L. Natale},
    booktitle = {2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    title     = {Grasp Pre-shape Selection by Synthetic Training: Eye-in-hand Shared Control on the Hannes Prosthesis},
    year      = {2022},
    month     = {Oct},
}

Maintainer

This repository is maintained by:

@FedericoVasile1

Related links:

  • For further details about our synthetic data generation pipeline, please refer to our paper (specifically SEC. IV) and feel free to contact me: federico.vasile@iit.it
  • A demonstration video of our model trained on the synthetic data and tested on the Hannes prosthesis is available here
  • A presentation video summarizing our work is available here
