Spatio-Temporal Segmentation

This repository contains the accompanying code for 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR'19.
Requirements
- Ubuntu 14.04 or higher
- CUDA 10.1 or higher
- pytorch 1.3 or higher
- python 3.6 or higher
- GCC 6 or higher
Installation

You need to install PyTorch and the Minkowski Engine, either with pip or with Anaconda.
Pip

The Minkowski Engine is distributed via PyPI as MinkowskiEngine and can be installed with pip. First, install PyTorch following the official instructions. Next, install OpenBLAS.

```
sudo apt install libopenblas-dev
pip3 install torch torchvision
pip3 install -U MinkowskiEngine
```
Next, clone this repository and install the remaining requirements:

```
git clone https://github.com/chrischoy/SpatioTemporalSegmentation/
cd SpatioTemporalSegmentation
pip install -r requirements.txt
```
Troubleshooting
Please visit the MinkowskiEngine issue page if you have difficulty installing the Minkowski Engine.
ScanNet Training
- Download the ScanNet dataset from the official website. You need to sign the terms of use.

- Preprocess all ScanNet raw point clouds with the following command after setting the paths correctly:

```
python -m lib.datasets.preprocessing.scannet
```

- Train the network with

```
export BATCH_SIZE=N;
./scripts/train_scannet.sh 0 \
    "-default" \
    "--scannet_path /path/to/preprocessed/scannet"
```

Modify BATCH_SIZE accordingly. The first argument is the GPU id, the second is the log directory postfix, and the last is the miscellaneous arguments.
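The preprocessing step above voxelizes raw point clouds before training. As a rough illustration (the actual `lib.datasets.preprocessing.scannet` script also handles colors, labels, and file I/O), voxelization amounts to quantizing point coordinates to a grid and keeping one point per occupied voxel; the function and values below are purely illustrative.

```python
def quantize(points, voxel_size=0.02):
    """Map 3D points to integer voxel coordinates and keep the first
    point that falls into each voxel (a 2cm grid, matching the voxel
    size used for the ScanNet models)."""
    seen = {}
    for i, (x, y, z) in enumerate(points):
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        if key not in seen:
            seen[key] = i
    indices = sorted(seen.values())
    return [points[i] for i in indices], indices

points = [(0.001, 0.001, 0.0),  # these two fall in the same 2cm voxel
          (0.005, 0.002, 0.0),
          (0.05, 0.0, 0.0)]     # this one lands in a different voxel
kept, idx = quantize(points)
print(len(kept))  # -> 2: duplicates within a voxel collapse
```

The sparse tensors consumed by the Minkowski Engine are built from exactly such unique voxel coordinates plus per-voxel features.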
mIoU vs. Overall Accuracy
The official evaluation metric for ScanNet is mIoU. OA (Overall Accuracy) is not the official metric because it is not discriminative. This convention comes from 2D semantic segmentation, where pixelwise overall accuracy does not capture the fidelity of the segmentation. On 3D ScanNet semantic segmentation, our network reaches OA 89.087, mIoU 71.496, mAP 76.127, and mAcc 79.660 on the ScanNet v2 validation set.

Why is overall accuracy the least discriminative metric? Most scenes consist of large structures such as walls, floors, or background, and scores on these classes dominate the statistic if you use overall accuracy.
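A toy confusion matrix makes the point concrete: with one dominant class (say, floor) that is predicted nearly perfectly and one rare class that is not, OA stays high while mIoU drops sharply. The counts below are made up for illustration only.

```python
def metrics(conf):
    """conf[i][j] = number of points of true class i predicted as class j.
    Returns (overall accuracy, mean IoU)."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    oa = correct / total
    ious = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                       # missed points of class i
        fp = sum(conf[r][i] for r in range(n)) - tp  # points wrongly called i
        ious.append(tp / (tp + fp + fn))
    return oa, sum(ious) / n

# class 0: huge, easy class; class 1: rare class, predicted poorly
conf = [[9500, 500],
        [400, 100]]
oa, miou = metrics(conf)
print(round(oa, 3), round(miou, 3))  # -> 0.914 0.507
```

OA exceeds 91% even though the rare class has an IoU of only 0.1, which is exactly why mIoU is the reported metric.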
Synthia 4D Experiment
- Download the dataset from download

- Extract

```
cd /path/to/extract/synthia4d
wget http://cvgl.stanford.edu/data2/Synthia4D.tar
tar -xf Synthia4D.tar
tar -xvjf *.tar.bz2
```
- Training

```
export BATCH_SIZE=N; \
./scripts/train_synthia4d.sh 0 \
    "-default" \
    "--synthia_path /path/to/extract/synthia4d"
```

The above script trains a network. You have to change the arguments accordingly. The first argument to the script is the GPU id, the second is the log directory postfix (change it to mark your experimental setup), and the final argument is a series of miscellaneous arguments. You have to specify the Synthia directory here, and wrap all arguments in double quotes.
Stanford 3D Dataset
- Download the Stanford 3D dataset from the website

- Preprocess: modify the input and output directories in lib/datasets/preprocessing/stanford.py accordingly, and run

```
python -m lib.datasets.preprocessing.stanford
```

- Train

```
./scripts/train_stanford.sh 0 \
    "-default" \
    "--stanford3d_path /PATH/TO/PREPROCESSED/STANFORD"
```
Model Zoo
| Model | Dataset | Voxel Size | Conv1 Kernel Size | Performance | Link |
|---|---|---|---|---|---|
| Mink16UNet34C | ScanNet train + val | 2cm | 3 | Test set 73.6% mIoU, no sliding window | download |
| Mink16UNet34C | ScanNet train | 2cm | 5 | Val 72.219% mIoU, no rotation average, no sliding window, per-class performance | download |
| Mink16UNet18 | Stanford Area5 train | 5cm | 5 | Area 5 test 65.828% mIoU, no rotation average, no sliding window, per-class performance | download |
Note that the sliding-window style evaluation (cropping and stitching results) used in many related works effectively acts as an ensemble (rotation averaging), which boosts performance.
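The ensembling effect can be sketched in a few lines: per-point class scores from several passes (e.g. over rotated copies of the input) are averaged before taking the argmax, so a single noisy pass is outvoted. The scores below are illustrative, not real network outputs.

```python
def ensemble_argmax(score_sets):
    """Average per-class score vectors from multiple passes, then argmax."""
    n_classes = len(score_sets[0])
    avg = [sum(scores[c] for scores in score_sets) / len(score_sets)
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

# Three passes over one point: the first (noisy) pass favors class 0,
# but averaging over all passes recovers the majority answer, class 1.
passes = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
print(ensemble_argmax(passes))  # -> 1
```

Comparing a single-pass prediction against such an averaged one is why the table above notes "no rotation average, no sliding window" for a fair single-pass comparison.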
Demo
The demo code will download the weights of Mink16UNet34C (conv1 kernel size 5) trained on the ScanNet training split and visualize the prediction:

```
python -m demo.scannet
```

If you want to test a network trained on the Stanford dataset, run

```
python -m demo.stanford
```
Citing this work
If you use the Minkowski Engine, please cite:

```
@inproceedings{choy20194d,
  title={4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks},
  author={Choy, Christopher and Gwak, JunYoung and Savarese, Silvio},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3075--3084},
  year={2019}
}
```