slamcore / semlaps

Code and data for "SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation"

SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation

by Jingwen Wang, Juan Tarrio, Lourdes Agapito, Pablo F. Alcantarilla, Alexander Vakhitov.

Jingwen Wang (jingwen.wang.17@ucl.ac.uk) is the original author of the core of the method and the evaluation scripts. Alexander Vakhitov (alexander@slamcore.com) is the author of the QPOS over-segmentation method implementation.

This repository contains the code to train and evaluate the method and a link to the Semantic Mapping with RealSense (SMR) dataset.

1. Set up environment

python=3.9 pytorch=1.11.0 cuda=11.3

conda env create -f environment.yml
conda activate semlaps

pytorch3d

pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py39_cu113_pyt1110/download.html
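To verify the environment, an optional one-line check should report torch 1.11.0, CUDA 11.3 available, and the installed pytorch3d version:

python -c "import torch, pytorch3d; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), pytorch3d.__version__)"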

2. Data Preparation

2.1 ScanNet Data

You can download ScanNet by following their official instructions. Apart from the basic data, you will also need 2D semantic GT and 3D meshes with GT labels. You should expect to have the following file types:

  • .sens: extracted to depth, pose, color, intrinsic
  • .txt: some meta-data
  • _vh_clean_2.labels.ply: 3D GT mesh
  • _2d-label-filt.zip: 2D semantic data

You will also need to process the raw 2D semantic images, resizing them to 640x480 and converting the semantic encoding from NYU-40 to ScanNet-20.

Unfortunately, we are not able to provide the code for this part. Please refer to the official ScanNet GitHub or raise an issue if you have questions.
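For orientation only, the sketch below shows one way such preprocessing could look in Python: resize the label images with nearest-neighbour interpolation and remap the ids through a lookup table. The paths and the nyu40_to_scannet20 table are placeholders (the real table should be built from the official scannetv2-labels.combined.tsv); this is not the exact pipeline used for the paper.

# Sketch: resize raw 2D label images to 640x480 and remap NYU-40 ids to ScanNet-20 ids.
# nyu40_to_scannet20 is a placeholder lookup table; fill it from the official
# scannetv2-labels.combined.tsv mapping (0 is kept as ignore/unlabelled).
import numpy as np
from PIL import Image

nyu40_to_scannet20 = np.zeros(41, dtype=np.uint8)
# nyu40_to_scannet20[nyu40_id] = scannet20_id  # fill in from the official mapping

def convert_label_image(src_path, dst_path):
    label = Image.open(src_path)
    label = label.resize((640, 480), resample=Image.NEAREST)  # nearest keeps label ids intact
    label = np.asarray(label).astype(np.int64)
    label = nyu40_to_scannet20[np.clip(label, 0, 40)]
    Image.fromarray(label.astype(np.uint8)).save(dst_path)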

2.2 SMR dataset

Please find the Semantic Mapping with RealSense (SMR) dataset here.

3. Train LPN

First, create the multi-view frames data

python create_fragments_n_views.py --scannet_root ${scannet_root} --save_files_root image_pairs

This will create the multi-view training index triplets of the camera frames for all 1513 scenes in the ScanNet train/val split.

Train LPN with multi-view latent feature fusion

Training:

python train_lpn.py --config configs/config_lpn.yaml --scannet_root ${scannet_root} --log_dir exps/LPN

LPN supports 4 different modes for you to explore:

  1. Multi-view RGBD (default): RGB and depth fusion with SSMA + feature warping (uses depth, camera poses and K). Settings: modality=rgbd, use_ssma=True, reproject=True
  2. Multi-view RGBD with RGB features: RGB-only encoder, no SSMA; depth is only used for feature warping (uses depth, camera poses and K). Settings: modality=rgbd, use_ssma=False, reproject=True
  3. Single-view RGBD: RGB and depth fusion with SSMA. Settings: modality=rgbd, use_ssma=True, reproject=False
  4. Single-view RGB: RGB input only. Settings: modality=rgb, use_ssma=False, reproject=False
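For reference, the same four settings as a Python lookup (the dictionary and mode names are purely illustrative; in practice the settings are controlled through configs/config_lpn.yaml or the training script):

# Illustrative summary of the four LPN modes listed above (names are not from the repo).
LPN_MODES = {
    "multiview_rgbd":     dict(modality="rgbd", use_ssma=True,  reproject=True),   # default
    "multiview_rgb_feat": dict(modality="rgbd", use_ssma=False, reproject=True),
    "singleview_rgbd":    dict(modality="rgbd", use_ssma=True,  reproject=False),
    "singleview_rgb":     dict(modality="rgb",  use_ssma=False, reproject=False),
}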

Evaluation script for LPN (2D):

python eval_lpn.py --log_dir exps/LPN --dataset_type scannet --dataset_root ${scannet_root} --save_dir exps/LPN/eval/scannet_val --eval

4. Train SegConvNet

Step 1: Run offline QPOS and get segments

segment_suffix=segments/QPOS
python run_qpos.py --segment_suffix ${segment_suffix} --dataset_type scannet --dataset_root ${scannet_root} --small_segment_size 30 --expected_segment_size 60

Segments will be saved under ${scannet_root}/${scene}/${segment_suffix} for each scene. Note that for the SMR (slamcore) sequences you have to adjust the segment sizes: --small_segment_size 120 --expected_segment_size 240

Step 2: Label GT meshes with BayesianFusion and LPN inference results

log_dir=logs/LPN
label_fusion_dir=exps/LPN_labels_3D
python eval_lpn_bayesian_label.py --log_dir ${log_dir} --dataset_type scannet --dataset_root ${scannet_root} --save_dir ${label_fusion_dir}

Labelled meshes will be saved under ${label_fusion_dir}/${scene}
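Conceptually, BayesianFusion keeps a class probability distribution per mesh vertex and multiplies it by the LPN likelihoods of every frame observing that vertex, renormalising after each update. A minimal sketch of that update rule (illustrative only, not the code in this repository):

# Sketch of a per-vertex Bayesian label update (illustrative, not the repository code).
import numpy as np

def bayesian_update(vertex_probs, frame_probs, observed_idx, eps=1e-12):
    # vertex_probs: (V, C) running class distribution for every mesh vertex
    # frame_probs:  (M, C) per-class likelihoods predicted by the LPN for the
    #               M vertices visible in the current frame (rows given by observed_idx)
    vertex_probs[observed_idx] *= frame_probs + eps
    vertex_probs[observed_idx] /= vertex_probs[observed_idx].sum(axis=1, keepdims=True)
    return vertex_probs

# After all frames are fused: labels = vertex_probs.argmax(axis=1)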

Step 3: Prepare the training data

python prepare_3d_training_data.py --label_fusion_dir ${label_fusion_dir} --segment_suffix ${segment_suffix} --dataset_type scannet --dataset_root ${scannet_root} --save_mesh

The training data will be saved under ${label_fusion_dir}/${scene}/${segment_suffix}

Step 4: Train SegConvNet

python train_segconvnet.py --config configs/config_segconvnet.yaml  --log_dir exps/SegConvNet --label_fusion_dir ${label_fusion_dir} --segment_suffix ${segment_suffix}

Evaluation script for the SegConvNet:

segconv_logdir=exps/SegConvNet
python eval_segconvnet.py --log_dir ${segconv_logdir} --dataset_type scannet --dataset_root ${scannet_root} --label_fusion_dir ${label_fusion_dir} --segment_suffix ${segment_suffix} --save_dir exps/SegConvNet_labels

To reproduce the results on the SMR dataset from the paper, run steps 1, 2 and 3 and then the SegConvNet evaluation script eval_segconvnet.py.

5. Sequential Inference

Run ScanNet sequential simulator

First download the example ScanNet scene scene0645_00 from here and extract it under $SCANNET_ROOT. You should then have the following directory structure:

$SCANNET_ROOT
├── scene0645_00
    ├── color
        ├── 0.jpg
        ├── 1.jpg
        ...
    ├── depth
        ├── 0.png
        ├── 1.png
        ...
    ├── intrinsic
    ├── pose
    ├── scene0645_00_vh_clean_2.ply
...

Then you need to update the ScanNet root path mapping here: simply put your hostname as the key and $SCANNET_ROOT as the value.
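The mapping is just a hostname-to-path dictionary; an entry of the following shape is expected (the variable name below is illustrative, edit the dictionary in the file referred to above):

# Illustrative only: add an entry keyed by your hostname, with $SCANNET_ROOT as the value.
import socket
print(socket.gethostname())  # use this string as the key

scannet_root_mapping = {  # name is illustrative; use the repository's own dict
    "my-workstation": "/path/to/SCANNET_ROOT",
}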

You also need to download the checkpoint files and extract them under $EXP_DIR. Then run the following command:

python sequential_runner_scannet.py --exp_dir $EXP_DIR --scene scene0645_00 --mapping_every 20 --skip 1

This will save the results under $EXP_DIR/scannet/scene0645_00_skip20

Run RealSense sequential simulator

First download the example RealSense sequence kitchen1 from here and extract it under $SLAMCORE_ROOT. You should then have the following directory structure:

$SLAMCORE_ROOT
├── kitchen1
    ├── color
        ├── 0.png
        ├── 10.png
        ├── 20.png
        ...
    ├── depth
        ├── 0.png
        ├── 10.png
        ├── 20.png
        ...
    ├── pose
        ├── 0.txt
        ├── 10.txt
        ├── 20.txt
        ...
    ├── align.txt
    ├── K.txt
    ├── global_map_mesh.clean.ply
...

Then you need to update the SMR root path mapping here. Note that align.txt is a transformation matrix (translation only) that shifts the origin to approximately np.min(verts, axis=0). You can simply save it when creating the mesh.
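If you create the mesh yourself, a sketch of how align.txt could be written is shown below (trimesh is only an assumed mesh loader, and the sign convention should be checked against how the repository applies the transform):

# Sketch: write a translation-only 4x4 matrix that moves the origin to
# approximately np.min(verts, axis=0); check the sign convention against the repo.
import numpy as np
import trimesh  # assumption: any loader that exposes mesh vertices works

mesh = trimesh.load("global_map_mesh.clean.ply")
verts = np.asarray(mesh.vertices)

align = np.eye(4)
align[:3, 3] = -np.min(verts, axis=0)  # translation only
np.savetxt("align.txt", align)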

The checkpoint files should already be extracted under $EXP_DIR (see the ScanNet section above). Then run the following command:

python sequential_runner_slamcore.py --exp_dir $EXP_DIR --scene kitchen1 --model_type LPN_rgb --mapping_every 20

This will save the results under $EXP_DIR/slamcore/kitchen1
