Complementary Random Masking for RGB-T Semantic Segmentation

This is the official pytorch implementation of the paper:

Complementary Random Masking for RGB-T Semantic Segmentation

Ukcheol Shin, Kyunghyun Lee, In So Kweon

[Paper] [Project page]

Semantic Segmentation Result on MFNet, PST900, KAIST Pedestrian(KP) datasets

Further visualization can be found in the video.

Updates

2023.03.30: Release evaluation code and pre-trained weights.
2023.12.19: Release training code

Usage

Installation

This codebase was developed and tested with the following packages.

OS: Ubuntu 20.04.1 LTS
CUDA: 11.3
PyTorch: 1.10.1
Python: 3.9.16
Detectron2: 0.6

You can build your conda environment with the provided YAML file.

conda env create --file environment.yaml

Or you can build it manually.

conda create pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge --name CRM
conda activate CRM
python -m pip install detectron2 -f \
  https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
pip install mmcv==1.7.1 pytorch-lightning==1.9.2 scikit-learn==1.2.2 timm==0.6.13 imageio==2.27.0 setuptools==59.5.0

If you want to test this codebase in another software stack, check the following compatibility:

After building conda environment, compile CUDA kernel for MSDeformAttn. If you have trouble, refer here

cd models/mask2former/pixel_decoder/ops/
sh make.sh

Dataset

Download datasets and place them in 'datasets' folder in the following structure:

Since the original KP dataset has a large volume (>35GB) and requesting labels takes time, we recommend to use our pre-organized dataset (includes labels as well).

<datasets>
|-- <MFdataset>
    |-- <images>
    |-- <labels>
    |-- train.txt
    |-- val.txt
    |-- test.txt
    ...
|-- <PSTdataset>
    |-- <train>
        |-- rgb
        |-- thermal
        |-- labels
        ...
    |-- <test>
        |-- rgb
        |-- thermal
        |-- labels
        ...
|-- <KPdataset>
    |-- <images>
        |-- set00
        |-- set01
        ...
    |-- <labels>
    |-- train.txt
    |-- val.txt
    |-- test.txt
    ...

Training

Download SwinTransformer backbone weights pretrained in ImageNet and convert its format for Mask2Former compatibility:

cd pretrained
sh download_backbone_pt_weight.sh
sh convert_pth_to_pkl.sh

Train a model with the config file. If you want to change hyperparamter (e.g., batch size, epoch, learning rate, etc), edit config file in 'configs' folder.

Single GPU, MF dataset, Swin-S model

CUDA_VISIBLE_DEVICES=0 python train.py --config-file ./configs/MFdataset/swin/CRM_swin_small_224.yaml --num-gpus 1 --name MF_CRM_swin_S

Multi GPUs, KP dataset, Swin-B model

CUDA_VISIBLE_DEVICES=0,1 python train.py --config-file ./configs/KPdataset/swin/CRM_swin_base_224.yaml --num-gpus 2 --name KP_CRM_swin_B_multi

Start a tensorboard session to check training progress.

tensorboard --logdir=checkpoints/

You can see the progress by opening https://localhost:6006 on your browser.

Evaluation

Evaluate the trained model by running

CUDA_VISIBLE_DEVICES=0 python test.py --config-file ./configs/MFdataset/swin/CRM_swin_small_224.yaml --num-gpus 1 --name Eval_MF_CRM_swin_S --checkpoint "PATH for WEIGHT"

Result

We offer the pre-trained weights on three RGB-T semantic segmentation dataset.

MFNet dataset (9 classes)

Architecture	Backbone	mIOU	Weight
CRM (Mask2Former)	Swin-T	59.1%	MF-CRM-Swin-T
CRM (Mask2Former)	Swin-S	61.2%	MF-CRM-Swin-S
CRM (Mask2Former)	Swin-B	61.4%	MF-CRM-Swin-B

PST900 (5 classes)

Architecture	Backbone	mIOU	Weight
CRM (Mask2Former)	Swin-T	85.9%	PST-CRM-Swin-T
CRM (Mask2Former)	Swin-S	86.9%	PST-CRM-Swin-S
CRM (Mask2Former)	Swin-B	88.0%	PST-CRM-Swin-B

KAIST Pedestrain dataset (19 classes)

Architecture	Backbone	mIOU	Weight
CRM (Mask2Former)	Swin-T	51.2%	KP-CRM-Swin-T
CRM (Mask2Former)	Swin-S	54.4%	KP-CRM-Swin-S
CRM (Mask2Former)	Swin-B	55.2%	KP-CRM-Swin-B

License

Shield:

Our code is licensed under a MIT License.

Citation

Please cite the following paper if you use our work in your research.

    @article{shin2023comp,
      title={Complementary Random Maksing for RGB-T Semantic Segmentation},
      author={Shin, Ukcheol and Lee, Kyunghyun and Kweon, In So},
      journal={Arxiv pre-print},
      year={2023}
    }

Related projects & Acknowledgement

Our network architecture and codebase are built upon Mask2Former.

Mask2Former (CVPR 2022)

We use the evaluation metric provided by RTFNet.

RTFNet (RA-L 2019)

About

Official implementation of the paper "Complementary Random Masking for RGB-T Semantic Segmentation."

MIT License

Languages

Language:Python 76.0%Language:Cuda 20.1%Language:C++ 2.2%Language:Shell 1.7%