Swin Transformer for Semantic Segmentation on Off-road Datasets

This repo contains the code and configuration files to reproduce the semantic segmentation results of Swin Transformer on a custom off-road environment dataset, RELLIS3D. It is based on mmsegmentation.

Results and Models

ADE20K

Backbone	Method	Crop Size	Lr Schd	mIoU	mIoU (ms+flip)	#params	FLOPs	config	log	model
Swin-T	UPerNet	512x512	160K	44.51	45.81	60M	945G	config	github/baidu	github/baidu
Swin-S	UPerNet	512x512	160K	47.64	49.47	81M	1038G	config	github/baidu	github/baidu
Swin-B	UPerNet	512x512	160K	48.13	49.72	121M	1188G	config	github/baidu	github/baidu

Notes:

Download the pre-trained models from Swin Transformer for ImageNet Classification. These serve as the starting point for fine-tuning on the RELLIS3D dataset.
Use the Small (S) or Tiny (T) model, which is cheaper to compute.
The learning rate needs to be tuned to find what works best for RELLIS3D.

Usage

Installation

Please refer to get_started.md for installation and dataset preparation. Follow steps a, b, c, and d, or just follow https://github.com/open-mmlab/mmsegmentation/blob/master/docs/get_started.md#linux. This sets up mmsegmentation in a conda environment.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <SEG_CHECKPOINT_FILE> --eval mIoU

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --eval mIoU

# multi-gpu, multi-scale testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --aug-test --eval mIoU
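The --eval mIoU flag in the commands above reports mean intersection-over-union. As a rough illustration of what that number means (not the mmsegmentation implementation), the sketch below computes it for flat lists of predicted and ground-truth labels. The function name mean_iou and the ignore label 255 are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return ({class_id: IoU}, mean IoU) for two flat label sequences.

    Illustrative only: assumes every non-ignored label lies in
    [0, num_classes) and that ignore_index (hypothetical default 255)
    marks pixels excluded from the score.
    """
    intersect = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels
        if p == t:
            intersect[t] += 1
            union[t] += 1
        else:
            # a wrong prediction enlarges the union of both classes
            union[t] += 1
            union[p] += 1
    # classes that never appear in pred or target are left out of the mean
    per_class = {c: intersect[c] / union[c]
                 for c in range(num_classes) if union[c] > 0}
    if not per_class:
        return per_class, float("nan")
    return per_class, sum(per_class.values()) / len(per_class)


if __name__ == "__main__":
    per_class, miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=3)
    print(per_class, miou)
```

Classes with an empty union are skipped here so that they do not drag the mean toward zero.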

Training

To train with pre-trained models, run:

# single-gpu training
python tools/train.py <CONFIG_FILE> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

For example, to train a UPerNet model with a Swin-B backbone on a single GPU, run:

python tools/train.py configs/swin/upernet_swin_base_patch4_window7_512x512_160k_messidor.py --options model.pretrained=weights/<your path of the pretrain you downloaded from the above steps>.pth
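When launching several runs, it can help to assemble the training command programmatically. The helper below is a minimal sketch that only rebuilds the two command forms shown above; build_train_command and the paths in the usage example are hypothetical and not part of this repo.

```python
import shlex


def build_train_command(config, pretrained, gpus=1, use_checkpoint=False):
    """Build the argv list for single- or multi-GPU training.

    Mirrors the command templates in this README; nothing is executed here.
    """
    options = [f"model.pretrained={pretrained}"]
    if use_checkpoint:
        options.append("model.backbone.use_checkpoint=True")
    if gpus > 1:
        cmd = ["tools/dist_train.sh", config, str(gpus)]
    else:
        cmd = ["python", "tools/train.py", config]
    return cmd + ["--options"] + options


if __name__ == "__main__":
    # Hypothetical config and weight paths, for illustration only.
    argv = build_train_command("configs/swin/my_config.py",
                               "weights/my_pretrained.pth",
                               gpus=1, use_checkpoint=True)
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run from the repo root if you want to launch the job from Python.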

Notes:

use_checkpoint enables gradient checkpointing, which saves GPU memory at the cost of extra computation. Please refer to this page for more details.
The default learning rate and training schedule are for 8 GPUs and 2 imgs/gpu.
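Because the defaults assume 8 GPUs with 2 imgs/gpu (a total batch size of 16), a common starting point on smaller hardware is to scale the learning rate linearly with the total batch size. The sketch below shows that heuristic; it is not a guaranteed recipe, and the base learning rate of 6e-5 in the usage example is an assumption, so check the value in your config file.

```python
def scale_lr(base_lr, gpus, imgs_per_gpu, ref_gpus=8, ref_imgs_per_gpu=2):
    """Linearly rescale a learning rate to a new total batch size.

    The reference setup (8 GPUs x 2 imgs/gpu) comes from the note above;
    the linear rule itself is a heuristic starting point for tuning.
    """
    ref_batch = ref_gpus * ref_imgs_per_gpu
    batch = gpus * imgs_per_gpu
    return base_lr * batch / ref_batch


if __name__ == "__main__":
    # Assumed base learning rate; read the real one from your config.
    print(scale_lr(6e-5, gpus=1, imgs_per_gpu=2))
```

Treat the result as the center of a small search range rather than a final value, since the best learning rate for RELLIS3D still needs tuning.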

Citing Swin Transformer

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}

About

This is taken from the official implementation of "Swin Transformer" for semantic segmentation and adapted for off-road environments.

https://arxiv.org/abs/2103.14030

Apache License 2.0
