Switchable Normalization for Semantic Segmentation

This repository contains the code of using Swithable Normalization (SN) in semantic image segmentation, proposed by the paper "Differentiable Learning-to-Normalize via Switchable Normalization".

This is a re-implementation of the experiments presented in the above paper by using open-source semantic segmentation framework Scene Parsing on MIT ADE20K.

Update

2018/9/26: The code and trained models of semantic segmentation on ADE20K by using SN are released !
More results and models will be released soon.

Citation

You are encouraged to cite the following paper if you use SN in research or wish to refer to the baseline results.

@article{SwitchableNorm,
  title={Differentiable Learning-to-Normalize via Switchable Normalization},
  author={Ping Luo and Jiamin Ren and Zhanglin Peng},
  journal={arXiv:1806.10779},
  year={2018}
}

Getting Started

Use git to clone this repository:

git clone https://github.com/switchablenorms/SwitchNorm_Segmentation.git

Environment

The code is tested under the following configurations.

Hardware: 1-8 GPUs (with at least 12G GPU memories)
Software: CUDA 9.0, Python 3.6, PyTorch 0.4.0, tensorboardX

Installation & Data Preparation

Please check the Environment, Training and Evaluation subsection in the repo Scene Parsing on MIT ADE20K for a quick start.

Pre-trained Models

Download SN based ImageNet pretrained model and put them into the {repo_root}/pretrained_sn.

ImageNet pre-trained models

The backbone models with SN pretrained on ImageNet are available in the format used by above Segmentation Framework and this repo.

ResNet50v1+SN(8,2) [pretrained_SN(8,2)]

For more pretrained models with SN, please refer to the repo of switchablenorms/Switchable-Normalization. The following script converts the model trained from Switchable-Normalization into a valid format used by the semantic segmentation codebase : ./pretrained_sn/convert_sn.py

usage: python -u convert_sn.py

NOTE: The paramater keys in pretrained model checkpoint must match the keys in backbone model EXACTLY. You should load the correct pretrained model according to your segmentation architechure.

Training

The training strategies of baseline models and sn-based models on ADE20K are same as Scene Parsing on MIT ADE20K.
The training script with ResNet-50-sn backbone can be found here: ./scripts/train.sh

NOTE: The default architecture of this repo is Encoder: resnet50_dilated8 ( resnetXX_dilatedYY: customized resnetXX with dilated convolutions, output feature map is 1/YY of input size, see DeepLab for more details ) and Decoder: c1_bilinear_deepsup ( 1 conv + bilinear upsample + deep supervision, see PSPNet for more details ).

Optional arguments (see full input arguments via ./train.py):

  --arch_encoder         architecture of encode network
  --arch_decoder         architecture of decode network
  --weights_encoder      weights to finetune endoce network
  --weights_decoder      weights to finetune decode network
  --list_train           the list to load the training data 
  --root_dataset         the path of the dataset
  --batch_size_per_gpu   input batch size
  --start_epoch          epoch to start training. (continue from a checkpoint loaded via weights_encoder & weights_decoder)

NOTE: In this repo, --start_epoch allows the training to resume from the checkpoint loaded from --weights_encoder and --weights_decoder, which is generated in the training process automatically. If you want to train from scratch, you need to assign --start_epoch as 1 and set --weights_encoder and --weights_decoder to the blank value.

Evaluation

The evaluation script with ResNet-50-sn backbone can be found here : ./scripts/evaluate.sh

Optional arguments (see full input arguments via ./eval.py):

  --arch_encoder         architecture of encode network
  --arch_decoder         architecture of decode network
  --suffix               which snapshot to load
  --list_val             the list to load the validation data 
  --root_dataset         the path of the dataset
  --imgSize              list of input image sizes

--imgSize enables single-scale or multi-scale inference. When --load_dir is with the int type, the single-scale inference will be started up. When --load_dir is a int list, the multi-scale test will be applied.

Main Results

Semantic Segmentation Results on ADE20K

The experiment results are on the ADE20K validation set. MS test is short for multi-scale test. sync BN indicates the mutli-GPU synchronization batch normalization. More results and models will be released soon.

Architecture	Norm	MS test	Mean IoU	Pixel ACC.	Overall Score	Download
ResNet50_dilated8 + c1_bilinear_deepsup	sync BN	no	36.43	77.30	56.87	encoder \| decoder
ResNet50_dilated8 + c1_bilinear_deepsup	GN	no	35.66	77.24	56.45	encoder \| decoder
ResNet50_dilated8 + c1_bilinear_deepsup	SN-(8,2)	no	38.72	78.90	58.82	encoder \| decoder

ResNet50_dilated8 + c1_bilinear_deepsup	sync BN	yes	37.69	78.29	57.99	--
ResNet50_dilated8 + c1_bilinear_deepsup	GN	yes	36.32	77.77	57.05	--
ResNet50_dilated8 + c1_bilinear_deepsup	SN-(8,2)	yes	39.21	79.20	59.21	--

NOTE: For all settings in this repo, we employ ResNet as the backbone network, using the original 7×7 kernel size in the first convolution layer. This is different from the MIT framework , which adopts 3 convolution layers with the kernel size 3×3 at the bottom of the network. See ./models/resnet_v1_sn.py for the details.

ruixuejianfei / SwitchNorm_Segmentation