This is an implementation of PSPNet in TensorFlow for semantic segmentation on the Cityscapes dataset. The weights were first converted from the original code using the caffe-tensorflow framework.
- Support evaluation code for the ADE20K dataset.
- Support the inference phase for the ADE20K dataset, using the pspnet50 model (weights converted from the original author).
- Use `tf.matmul` to decode labels, which speeds up inference.
- Support different input sizes by padding the input image to (720, 720) if the original size is smaller, and cropping the result back to the original size at the end.
- Change the bn layer from `tf.nn.batch_normalization` to `tf.layers.batch_normalization` in order to support the training phase. The initial model in Google Drive has also been updated.
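The matrix-product label decoding mentioned above can be illustrated with a small sketch. It uses NumPy's `@` operator as a stand-in for `tf.matmul`. The three-color `PALETTE` and the function name `decode_labels` are assumptions made for illustration, not the repository's code.

```python
import numpy as np

# Hypothetical 3-class RGB palette; a real palette has one row per class.
PALETTE = np.array([[128, 64, 128],
                    [244, 35, 232],
                    [70, 70, 70]], dtype=np.float32)

def decode_labels(pred, palette):
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 color image."""
    num_classes = palette.shape[0]
    h, w = pred.shape
    # One-hot encode every pixel: shape (H*W, num_classes).
    onehot = np.eye(num_classes, dtype=np.float32)[pred.reshape(-1)]
    # A single matrix product picks the palette row for each pixel.
    colors = onehot @ palette
    return colors.reshape(h, w, 3).astype(np.uint8)

pred = np.array([[0, 1], [2, 0]])
color_img = decode_labels(pred, PALETTE)
```

Expressing the lookup as one matrix product avoids a Python loop over pixels, which is the likely source of the speed-up.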
Download the restore checkpoint from Google Drive and put it into the `model` directory. Note: select the checkpoint that corresponds to the dataset.
To get results on your own images, run the following command:
python inference.py --img-path=./input/test.png --dataset cityscapes
Inference time: ~0.6s
Options:
--dataset cityscapes or ade20k
--flipped-eval
--checkpoints /PATH/TO/CHECKPOINT_DIR
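Inference accepts images of different sizes by padding small inputs to (720, 720) and cropping the result afterwards. The sketch below shows that idea in NumPy. Zero padding at the bottom and right, and the helper names `pad_to_min_size` and `crop_to_original`, are assumptions for illustration and may differ from what `inference.py` does.

```python
import numpy as np

def pad_to_min_size(img, min_h=720, min_w=720):
    """Zero-pad an (H, W, C) image at the bottom/right to at least (min_h, min_w)."""
    h, w = img.shape[:2]
    pad_h = max(min_h - h, 0)
    pad_w = max(min_w - w, 0)
    padded = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    return padded, (h, w)

def crop_to_original(pred, orig_shape):
    """Cut an (H, W) prediction back to the original image size."""
    h, w = orig_shape
    return pred[:h, :w]

img = np.ones((500, 600, 3), dtype=np.uint8)
padded, orig_shape = pad_to_min_size(img)
# Stand-in for the network output on the padded image (assumption).
fake_pred = np.zeros(padded.shape[:2], dtype=np.int64)
result = crop_to_original(fake_pred, orig_shape)
```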
Results with the single-scale model on the Cityscapes validation dataset:
Method | Accuracy |
---|---|
Without flip | 76.99% |
Flip | 77.23% |
Results with the single-scale model on the ADE20K validation dataset:
Method | Accuracy |
---|---|
Without flip | 40.00% |
Flip | 40.67% |
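The "Flip" rows refer to flipped evaluation. A common way to do this is to run the network on the image and on its horizontal mirror, then average the two score maps. The sketch below assumes that averaging scheme. `predict_with_flip` and the toy `toy_predict` model are hypothetical stand-ins, not the repository's code.

```python
import numpy as np

def predict_with_flip(img, predict_fn):
    """Average class scores over the image and its horizontal mirror."""
    scores = predict_fn(img)                    # (H, W, num_classes)
    flipped_scores = predict_fn(img[:, ::-1])   # scores for the mirrored image
    # Mirror the second result back so that pixels line up again.
    avg = (scores + flipped_scores[:, ::-1]) / 2.0
    return avg.argmax(axis=-1)

def toy_predict(img):
    """Hypothetical stand-in for the network: class 1 where intensity > 0.5."""
    fg = (img[..., 0] > 0.5).astype(np.float32)
    return np.stack([1.0 - fg, fg], axis=-1)

img = np.zeros((2, 3, 1), dtype=np.float32)
img[:, 0, 0] = 1.0          # bright first column
labels = predict_with_flip(img, toy_predict)
```

Flipped evaluation needs two forward passes per image, so it trades inference time for the small accuracy gain shown in the tables.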
To reproduce the evaluation results, follow these steps:
- Download the Cityscapes or ADE20K dataset first.
- Change `data_dir` to your dataset path in `evaluate.py`: `'data_dir': '/Path/to/dataset'`
- Run the following command:
python evaluate.py --dataset cityscapes
List of Args:
--dataset - ade20k or cityscapes
--flipped-eval - Using flipped evaluation method
--measure-time - Calculate inference time
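As a rough illustration of what a timing option such as `--measure-time` might report, the sketch below averages wall-clock time over several calls after a warm-up run. The helper `measure_time` and the stand-in workload `fake_inference` are assumptions for illustration. The timing logic in `evaluate.py` may differ.

```python
import time

def measure_time(fn, *args, warmup=1, runs=5):
    """Return the mean wall-clock seconds per call of fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        fn(*args)
    return (time.perf_counter() - start) / runs

def fake_inference(n):
    """Hypothetical stand-in for a forward pass of the network."""
    return sum(i * i for i in range(n))

mean_seconds = measure_time(fake_inference, 10000)
print(f"mean time per call: {mean_seconds:.6f}s")
```

Excluding the first call matters in practice because the first run of a model often includes one-time setup cost.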
Input image | Output image |
---|---|
@inproceedings{zhao2017pspnet,
author = {Hengshuang Zhao and
Jianping Shi and
Xiaojuan Qi and
Xiaogang Wang and
Jiaya Jia},
title = {Pyramid Scene Parsing Network},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2017}
}
Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)
@inproceedings{zhou2017scene,
title={Scene Parsing through ADE20K Dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2017}
}
Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. arXiv:1608.05442. (https://arxiv.org/pdf/1608.05442.pdf)
@article{zhou2016semantic,
title={Semantic understanding of scenes through the ade20k dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
journal={arXiv preprint arXiv:1608.05442},
year={2016}
}