This repository maintains the official implementation of the paper Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images by Ye Liu, Huifang Li, Chao Hu, Shuang Luo, Yan Luo, and Chang Wen Chen.
We use the following environment settings. If automatic installation fails, you can install these packages manually.
- CUDA 10.2 Update 2
- CUDNN 8.0.5.39
- Python 3.9.7
- PyTorch 1.10.0
- MMCV 1.3.17
- MMDetection 2.18.1
- NNCore 0.3.2
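To compare a local environment against the versions listed above, a short check such as the following may help. It is a minimal sketch, not part of the repository: the distribution names (`torch`, `mmcv-full`, `mmdet`, `nncore`) are assumptions about how the packages were installed, and CUDA, cuDNN, and Python itself are not checked.

```python
from importlib import metadata

# Versions listed above. The distribution names on the left are assumptions
# (for example, MMCV may be installed as "mmcv" rather than "mmcv-full").
EXPECTED = {
    "torch": "1.10.0",
    "mmcv-full": "1.3.17",
    "mmdet": "2.18.1",
    "nncore": "0.3.2",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package name to a tuple (expected, installed, matches)."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        # Local build tags such as "+cu102" are ignored in the comparison.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the code will fail; the list only records the settings the authors used.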
- Clone the repository from GitHub.
git clone https://github.com/yeliudev/CATNet.git
cd CATNet
- Install dependencies.
pip install -r requirements.txt
- Download and extract the datasets.
Note that the images in the iSAID dataset are split into patches with both sides no longer than 512 pixels, as reported in our paper. We strongly recommend using this pre-processed version directly, since the official toolkit has bugs that lead to undesirable patch sizes (e.g. extreme aspect ratios).
- Prepare the files in the following structure.
CATNet
├── configs
├── datasets
├── models
├── tools
├── data
│   ├── dior
│   │   ├── Annotations
│   │   ├── ImageSets
│   │   ├── JPEGImages-test
│   │   └── JPEGImages-trainval
│   ├── hrsid
│   │   ├── annotations
│   │   └── images
│   ├── isaid
│   │   ├── annotations
│   │   ├── train
│   │   └── val
│   └── vhr
│       ├── ground truth
│       └── positive image set
├── README.md
├── setup.cfg
└── ···
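Before running the conversion scripts, it can be useful to confirm that the extracted datasets match the tree above. The following is a minimal sketch, not a tool shipped with the repository; the function name `missing_dirs` is hypothetical, and the script assumes it is run from the `CATNet` root.

```python
from pathlib import Path

# Sub-directories of data/ taken from the tree above.
EXPECTED_DIRS = [
    "dior/Annotations",
    "dior/ImageSets",
    "dior/JPEGImages-test",
    "dior/JPEGImages-trainval",
    "hrsid/annotations",
    "hrsid/images",
    "isaid/annotations",
    "isaid/train",
    "isaid/val",
    "vhr/ground truth",
    "vhr/positive image set",
]


def missing_dirs(data_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("data")
    if missing:
        print("Missing directories under data/:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected dataset directories are present.")
```

The check only looks at directory names; it does not verify that the images or annotation files inside them are complete.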
- Convert DIOR annotations to PASCAL VOC format.
python tools/convert_dior.py
- Convert NWPU VHR-10 annotations to COCO format.
python tools/convert_vhr.py
Run the following command to train a model using a specified config.
torchrun --nproc_per_node=4 tools/train.py <path-to-config>
Run the following command to test a model and evaluate results.
torchrun --nproc_per_node=4 tools/test.py <path-to-config> <path-to-checkpoint>
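The two commands above differ only in the script and the optional checkpoint argument, so they can be assembled programmatically, for example when launching several configs in a row. The sketch below is illustrative only: `build_command` is a hypothetical helper, and whether the configs behave as intended with a GPU count other than 4 (for instance with respect to learning-rate scaling) is an assumption that the source does not confirm.

```python
import shlex


def build_command(script, config, checkpoint=None, num_gpus=4):
    """Assemble the torchrun argument list for tools/train.py or tools/test.py."""
    cmd = ["torchrun", f"--nproc_per_node={num_gpus}", script, config]
    if checkpoint is not None:
        # tools/test.py takes the checkpoint path after the config path.
        cmd.append(checkpoint)
    return cmd


if __name__ == "__main__":
    # Placeholder paths: substitute a real config and checkpoint.
    train = build_command("tools/train.py", "<path-to-config>")
    test = build_command("tools/test.py", "<path-to-config>", "<path-to-checkpoint>")
    # shlex.join quotes the placeholders so the printed line is shell-safe.
    print(shlex.join(train))
    print(shlex.join(test))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the job from Python.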
We provide multiple pre-trained models here. All the models are trained using 4 NVIDIA Tesla V100-SXM2 GPUs and are evaluated using the default metrics of the datasets.
| Dataset | Model | Backbone | Schd | Aug | BBox AP | Mask AP | Download | |
|---|---|---|---|---|---|---|---|---|
| iSAID | CAT Mask R-CNN | ResNet-50 | 1x | ✗ | 46.2 | 38.5 | model | metrics |
| | CAT Mask R-CNN | ResNet-50 | 1x | ✓ | 47.6 | 40.1 | model | metrics |
| DIOR | CATNet | ResNet-50 | 3x | ✗ | 76.3 | - | model | metrics |
| | CATNet | ResNet-50 | 3x | ✓ | 78.6 | - | model | metrics |
| | CAT R-CNN | ResNet-50 | 3x | ✗ | 77.7 | - | model | metrics |
| | CAT R-CNN | ResNet-50 | 3x | ✓ | 81.9 | - | model | metrics |
| NWPU VHR-10 | CATNet | ResNet-50 | 6x | ✗ | 95.8 | - | model | metrics |
| | CATNet | ResNet-50 | 6x | ✓ | 97.4 | - | model | metrics |
| | CAT R-CNN | ResNet-50 | 6x | ✗ | 96.4 | - | model | metrics |
| | CAT R-CNN | ResNet-50 | 6x | ✓ | 97.7 | - | model | metrics |
| HRSID | CAT Mask R-CNN | ResNet-50 | 3x | ✗ | 71.7 | 58.2 | model | metrics |
| | CAT Mask R-CNN | ResNet-50 | 3x | ✓ | 73.3 | 59.6 | model | metrics |
| | CAT R-CNN | ResNet-50 | 3x | ✗ | 70.5 | - | model | metrics |
| | CAT R-CNN | ResNet-50 | 3x | ✓ | 72.8 | - | model | metrics |
If you find this project useful for your research, please kindly cite our paper.
@techreport{liu2021learning,
title={Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images},
author={Liu, Ye and Li, Huifang and Hu, Chao and Luo, Shuang and Luo, Yan and Chen, Chang Wen},
number={arXiv:2111.11057},
year={2021}
}