Context Aggregation Network

This repository maintains the official implementation of the paper Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images by Ye Liu, Huifang Li, Chao Hu, Shuang Luo, Yan Luo, and Chang Wen Chen.

Installation

The environment we use is listed below. If automatic installation fails, you may install these packages manually.

CUDA 10.2 Update 2
CUDNN 8.0.5.39
Python 3.9.7
PyTorch 1.10.0
MMCV 1.3.17
MMDetection 2.18.1
NNCore 0.3.2
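To compare a local environment against the versions listed above, something like the following minimal sketch may help. It is not part of the repository: the distribution names in `EXPECTED` are assumptions (MMCV, for example, may be installed as `mmcv` or `mmcv-full`), and `check_environment` is a hypothetical helper. CUDA, cuDNN, and Python itself are not checked here.

```python
from importlib import metadata

# Versions from the list above. The distribution names are assumptions;
# adjust them to match how the packages were installed locally.
EXPECTED = {
    "torch": "1.10.0",
    "mmcv-full": "1.3.17",
    "mmdet": "2.18.1",
    "nncore": "0.3.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected, lookup=installed_version):
    """Map each mismatching package name to an (expected, found) pair."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # A local build tag such as "1.10.0+cu102" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `check_environment(EXPECTED)` returns an empty dictionary when every listed package is present at the listed version.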

Install from source

Clone the repository from GitHub.

git clone https://github.com/yeliudev/CATNet.git
cd CATNet

Install dependencies.

pip install -r requirements.txt

Getting Started

Download and prepare the datasets

Download and extract the datasets.

Note that the images in the iSAID dataset are split into patches with both sides no larger than 512 pixels, as reported in our paper. We strongly recommend using this pre-processed version directly, since the official toolkit has unknown bugs that lead to undesirable patch sizes (e.g. extreme aspect ratios).
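For intuition about what "both sides no larger than 512 pixels" means, here is a rough sketch of one way to compute patch windows for an image. This is not the preprocessing used for the released data or the official toolkit; the function names, the `overlap` parameter, and the choice to shift the last patch back to the image border are all assumptions made for illustration.

```python
def tile_starts(length, patch=512, overlap=0):
    """Start offsets of tiles that cover [0, length) along one axis."""
    if not 0 <= overlap < patch:
        raise ValueError("overlap must satisfy 0 <= overlap < patch")
    if length <= patch:
        return [0]  # the whole axis fits into a single tile
    stride = patch - overlap
    starts = list(range(0, length - patch, stride))
    # Shift the last tile back so that it ends exactly at the border.
    starts.append(length - patch)
    return starts


def patch_windows(width, height, patch=512, overlap=0):
    """Return (left, top, right, bottom) boxes with both sides <= patch."""
    boxes = []
    for top in tile_starts(height, patch, overlap):
        for left in tile_starts(width, patch, overlap):
            right = min(left + patch, width)
            bottom = min(top + patch, height)
            boxes.append((left, top, right, bottom))
    return boxes
```

Shifting the final tile back, rather than emitting a thin remainder strip, is one way to avoid the extreme aspect ratios mentioned above.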

Prepare the files in the following structure.

CATNet
├── configs
├── datasets
├── models
├── tools
├── data
│   ├── dior
│   │   ├── Annotations
│   │   ├── ImageSets
│   │   ├── JPEGImages-test
│   │   └── JPEGImages-trainval
│   ├── hrsid
│   │   ├── annotations
│   │   └── images
│   ├── isaid
│   │   ├── annotations
│   │   ├── train
│   │   └── val
│   └── vhr
│       ├── ground truth
│       └── positive image set
├── README.md
├── setup.cfg
└── ···
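Before running the conversion or training scripts, it can be useful to confirm that the layout above is in place. The sketch below is a hypothetical helper, not a tool shipped with the repository; it only checks that the listed sub-directories of `data` exist and does not inspect their contents.

```python
from pathlib import Path

# Sub-directories of data/ taken from the tree above.
EXPECTED_DIRS = [
    "dior/Annotations",
    "dior/ImageSets",
    "dior/JPEGImages-test",
    "dior/JPEGImages-trainval",
    "hrsid/annotations",
    "hrsid/images",
    "isaid/annotations",
    "isaid/train",
    "isaid/val",
    "vhr/ground truth",
    "vhr/positive image set",
]


def missing_dirs(data_root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

For example, `missing_dirs("data")` run from the repository root returns an empty list once all four datasets are extracted as shown.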

Convert DIOR annotations to PASCAL VOC format.

python tools/convert_dior.py

Convert NWPU VHR-10 annotations to COCO format.

python tools/convert_vhr.py
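As an illustration of what such a conversion involves, the sketch below turns one corner-format box into a COCO-style annotation. It is not the logic of `tools/convert_vhr.py`. It assumes that each ground-truth line looks like `(x1,y1),(x2,y2),class_id`, and the function name `line_to_coco` is hypothetical. The only part taken from the COCO format itself is that boxes are stored as `[x, y, width, height]`.

```python
import re

# Assumed ground-truth line format: "(x1,y1),(x2,y2),class_id".
LINE_PATTERN = re.compile(
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,"
    r"\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,"
    r"\s*(\d+)"
)


def line_to_coco(line, image_id, annotation_id):
    """Convert one corner-format line into a COCO-style annotation dict."""
    match = LINE_PATTERN.search(line)
    if match is None:
        return None  # blank or malformed line
    x1, y1, x2, y2, category_id = map(int, match.groups())
    width, height = x2 - x1, y2 - y1
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x1, y1, width, height],  # COCO boxes are [x, y, w, h]
        "area": width * height,
        "iscrowd": 0,
    }
```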

Train a model

Run the following command to train a model using a specified config.

torchrun --nproc_per_node=4 tools/train.py <path-to-config>
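With `--nproc_per_node=4`, `torchrun` starts four worker processes on the machine and passes each one its rank through the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows how a script could read them; `distributed_context` is a hypothetical helper, and whether `tools/train.py` reads these variables directly or through a library is not stated here.

```python
import os


def distributed_context(env=None):
    """Read the rank information that torchrun exports to each worker."""
    env = os.environ if env is None else env
    return {
        # Global index of this process across all nodes.
        "rank": int(env.get("RANK", 0)),
        # Index of this process on its own node, often used as the GPU index.
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        # Total number of processes in the job.
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }
```

The defaults make the helper fall back to a single-process setting when the script is started with plain `python` instead of `torchrun`.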

Test a model and evaluate results

Run the following command to test a model and evaluate results.

torchrun --nproc_per_node=4 tools/test.py <path-to-config> <path-to-checkpoint>

Model Zoo

We provide multiple pre-trained models here. All the models are trained using 4 NVIDIA Tesla V100-SXM2 GPUs and are evaluated using the default metrics of the datasets.

Dataset	Model	Backbone	Schd	Aug	BBox AP	Mask AP	Download
iSAID	CAT Mask R-CNN	ResNet-50	1x	✗	46.2	38.5	model | metrics
iSAID	CAT Mask R-CNN	ResNet-50	1x	✓	47.6	40.1	model | metrics
DIOR	CATNet	ResNet-50	3x	✗	76.3	—	model | metrics
DIOR	CATNet	ResNet-50	3x	✓	78.6	—	model | metrics
DIOR	CAT R-CNN	ResNet-50	3x	✗	77.7	—	model | metrics
DIOR	CAT R-CNN	ResNet-50	3x	✓	81.9	—	model | metrics
NWPU VHR-10	CATNet	ResNet-50	6x	✗	95.8	—	model | metrics
NWPU VHR-10	CATNet	ResNet-50	6x	✓	97.4	—	model | metrics
NWPU VHR-10	CAT R-CNN	ResNet-50	6x	✗	96.4	—	model | metrics
NWPU VHR-10	CAT R-CNN	ResNet-50	6x	✓	97.7	—	model | metrics
HRSID	CAT Mask R-CNN	ResNet-50	3x	✗	71.7	58.2	model | metrics
HRSID	CAT Mask R-CNN	ResNet-50	3x	✓	73.3	59.6	model | metrics
HRSID	CAT R-CNN	ResNet-50	3x	✗	70.5	—	model | metrics
HRSID	CAT R-CNN	ResNet-50	3x	✓	72.8	—	model | metrics

Citation

If you find this project useful for your research, please cite our paper.

@techreport{liu2021learning,
  title={Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images},
  author={Liu, Ye and Li, Huifang and Hu, Chao and Luo, Shuang and Luo, Yan and Chen, Chang Wen},
  number={arXiv:2111.11057},
  year={2021}
}

Paper: https://arxiv.org/abs/2111.11057

License: GNU General Public License v3.0