RMSIN

This repository is the offical implementation for "Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation."

Setting Up

Preliminaries

The code has been verified to work with PyTorch v1.7.1 and Python 3.7.

Clone this repository.
Change directory to root of this repository.

Package Dependencies

Create a new Conda environment with Python 3.7 then activate it:

conda create -n RMSIN python==3.7
conda activate RMSIN

Install PyTorch v1.7.1 with a CUDA version that works on your cluster/machine (CUDA 10.2 is used in this example):

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch

Install the packages in requirements.txt via pip:

pip install -r requirements.txt

The Initialization Weights for Training

Create the ./pretrained_weights directory where we will be storing the weights.

mkdir ./pretrained_weights

Download pre-trained classification weights of the Swin Transformer, and put the pth file in ./pretrained_weights. These weights are needed for training to initialize the model.

Datasets

We perform all experiments on our proposed dataset RRSIS-D. RRSIS-D is a new Referring Remote Sensing Image Segmentation benchmark which containes 17,402 image-caption-mask triplets. It can be downloaded from Google Drive or Baidu Netdisk (access code: sjoe).

Usage

Download our dataset.
Copy all the downloaded files to ./refer/data/. The dataset folder should be like this:

$DATA_PATH
├── rrsisd
│   ├── refs(unc).p
│   ├── instances.json
└── images
    └── rrsisd
        ├── JPEGImages
        ├── ann_split

Training

We use DistributedDataParallel from PyTorch for training. To run on 4 GPUs (with IDs 0, 1, 2, and 3) on a single node:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 train.py --dataset rrsisd --model_id RMSIN --epochs 40 --img_size 480 2>&1 | tee ./output

Testing

python test.py --swin_type base --dataset rrsisd --resume ./your_checkpoints_path --split val --workers 4 --window12 --img_size 480

Acknowledgements

Code in this repository is built on LAVT. We'd like to thank the authors for open sourcing their project.

Lsan2401 / RMSIN