MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations (IEEE TGRS 2024)

This repository contains a Python implementation of our paper MS-Former.

1. Usage

  • Prepare the data:

    • Download the LEVIR, BCDD-BGMix, SYSU, and GVLM datasets.
    • Crop the LEVIR dataset into 256×256 patches, then generate patch-level annotations for the LEVIR, BCDD-BGMix, SYSU, and GVLM datasets (one way to derive such annotations is sketched after this list).
    • Generate the list file, e.g. ls -R ./label/* > test.txt
    • Arrange the datasets into the following structure and set their paths in train.py and test.py:
    ├─Train
    │   ├─A        ...jpg/png
    │   ├─B        ...jpg/png
    │   ├─label    ...jpg/png
    │   └─list     ...txt
    └─Test
        ├─A
        ├─B
        ├─label
        └─list
    
  • Prerequisites for Python:

    • Create a virtual environment in the terminal: conda create -n MS-Former python=3.8
    • Install the necessary packages: pip install -r requirements.txt
  • Train/Test

    • sh train.sh
    • sh test.sh
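
The repository does not spell out how the patch-level annotations are produced. Below is a minimal sketch, assuming that pixel-level ground-truth masks are available and that a patch is labeled as changed whenever it contains at least one changed pixel; this max-over-patch rule and the function name are illustrative assumptions, not necessarily the paper's exact procedure.

    import numpy as np

    def pixel_mask_to_patch_labels(mask: np.ndarray, patch_size: int) -> np.ndarray:
        # Derive a patch-level label map from a binary pixel-level mask.
        # Assumption: a patch is "changed" (1) if any pixel inside it is.
        h, w = mask.shape
        assert h % patch_size == 0 and w % patch_size == 0, "mask must tile evenly"
        patches = mask.reshape(h // patch_size, patch_size, w // patch_size, patch_size)
        return patches.max(axis=(1, 3)).astype(np.uint8)

    # Example: a 256x256 mask reduced to 8x8 patch labels (patch size 32).
    mask = np.zeros((256, 256), dtype=np.uint8)
    mask[40:60, 100:130] = 1  # a small changed region
    print(pixel_mask_to_patch_labels(mask, patch_size=32).shape)  # (8, 8)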

2. Motivation

Fig. 1 Comparison of the pixel-level, image-level, and our patch-level labels for remote sensing change detection.

Fig. 2 Change detection performance (F1) of the proposed MS-Former trained with patch-level labels under different patch-size settings on the BCDD dataset.

Notably, as the patch size increases, patch-level labels approach image-level annotations, while decreasing the patch size brings them closer to pixel-wise annotations. In this work, we observe that even a slight reduction in patch size substantially improves change detection performance. This observation suggests the potential of patch-level annotations for remote sensing change detection.
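
The continuum between the two extremes can be made concrete with a toy example (assuming the max-over-patch labeling rule sketched in the Usage section): with patch size 1 the patch labels equal the pixel mask, and with patch size equal to the image size they collapse to a single image-level label.

    import numpy as np

    mask = np.zeros((8, 8), dtype=np.uint8)
    mask[2, 5] = 1  # one changed pixel

    def to_patch_labels(mask, p):
        h, w = mask.shape
        return mask.reshape(h // p, p, w // p, p).max(axis=(1, 3))

    print(np.array_equal(to_patch_labels(mask, 1), mask))  # True: pixel-level
    print(to_patch_labels(mask, 8))  # [[1]]: a single image-level label
    print(to_patch_labels(mask, 4))  # 2x2 labels, in between the two extremes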

3. Pipeline of the MS-Former

Fig. 3 The overall framework of the proposed MS-Former.

First, the bi-temporal images pass through a feature extractor to obtain temporal difference features. These features and the prototypes stored in the memory bank are then jointly refined by a series of bi-directional attention blocks. Finally, a patch-level supervision scheme guides the network in learning from the patch-level annotations.
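
To make the pipeline concrete, the following PyTorch sketch shows one bi-directional attention block in which the temporal difference features and the memory-bank prototypes attend to each other, together with a patch-level supervision loss that max-pools pixel logits down to patch resolution. The dimensions, head count, pooling-based loss, and all names here are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiDirectionalAttentionBlock(nn.Module):
        # Features and memory prototypes attend to each other in turn.
        def __init__(self, dim: int, num_heads: int = 4):
            super().__init__()
            self.feat_from_mem = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.mem_from_feat = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm_feat = nn.LayerNorm(dim)
            self.norm_mem = nn.LayerNorm(dim)

        def forward(self, feat, mem):
            # feat: (B, N, C) flattened temporal difference features
            # mem:  (B, K, C) prototypes read from the memory bank
            feat = self.norm_feat(feat + self.feat_from_mem(feat, mem, mem)[0])
            mem = self.norm_mem(mem + self.mem_from_feat(mem, feat, feat)[0])
            return feat, mem

    def patch_supervision_loss(logits, patch_labels, patch_size):
        # Max-pool pixel logits to patch resolution and supervise them with
        # patch-level labels (an assumed instantiation of the scheme).
        patch_scores = F.max_pool2d(logits, kernel_size=patch_size)
        return F.binary_cross_entropy_with_logits(patch_scores, patch_labels.float())

    # Toy usage with assumed shapes: a 16x16 feature map (N=256), K=2 prototypes.
    block = BiDirectionalAttentionBlock(dim=32)
    feat, mem = block(torch.randn(1, 256, 32), torch.randn(1, 2, 32))
    logits = torch.randn(2, 1, 256, 256)
    labels = torch.randint(0, 2, (2, 1, 8, 8))
    print(patch_supervision_loss(logits, labels, patch_size=32))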

4. Citation

Please cite our paper if you find the work useful:

@article{li2023ms,
    title={MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations},
    author={Li, Zhenglai and Tang, Chang and Liu, Xinwang and Li, Changdong and Li, Xianju and Zhang, Wei},
    journal={arXiv preprint arXiv:2311.09726},
    year={2023}
}
