LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization (AAAI 2022)

PyTorch implementation of "LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization".

πŸ“‹ Table of content

  1. πŸ“Ž Paper Link
  2. πŸ’‘ Abstract
  3. πŸ“– Method
  4. πŸ“ƒ Requirements
  5. ✏️ Usage
    1. Start
    2. Prepare Datasets
    3. Training
    4. Inference
  6. πŸ” Citation
  7. ❀️ Acknowledgement

πŸ“Ž Paper Link

LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization (link)

  • Authors: Zhiwei Chen, Changan Wang, Yabiao Wang, Guannan Jiang, Yunhang Shen, Ying Tai, Chengjie Wang, Wei Zhang, Liujuan Cao
  • Institutions: Xiamen University, Xiamen, China; Tencent Youtu Lab, Shanghai, China; CATL, China

πŸ’‘ Abstract

Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels. Convolutional neural network (CNN) based techniques often highlight only the most discriminative part of an object while ignoring its full extent. Recently, the transformer architecture has been deployed to WSOL to capture long-range feature dependencies with its self-attention mechanism and multilayer perceptron structure. Nevertheless, transformers lack the locality inductive bias inherent to CNNs and may therefore deteriorate local feature details in WSOL. In this paper, we propose a novel framework built upon the transformer, termed LCTR (Local Continuity TRansformer), which aims to enhance the local perception capability of global features among long-range feature dependencies. To this end, we propose a relational patch-attention module (RPAM), which considers cross-patch information on a global basis. We further design a cue digging module (CDM), which utilizes local features to guide the learning trend of the model towards highlighting weak local responses. Finally, comprehensive experiments are carried out on two widely used datasets, i.e., CUB-200-2011 and ILSVRC, to verify the effectiveness of our method.

πŸ“– Method


Overview of the proposed LCTR, which consists of a vision transformer backbone for feature extraction, a relational patch-attention module (RPAM), and a cue digging module (CDM).

πŸ“ƒ Requirements

  • PyTorch==1.7.0
  • torchvision==0.8.1
  • timm==0.3.2
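
These pinned versions can usually be installed with pip; adjust the torch/torchvision wheels to your CUDA setup if needed:

pip install torch==1.7.0 torchvision==0.8.1 timm==0.3.2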

✏️ Usage

Start

git clone https://github.com/zhiweichen0012/LCTR.git
cd LCTR

Prepare Datasets

The directory structure follows the standard layout expected by torchvision's datasets.ImageFolder: the training and validation data are expected in the train/ and val/ folders, respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
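
As a quick sanity check, the layout above can be loaded directly with torchvision's ImageFolder. Below is a minimal sketch; the paths and transform are placeholders, not part of this repo:

from torchvision import transforms
from torchvision.datasets import ImageFolder

# Placeholder paths and transform, used only to verify the folder structure.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = ImageFolder("/path/to/imagenet/train", transform=transform)
val_set = ImageFolder("/path/to/imagenet/val", transform=transform)

print(len(train_set.classes), "classes,", len(train_set), "train images,", len(val_set), "val images")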

Training

To train LCTR on CUB with 4 GPUs, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=11240 --use_env main.py --model deit_small_lctr --batch-size 128 --data-path /path/to/CUB --output_dir /path/to/save --resume /path/to/weights/deit_small_patch16_224.pth --epochs 80 --data-set CUB --lr 5e-5

To train LCTR on ILSVRC with 4 GPUs, run:

python3 -m torch.distributed.launch --nproc_per_node=4 --master_port=11240 --use_env main.py --model deit_small_lctr --batch-size 256 --data-path /path/to/IMNET --output_dir /path/to/save --resume /path/to/weights/deit_small_patch16_224.pth --epochs 14 --data-set IMNET --lr 5e-4
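
The --resume argument in both commands points to ImageNet-pretrained DeiT-small weights. A hedged sketch for obtaining them via the official facebookresearch/deit torch.hub entry is shown below; it assumes that main.py, like the original DeiT code, reads the weights from a checkpoint dict under the "model" key, and the save path is a placeholder:

import torch

# Assumption: fetch ImageNet-pretrained DeiT-small from the official DeiT hub entry
# (requires timm to be installed) and save it where --resume expects it.
model = torch.hub.load("facebookresearch/deit:main", "deit_small_patch16_224", pretrained=True)
torch.save({"model": model.state_dict()}, "/path/to/weights/deit_small_patch16_224.pth")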

Inference

To test the CUB models, you can run:

python3 main.py --eval --model deit_small_lctr --data-set CUB --data-path /path/to/CUB --resume ./path/to/LCTR_CUB_model

To test the ILSVRC models, you can run:

python3 main.py --eval --model deit_small_lctr --data-set IMNET --data-path /path/to/IMNET --resume ./path/to/LCTR_IMNET_model

πŸ” Citation

@inproceedings{chen2022lctr,
  title={{LCTR}: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization},
  author={Chen, Zhiwei and Wang, Changan and Wang, Yabiao and Jiang, Guannan and Shen, Yunhang and Tai, Ying and Wang, Chengjie and Zhang, Wei and Cao, Liujuan},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  number={1},
  pages={410--418},
  year={2022}
}

❀️ Acknowledgement

We use DeiT and its pre-trained weights as the backbone. Many thanks for their brilliant work!
