The official implementation of our ICIP'22 paper "Object-Aware Self-Supervised Multi-Label Learning".
- Please cite our paper if the code is helpful to your research ([arXiv](https://arxiv.org/abs/2205.07028)).
@misc{https://doi.org/10.48550/arxiv.2205.07028,
  doi = {10.48550/ARXIV.2205.07028},
  url = {https://arxiv.org/abs/2205.07028},
  author = {Xu, Kaixin and Liu, Liyang and Zhao, Ziyuan and Zeng, Zeng and Veeravalli, Bharadwaj},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {Object-Aware Self-supervised Multi-Label Learning},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
Multi-label learning on image data has been widely exploited with deep learning models. However, supervised training of deep CNNs often fails to discover sufficiently discriminative features for classification, and numerous self-supervision methods have been proposed to learn more robust image representations. Most of these self-supervised approaches focus on single-instance, single-label data and fall short on more complex images containing multiple objects. We therefore propose an Object-Aware Self-Supervision (OASS) method that obtains finer-grained representations for multi-label learning by dynamically generating auxiliary tasks based on object locations. The robust representations learned by OASS can further be leveraged to efficiently generate Class-Specific Instances (CSI) in a proposal-free fashion, which better guide the transfer of multi-label supervision signals to instances. Extensive experiments on the VOC2012 dataset for multi-label classification demonstrate the effectiveness of the proposed method against state-of-the-art counterparts.
- Python 3.8, PyTorch 1.7.0; see requirements.txt for the full list of dependencies
- CUDA 10.1, cuDNN 7.6.5
- 4 x Titan RTX GPUs
We provide four keypoint detection strategies: Max, Channel-wise Max (cMax), Channel-wise Top-k (cTopk), and Channel-wise weighted Top-k (cTopk-w). In addition, we extract patch-level features with an Exponential Moving Average (EMA) teacher, so that the patch-level features produced by this self-ensembled teacher are more stable and accurate.
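As a rough illustration of the EMA teacher update, here is a minimal sketch only; the function name `update_ema_teacher` and the decay value 0.999 are assumptions, not taken from the training scripts:

```python
import torch

@torch.no_grad()
def update_ema_teacher(student: torch.nn.Module,
                       teacher: torch.nn.Module,
                       decay: float = 0.999) -> None:
    """Keep the teacher as an exponential moving average of the student."""
    for s_param, t_param in zip(student.parameters(), teacher.parameters()):
        # teacher = decay * teacher + (1 - decay) * student
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
```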
python3 -m pip install -r requirements.txt
Follow the instructions at http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#devkit to download and prepare the PASCAL VOC 2012 dataset.
For the test set, we use the data indicated on the VOC2012 website (./data/test.txt) instead of the test set used in Puzzle-CAM.
We perform self-supervision on the merged CAM.
python3 train_classification_with_puzzle_max.py --architecture resnet50 --alpha_schedule_re 0.05 --alpha_re 3.00 --alpha_p 0.5 --tag ResNet50_Max --data_dir $your_dir
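Purely for illustration, a minimal sketch of the Max keypoint described above could look as follows; the function name `max_keypoint`, the [H, W] merged-CAM input, and the (y, x) convention are assumptions rather than the script's actual internals:

```python
import torch

def max_keypoint(cam: torch.Tensor):
    """cam: merged CAM of shape [H, W]; returns (y, x) of its peak response."""
    h, w = cam.shape
    flat_idx = int(torch.argmax(cam))  # index into the flattened map
    return flat_idx // w, flat_idx % w
```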
Instead of performing self-supervision on the merged CAM, we apply the same strategy to each channel of the CAM.
python3 train_classification_with_puzzle_cMax_ema.py --architecture resnet50 --alpha_schedule_re 0.05 --alpha_re 3.00 --alpha_p 0.5 --tag ResNet50_cMax --data_dir $your_dir
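A hedged sketch of this channel-wise variant, one keypoint per CAM channel at that channel's maximum response (shapes and names are illustrative assumptions):

```python
import torch

def cmax_keypoints(cam: torch.Tensor) -> torch.Tensor:
    """cam: CAM of shape [C, H, W]; returns integer (y, x) keypoints of shape [C, 2]."""
    c, h, w = cam.shape
    flat_idx = cam.view(c, -1).argmax(dim=1)           # peak index per channel
    return torch.stack((flat_idx // w, flat_idx % w), dim=1)
```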
Similar to cMax, for each CAM channel we select the k local maxima with the highest CAM responses and compute the keypoint as their geometric center.
python3 train_classification_with_puzzle_cTopk_ema.py --architecture resnet50 --alpha_schedule_re 0.05 --alpha_re 3.00 --alpha_p 0.5 --tag ResNet50_cTopk --data_dir $your_dir
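An illustrative cTopk sketch; for simplicity it takes the k highest responses per channel rather than true local maxima, and `k`, the names, and the shapes are assumptions:

```python
import torch

def ctopk_keypoints(cam: torch.Tensor, k: int = 5) -> torch.Tensor:
    """cam: [C, H, W]; returns float (y, x) keypoints of shape [C, 2]."""
    c, h, w = cam.shape
    _, idx = cam.view(c, -1).topk(k, dim=1)            # top-k locations per channel
    ys, xs = (idx // w).float(), (idx % w).float()
    return torch.stack((ys.mean(dim=1), xs.mean(dim=1)), dim=1)  # geometric center
```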
Similar to cTopk, we calculate the geometric center by weighted averaging, where the weights are the corresponding normalized CAM responses.
python3 train_classification_with_puzzle_cTopk_w_ema.py --architecture resnet50 --alpha_schedule_re 0.05 --alpha_re 3.00 --alpha_p 0.5 --tag ResNet50_cTopk_w --data_dir $your_dir
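A sketch of the weighted variant under the same assumptions, where the per-channel top-k responses are normalized into weights for the coordinate average (the normalization choice is also an assumption):

```python
import torch

def ctopk_w_keypoints(cam: torch.Tensor, k: int = 5) -> torch.Tensor:
    """cam: [C, H, W]; returns float (y, x) keypoints of shape [C, 2]."""
    c, h, w = cam.shape
    vals, idx = cam.view(c, -1).topk(k, dim=1)         # responses and locations
    weights = vals / vals.sum(dim=1, keepdim=True)     # normalize per channel
    ys, xs = (idx // w).float(), (idx % w).float()
    return torch.stack(((weights * ys).sum(dim=1), (weights * xs).sum(dim=1)), dim=1)
```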
We conducted extensive experiments on the VOC2012 benchmark dataset to demonstrate our advances in multi-label classification. For detailed results, please refer to our paper.
This implementation is developed upon Puzzle-CAM. We thank its authors for their effort and for providing a wonderful codebase!
If you run into any problems, please check the existing issues for related discussions before opening a new one, or contact Xu_Kaixin@i2r.a-star.edu.sg.