WalterMa / Human-Object-Relation-Network

Source code for "Human-object Relation Network for Action Recognition in Still Images"

Human-object Relation Network for Action Recognition in Still Images

Introduction

Source code for the ICME 2020 paper: "Human-object Relation Network for Action Recognition in Still Images".

The paper is available in this repo or from the IEEE Digital Library.

Surrounding object information has been widely used for action recognition. However, the relation between human and object, an important cue, is usually ignored in still image action recognition. In this paper, we propose a novel approach for action recognition. The key to our approach is a human-object relation module. Using the appearance as well as the spatial locations of the human and objects, the module computes pair-wise relation information between them to enhance the features used for action classification, and it can be trained jointly with our action recognition network. Experimental results on two popular datasets demonstrate the effectiveness of the proposed approach. Moreover, our method yields new state-of-the-art results of 92.8% and 94.6% mAP on the PASCAL VOC 2012 Action and Stanford 40 Actions datasets, respectively.
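
The module itself is defined in this repo's source; as a rough illustration of the idea, below is a minimal sketch of a pairwise appearance-plus-geometry attention block in MXNet Gluon. The layer sizes, the geometric encoding, and the way the two scores are combined are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a pairwise human-object relation block in MXNet Gluon.
# NOTE: dimensions, geometry encoding, and the score combination below are
# illustrative assumptions, not the paper's exact architecture.
import mxnet as mx
from mxnet.gluon import nn

class HumanObjectRelation(nn.HybridBlock):
    def __init__(self, feat_dim=1024, key_dim=64, **kwargs):
        super(HumanObjectRelation, self).__init__(**kwargs)
        with self.name_scope():
            self.query = nn.Dense(key_dim, flatten=False)   # project human appearance
            self.key = nn.Dense(key_dim, flatten=False)     # project object appearance
            self.geo_fc = nn.Dense(1, flatten=False)        # score spatial relation
            self.value = nn.Dense(feat_dim, flatten=False)  # transform object features
        self._scale = key_dim ** 0.5

    def hybrid_forward(self, F, human_feat, obj_feats, geo_feats):
        # human_feat: (1, feat_dim); obj_feats: (N, feat_dim)
        # geo_feats:  (N, G) encoding of each object box relative to the human box
        q = self.query(human_feat)                                    # (1, key_dim)
        k = self.key(obj_feats)                                       # (N, key_dim)
        app_score = F.dot(q, k, transpose_b=True) / self._scale       # (1, N)
        geo_score = F.relu(self.geo_fc(geo_feats)).reshape((1, -1))   # (1, N)
        # combine appearance and geometry scores, normalize over objects
        weights = F.softmax(F.log(F.maximum(geo_score, 1e-6)) + app_score, axis=1)
        relation = F.dot(weights, self.value(obj_feats))              # (1, feat_dim)
        return human_feat + relation  # relation-enhanced human feature
```

In use, obj_feats would be the ROI-pooled features of detected objects and geo_feats an encoding of each object box relative to the human box; the enhanced human feature is then fed to the action classifier.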

Installation

This project is developed on Python 3.6 with the MXNet framework.

Python Packages

mxnet==1.6.0
gluoncv==0.7.0 [optional]
pycocotools==2.0 [optional]
numpy==1.15.4
matplotlib==2.2.2
tqdm==4.23.4

The optional packages are required only if you want to detect object bounding boxes for your own dataset.
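
For that use case, a minimal GluonCV sketch is shown below. The detector choice and image path are placeholders; the released boxes (see the Datasets section) were produced by the authors' own Faster R-CNN setup, which is not specified here.

```python
# Sketch: detect object boxes for your own images with a GluonCV Faster R-CNN.
# 'faster_rcnn_resnet50_v1b_coco' and 'my_image.jpg' are placeholder choices.
from gluoncv import model_zoo, data

net = model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)
x, img = data.transforms.presets.rcnn.load_test('my_image.jpg')
class_ids, scores, boxes = net(x)  # shapes: (1, N, 1), (1, N, 1), (1, N, 4)
# keep confident detections, e.g. those with score > 0.5
```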

Datasets

| Name | Dataset Download Link | Detected Object BBoxes |
| --- | --- | --- |
| VOC 2012 | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |
| Stanford 40 | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |
| HICO | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |

Note: For ease of use, we provide the object bounding boxes used in our paper, which were detected by Faster R-CNN. The expected directory layout after setup is sketched below the steps.

  1. VOC 2012 dataset:
    1.1 Download the dataset and extract it to ~/Data/.
    1.2 Download the BBoxes and extract them to ~/Data/VOCdevkit/VOC2012/.

  2. Stanford 40 dataset:
    2.1 Download the dataset and extract it to ~/Data/.
    2.2 Download the BBoxes and extract them to ~/Data/Stanford40/.

  3. HICO dataset:
    3.1 Download the dataset and extract it to ~/Data/.
    3.2 Move all images in ~/Data/hico/images/train2015 and ~/Data/hico/images/test2015 into their parent folder ~/Data/hico/images/.
    3.3 Download the BBoxes and extract them to ~/Data/hico/.
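
After these steps, the layout under ~/Data/ should look roughly like this (only the relevant folders shown):

```
~/Data/
├── VOCdevkit/
│   └── VOC2012/        # VOC images/annotations + downloaded BBoxes
├── Stanford40/         # Stanford 40 images + downloaded BBoxes
└── hico/
    └── images/         # train2015 and test2015 images merged here
                        # downloaded BBoxes extracted under ~/Data/hico/
```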

Training

  1. Download the pretrained ResNet-50/101 weights and put them into ~/.mxnet/models/ (a GluonCV snippet for fetching them follows these steps).
  2. Execute the shell script in ./experiments/[dataset]/, for example:
    sh ./experiments/VOC2012/train.sh
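
If the optional gluoncv package is installed, one convenient way to fetch the backbone weights is through its model zoo, which by default caches them under ~/.mxnet/models/:

```python
# Optional: fetch ImageNet-pretrained ResNet-50/101 v1d weights via GluonCV.
# With pretrained=True the weights are cached under ~/.mxnet/models/ by default.
from gluoncv import model_zoo

model_zoo.get_model('resnet50_v1d', pretrained=True)
model_zoo.get_model('resnet101_v1d', pretrained=True)
```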
    

Evaluation

  1. Download the pretrained Models or prepare your own trained models (a quick checkpoint sanity check is sketched after these steps).
  2. Modify the parameter file path in test.sh under ./experiments/[dataset]/.
  3. Execute the testing script, for example:
    sh ./experiments/VOC2012/test.sh
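
To sanity-check a downloaded or trained checkpoint before pointing test.sh at it, you can list its parameter arrays; the file name below is one of the released checkpoints from the table in the next section:

```python
# Sketch: inspect a checkpoint by listing a few of its parameter arrays.
import mxnet as mx

params = mx.nd.load('horelation_resnet50_v1d_voc_2012.params')
for name, arr in sorted(params.items())[:5]:
    print(name, arr.shape)
```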
    

Models & Results

Pretrained Models: Dropbox OR Baidu Net Disk (PassCode: kjok)

| File Name | Dataset | Split | Backbone | mAP (%) |
| --- | --- | --- | --- | --- |
| horelation_resnet50_v1d_voc_2012.params | VOC 2012 | Val | ResNet-50 | 91.9 |
| horelation_resnet50_v1d_stanford_40.params | Stanford 40 | Test | ResNet-50 | 93.1 |
| horelation_resnet101_v1d_stanford_40.params | Stanford 40 | Test | ResNet-101 | 94.6 |
| horelation_resnet50_v1d_hico.params | HICO | Test | ResNet-50 | 42.6 |

Citation

If our code or models help your research, please cite our paper:

@INPROCEEDINGS{horelation,
  author={Wentao Ma and Shuang Liang},
  booktitle={2020 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Human-Object Relation Network For Action Recognition In Still Images},
  year={2020}
}

Disclaimer

This repository uses code from MXNet and GluonCV.


License

MIT License

