WalterMa / Human-Object-Relation-Network

Source code for "Human-object Relation Network for Action Recognition in Still Images"

Human-object Relation Network for Action Recognition in Still Images

Introduction

Source code for the ICME 2020 paper: "Human-object Relation Network for Action Recognition in Still Images".

The paper is available in this repo or from the IEEE Digital Library.

Surrounding object information has been widely used for action recognition. However, the relation between human and object, an important cue, is usually ignored in still image action recognition. In this paper, we propose a novel approach for action recognition. The key to our approach is a human-object relation module. Using the appearance as well as the spatial locations of the human and objects, the module computes pair-wise relation information between them to enhance the features used for action classification, and it can be trained jointly with our action recognition network. Experimental results on two popular datasets demonstrate the effectiveness of the proposed approach. Moreover, our method yields new state-of-the-art results of 92.8% and 94.6% mAP on the PASCAL VOC 2012 Action and Stanford 40 Actions datasets, respectively.
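
The module itself is defined in this repo's source; as a rough illustration of the idea, below is a minimal sketch of a pairwise appearance-plus-geometry attention block in MXNet Gluon. The layer sizes, the geometric encoding, and the way the two scores are combined are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a pairwise human-object relation block in MXNet Gluon.
# NOTE: dimensions, geometry encoding, and the score combination below are
# illustrative assumptions, not the paper's exact architecture.
import mxnet as mx
from mxnet.gluon import nn

class HumanObjectRelation(nn.HybridBlock):
    def __init__(self, feat_dim=1024, key_dim=64, **kwargs):
        super(HumanObjectRelation, self).__init__(**kwargs)
        with self.name_scope():
            self.query = nn.Dense(key_dim, flatten=False)   # project human appearance
            self.key = nn.Dense(key_dim, flatten=False)     # project object appearance
            self.geo_fc = nn.Dense(1, flatten=False)        # score spatial relation
            self.value = nn.Dense(feat_dim, flatten=False)  # transform object features
        self._scale = key_dim ** 0.5

    def hybrid_forward(self, F, human_feat, obj_feats, geo_feats):
        # human_feat: (1, feat_dim); obj_feats: (N, feat_dim)
        # geo_feats:  (N, G) encoding of each object box relative to the human box
        q = self.query(human_feat)                                    # (1, key_dim)
        k = self.key(obj_feats)                                       # (N, key_dim)
        app_score = F.dot(q, k, transpose_b=True) / self._scale       # (1, N)
        geo_score = F.relu(self.geo_fc(geo_feats)).reshape((1, -1))   # (1, N)
        # combine appearance and geometry scores, normalize over objects
        weights = F.softmax(F.log(F.maximum(geo_score, 1e-6)) + app_score, axis=1)
        relation = F.dot(weights, self.value(obj_feats))              # (1, feat_dim)
        return human_feat + relation  # relation-enhanced human feature
```

In use, obj_feats would be the ROI-pooled features of detected objects and geo_feats an encoding of each object box relative to the human box; the enhanced human feature is then fed to the action classifier.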

Installation

This project is developed on Python 3.6 with the MXNet framework.

Python Packages

mxnet==1.6.0
gluoncv==0.7.0 [optional]
pycocotools==2.0 [optional]
numpy==1.15.4
matplotlib==2.2.2
tqdm==4.23.4

The optional packages are required only if you want to detect object bounding boxes for your own dataset.
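
For that use case, a minimal GluonCV sketch is shown below. The detector choice and image path are placeholders; the released boxes (see the Datasets section) were produced by the authors' own Faster R-CNN setup, which is not specified here.

```python
# Sketch: detect object boxes for your own images with a GluonCV Faster R-CNN.
# 'faster_rcnn_resnet50_v1b_coco' and 'my_image.jpg' are placeholder choices.
from gluoncv import model_zoo, data

net = model_zoo.get_model('faster_rcnn_resnet50_v1b_coco', pretrained=True)
x, img = data.transforms.presets.rcnn.load_test('my_image.jpg')
class_ids, scores, boxes = net(x)  # shapes: (1, N, 1), (1, N, 1), (1, N, 4)
# keep confident detections, e.g. those with score > 0.5
```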

Datasets

| Name | Dataset Download Link | Detected Object BBoxes |
| --- | --- | --- |
| VOC 2012 | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |
| Stanford 40 | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |
| HICO | Dataset Website | Dropbox OR Baidu Net Disk (PassCode: z53z) |

Note: For ease of use, we provide the object bounding boxes used in our paper, which were detected by Faster R-CNN. The expected directory layout after setup is sketched below the steps.

  1. VOC 2012 dataset:
    1.1 Download the dataset and extract it to ~/Data/.
    1.2 Download the BBoxes and extract them to ~/Data/VOCdevkit/VOC2012/.

  2. Stanford 40 dataset:
    2.1 Download the dataset and extract it to ~/Data/.
    2.2 Download the BBoxes and extract them to ~/Data/Stanford40/.

  3. HICO dataset:
    3.1 Download the dataset and extract it to ~/Data/.
    3.2 Move all images in ~/Data/hico/images/train2015 and ~/Data/hico/images/test2015 into their parent folder ~/Data/hico/images/.
    3.3 Download the BBoxes and extract them to ~/Data/hico/.
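
After these steps, the layout under ~/Data/ should look roughly like this (only the relevant folders shown):

```
~/Data/
├── VOCdevkit/
│   └── VOC2012/        # VOC images/annotations + downloaded BBoxes
├── Stanford40/         # Stanford 40 images + downloaded BBoxes
└── hico/
    └── images/         # train2015 and test2015 images merged here
                        # downloaded BBoxes extracted under ~/Data/hico/
```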

Training

  1. Download the pretrained ResNet-50/101 weights and put them into ~/.mxnet/models/ (a GluonCV snippet for fetching them follows these steps).
  2. Execute the shell script in ./experiments/[dataset]/, for example:
    sh ./experiments/VOC2012/train.sh
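
If the optional gluoncv package is installed, one convenient way to fetch the backbone weights is through its model zoo, which by default caches them under ~/.mxnet/models/:

```python
# Optional: fetch ImageNet-pretrained ResNet-50/101 v1d weights via GluonCV.
# With pretrained=True the weights are cached under ~/.mxnet/models/ by default.
from gluoncv import model_zoo

model_zoo.get_model('resnet50_v1d', pretrained=True)
model_zoo.get_model('resnet101_v1d', pretrained=True)
```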
    

Evaluation

  1. Download the pretrained Models or prepare your own trained models (a quick checkpoint sanity check is sketched after these steps).
  2. Modify the parameter file path in test.sh under ./experiments/[dataset]/.
  3. Execute the testing script, for example:
    sh ./experiments/VOC2012/test.sh
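
To sanity-check a downloaded or trained checkpoint before pointing test.sh at it, you can list its parameter arrays; the file name below is one of the released checkpoints from the table in the next section:

```python
# Sketch: inspect a checkpoint by listing a few of its parameter arrays.
import mxnet as mx

params = mx.nd.load('horelation_resnet50_v1d_voc_2012.params')
for name, arr in sorted(params.items())[:5]:
    print(name, arr.shape)
```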
    

Models & Results

Pretrained Models: Dropbox OR Baidu Net Disk (PassCode: kjok)

| File Name | Dataset | Split | Backbone | mAP (%) |
| --- | --- | --- | --- | --- |
| horelation_resnet50_v1d_voc_2012.params | VOC 2012 | Val | ResNet-50 | 91.9 |
| horelation_resnet50_v1d_stanford_40.params | Stanford 40 | Test | ResNet-50 | 93.1 |
| horelation_resnet101_v1d_stanford_40.params | Stanford 40 | Test | ResNet-101 | 94.6 |
| horelation_resnet50_v1d_hico.params | HICO | Test | ResNet-50 | 42.6 |

Citation

If our code or models help your research, please cite our paper:

@INPROCEEDINGS{horelation,
  author={Wentao Ma and Shuang Liang},
  booktitle={2020 IEEE International Conference on Multimedia and Expo (ICME)},
  title={Human-Object Relation Network For Action Recognition In Still Images},
  year={2020}
}

Disclaimer

This repository uses code from MXNet and GluonCV.


License

MIT License

