zhihou7/HOI-CL-OneStage

Affordance Transfer Learning for Human-Object Interaction Detection (CVPR2021), Visual Compositional Learning for Human-Object Interaction Detection (ECCV2020)

This is an implementation of ATL and VCL based on One-Stage HOI Detection.

Getting Started

Prerequisites

Linux or macOS with Python ≥ 3.6
PyTorch ≥ 1.4, torchvision that matches the PyTorch installation.
Detectron2
Other packages listed in reuirements.txt

Installation

Please follow the instructions to install detectron2 first.
Install other dependencies by pip install -r requirements.txt or conda install --file requirements.txt
Download and prepare the data by cd datasets; sh prepare_data.sh.
- The HICO-DET dataset and V-COCO dataset.
  - If you already have, please comment out the corresponding lines in prepare_data.sh and hard-code the dataset path using your custom path in lib/data/datasets/builtin.py.
- COCO's format annotations for HICO-DET and VCOCO dataset.
- Glove semantic embeddings.
COCO data for hico: https://drive.google.com/file/d/18BJO8sG8KU3cmSQxpOqcPOql7sXukvIj/view?usp=sharing

Demo Inference with Pre-trained Model

Object Detection Pretrained model (Here is the model from VCL): https://cloudstor.aarnet.edu.au/plus/s/NSkxIqfWMt9VydN

ATL model: https://drive.google.com/file/d/1qyA3KSiIvlsDgPP6Qc9ekbiI4IZmIsIw/view?usp=sharing

Training a model and running inference

Train Baseline (ATL)

python train_net_atl.py --num-gpus 2 --config-file configs/HICO-DET/interaction_R_101_FPN_pos_atl.yaml MODEL.ROI_HEADS.OBJ_IMG_NUMS 2 SOLVER.IMS_PER_BATCH 4 OUTPUT_DIR ./output/HICO_interaction_base_101_fine1_gpu2_atl12 MODEL.ROI_HEADS.CL 0 MODEL.ROI_HEADS.CL_WEIGHT 0.25 MODEL.WEIGHTS output/model_0064999.pth

Train ATL

python train_net_atl.py --num-gpus 2 --config-file configs/HICO-DET/interaction_R_101_FPN_pos_atl.yaml MODEL.ROI_HEADS.OBJ_IMG_NUMS 2 SOLVER.IMS_PER_BATCH 4 OUTPUT_DIR ./output/HICO_interaction_base_101_fine1_gpu2_atl12 MODEL.ROI_HEADS.CL 1 MODEL.ROI_HEADS.CL_WEIGHT 0.25 MODEL.WEIGHTS output/model_0064999.pth

Train VCL

python train_net.py --num-gpus 2 --config-file configs/HICO-DET/interaction_R_101_FPN_pos.yaml SOLVER.IMS_PER_BATCH 4 OUTPUT_DIR ./output/HICO_interaction_base_101_fine1_gpu2_vcl MODEL.ROI_HEADS.CL 1 MODEL.ROI_HEADS.CL_WEIGHT 0.25 MODEL.WEIGHTS output/model_0064999.pth

Eval model

python train_net.py --eval-only --num-gpus 1 --config-file configs/HICO-DET/interaction_R_101_FPN_pos_atl.yaml  OUTPUT_DIR ./output/HICO_interaction_base_101_fine1_gpu2_atl12 MODEL.WEIGHTS ./output/HICO_interaction_base_101_fine1_gpu2_atl12/model_00239999.pth

Results

Model	Full	Rare	Non-Rare
ATL	23.81	17.43	25.72

Citations

If you find this series of work are useful for you, please consider citing:

@inproceedings{hou2021fcl,
  title={Detecting Human-Object Interaction via Fabricated Compositional Learning},
  author={Hou, Zhi and Baosheng, Yu and Qiao, Yu and Peng, Xiaojiang and Tao, Dacheng},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{hou2021vcl,
  title={Visual Compositional Learning for Human-Object Interaction Detection},
  author={Hou, Zhi and Peng, Xiaojiang and Qiao, Yu  and Tao, Dacheng},
  booktitle={ECCV},
  year={2020}
}

@inproceedings{hou2021atl,
  title={Affordance Transfer Learning for Human-Object Interaction Detection},
  author={Hou, Zhi and Baosheng, Yu and Qiao, Yu and Peng, Xiaojiang and Tao, Dacheng},
  booktitle={CVPR},
  year={2021}
}

Acknowledgements

Code is built from zero_shot_hoi and Detectron2.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

zhihou7 / HOI-CL-OneStage