
ED-Pose


This is the official PyTorch implementation of our ICLR 2023 paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation".

Visualization

All videos use per-frame estimation without temporal smoothing.

Case 1: Single-person dancer pose estimation:

Case 2: Self-occlusion scenes:

Case 3: Occlusion scenes, where the regression loss lets keypoint predictions extend beyond the image boundary:

Case 4: Fast-moving scenes:

Introduction

We present ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose reformulates this task as two explicit box detection processes with a unified representation and regression supervision. ED-Pose is conceptually simple, requiring neither post-processing nor dense heatmap supervision.

  1. For the first time, ED-Pose, as a fully end-to-end framework with an L1 regression loss, surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO.
  2. ED-Pose achieves state-of-the-art performance with 76.6 AP on CrowdPose without test-time augmentation.
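
At a high level, these two processes can be pictured as cascaded box detections: one decoder detects human boxes, and a second decoder then detects each keypoint as a box within every person, trained with plain L1 regression. The snippet below is only a schematic sketch of that idea in plain PyTorch; all class, module, and variable names are hypothetical and do not correspond to the actual code in this repo.

import torch
from torch import nn

class TwoStageBoxPoseSketch(nn.Module):
    """Schematic of the two explicit box-detection steps (hypothetical names)."""
    def __init__(self, dim=256, num_queries=100, num_keypoints=17):
        super().__init__()
        self.human_queries = nn.Embedding(num_queries, dim)
        self.human_decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.kpt_decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.human_box_head = nn.Linear(dim, 4)  # (cx, cy, w, h) per person
        self.kpt_box_head = nn.Linear(dim, 4)    # each keypoint treated as a small box
        self.num_keypoints = num_keypoints

    def forward(self, memory):
        # memory: flattened image features from the backbone/encoder, shape (B, S, dim).
        # Step 1: human-level box detection from learned queries.
        q = self.human_queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        human_feat = self.human_decoder(q, memory)
        human_boxes = self.human_box_head(human_feat).sigmoid()
        # Step 2: expand each human query into keypoint queries and detect
        # each keypoint as a box, supervised by L1 regression (no heatmaps).
        kpt_q = human_feat.repeat_interleave(self.num_keypoints, dim=1)
        kpt_feat = self.kpt_decoder(kpt_q, memory)
        kpt_boxes = self.kpt_box_head(kpt_feat).sigmoid()
        return human_boxes, kpt_boxes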

Methods

method

Todo

This repo contains further modifications, including:

Model Zoo

We have put our model checkpoints here.
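
After downloading a checkpoint, a quick sanity check with plain PyTorch confirms it loads and shows how it is structured; the file path below matches the evaluation commands later in this README but is otherwise just an example.

import torch

# Load a downloaded checkpoint on CPU; the path is an example.
ckpt = torch.load("./models/edpose_r50_coco.pth", map_location="cpu")

# Checkpoints are typically dicts; inspect the top-level keys to locate
# the model state dict before loading it into a network.
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))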

Results on COCO val2017 dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APM | APL | Time (ms) | Download |
|-------|----------|---------|-----|------|------|-----|-----|-----------|----------|
| ED-Pose | R-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | Google Drive |
| ED-Pose | Swin-L | 60e | 74.3 | 91.5 | 81.7 | 68.5 | 82.7 | 88 | Google Drive |
| ED-Pose | Swin-L-5scale | 60e | 75.8 | 92.3 | 82.9 | 70.4 | 83.5 | 142 | Google Drive |

Results on CrowdPose test dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APE | APM | APH | Download |
|-------|----------|---------|-----|------|------|-----|-----|-----|----------|
| ED-Pose | R-50 | 80e | 69.9 | 88.6 | 75.8 | 77.7 | 70.6 | 60.9 | Google Drive |
| ED-Pose | Swin-L | 80e | 73.1 | 90.5 | 79.8 | 80.5 | 73.8 | 63.8 | Google Drive |
| ED-Pose | Swin-L-5scale | 80e | 76.6 | 92.4 | 83.3 | 83.0 | 77.3 | 68.3 | Google Drive |

Results on COCO test-dev dataset

| Model | Backbone | Loss | mAP | AP50 | AP75 | APM | APL |
|-------|----------|------|-----|------|------|-----|-----|
| DirectPose | R-50 | Reg | 62.2 | 86.4 | 68.2 | 56.7 | 69.8 |
| DirectPose | R-101 | Reg | 63.3 | 86.7 | 69.4 | 57.8 | 71.2 |
| FCPose | R-50 | Reg+HM | 64.3 | 87.3 | 71.0 | 61.6 | 70.5 |
| FCPose | R-101 | Reg+HM | 65.6 | 87.9 | 72.6 | 62.1 | 72.3 |
| InsPose | R-50 | Reg+HM | 65.4 | 88.9 | 71.7 | 60.2 | 72.7 |
| InsPose | R-101 | Reg+HM | 66.3 | 89.2 | 73.0 | 61.2 | 73.9 |
| PETR | R-50 | Reg+HM | 67.6 | 89.8 | 75.3 | 61.6 | 76.0 |
| PETR | Swin-L | Reg+HM | 70.5 | 91.5 | 78.7 | 65.2 | 78.0 |
| ED-Pose | R-50 | Reg | 69.8 | 90.2 | 77.2 | 64.3 | 77.4 |
| ED-Pose | Swin-L | Reg | 72.7 | 92.3 | 80.9 | 67.6 | 80.0 |

Note:

  • No test-time augmentation is used for ED-Pose.
  • We use the Objects365 dataset to pretrain the human detector of ED-Pose under the Swin-L-5scale setting.

Environment Setup

Installation

We use DN-Deformable-DETR as our codebase. We tested our models with python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.

  1. Clone this repo
git clone https://github.com/IDEA-Research/ED-Pose.git
cd ED-Pose
  2. Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/ (a quick environment check is sketched after this list).

# an example:
conda install -c pytorch pytorch torchvision
  3. Install other needed packages
pip install -r requirements.txt
  4. Compile CUDA operators
cd models/edpose/ops
python setup.py build install
# unit test (should see that all checks are True)
python test.py
cd ../../..
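
Before compiling the CUDA operators, it can help to confirm the environment roughly matches the tested setup (pytorch=1.9.0, cuda=11.1); this check uses only standard PyTorch attributes.

import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build version:", torch.version.cuda)
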
Data Preparation

For COCO data, please download from COCO download. The coco_dir should look like this:

|-- EDPose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 
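
Once the files are in place, the annotations can be sanity-checked with pycocotools; the coco_dir path below is a placeholder for your own.

from pycocotools.coco import COCO

# Load the val2017 keypoint annotations and report basic counts.
coco = COCO("coco_dir/annotations/person_keypoints_val2017.json")
person_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_ids)
print(f"{len(img_ids)} images with person keypoint annotations")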

For CrowdPose data, please download from CrowdPose download. The crowdpose_dir should look like this:

|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ... 
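
The crowdpose_trainval.json file is produced by the repo's util/crowdpose_concat_train_val.py. Conceptually, such a script merges the image and annotation lists of two COCO-style JSON files; the sketch below only illustrates the idea (it is not the repo's actual script, and it assumes image and annotation ids do not collide across the two splits).

import json

# Merge two COCO-style annotation files into one (schematic only).
with open("crowdpose_dir/json/crowdpose_train.json") as f:
    train = json.load(f)
with open("crowdpose_dir/json/crowdpose_val.json") as f:
    val = json.load(f)

# Assumes ids are unique across the two splits.
train["images"] += val["images"]
train["annotations"] += val["annotations"]

with open("crowdpose_dir/json/crowdpose_trainval.json", "w") as f:
    json.dump(train, f)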

Run

Training on COCO:

Single GPU
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"
Distributed Run
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"

Training on CrowdPose:

Single GPU
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"
Distributed Run
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"

We have put the Swin-L model pretrained on ImageNet-22k here.

Evaluation on COCO:

ResNet-50
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_r50_coco.pth" \
 --eval
Swin-L
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_coco.pth" \
 --eval
Swin-L-5scale
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
 --eval

Evaluation on CrowdPose:

ResNet-50
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_r50_crowdpose.pth" \
 --eval
Swin-L
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_crowdpose.pth" \
 --eval
Swin-L-5scale
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_5scale_crowdpose.pth" \
 --eval

Cite ED-Pose

@inproceedings{
yang2023explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Jie Yang and Ailing Zeng and Shilong Liu and Feng Li and Ruimao Zhang and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=s4WVupnJjmX}
}
