
ED-Pose


This is the official PyTorch implementation of our ICLR 2023 paper "Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation".

Visualization

All videos use per-frame estimation without temporal smoothing.

Case 1: Single-person dancer pose estimation:

Case 2: Self-occlusion scenes:

Case 3: Occlusion scenes, where the regression loss lets keypoint predictions extend beyond the image boundary:

Case 4: Fast-moving scenes:

Introduction

We present ED-Pose, an end-to-end framework with Explicit box Detection for multi-person Pose estimation. ED-Pose reformulates this task as two explicit box detection processes with a unified representation and regression supervision. ED-Pose is conceptually simple, requiring neither post-processing nor dense heatmap supervision.

  1. For the first time, ED-Pose, as a fully end-to-end framework with an L1 regression loss, surpasses heatmap-based top-down methods under the same backbone by 1.2 AP on COCO.
  2. ED-Pose achieves state-of-the-art performance with 76.6 AP on CrowdPose without test-time augmentation.
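
At a high level, these two processes can be pictured as cascaded box detections: one decoder detects human boxes, and a second decoder then detects each keypoint as a box within every person, trained with plain L1 regression. The snippet below is only a schematic sketch of that idea in plain PyTorch; all class, module, and variable names are hypothetical and do not correspond to the actual code in this repo.

import torch
from torch import nn

class TwoStageBoxPoseSketch(nn.Module):
    """Schematic of the two explicit box-detection steps (hypothetical names)."""
    def __init__(self, dim=256, num_queries=100, num_keypoints=17):
        super().__init__()
        self.human_queries = nn.Embedding(num_queries, dim)
        self.human_decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.kpt_decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.human_box_head = nn.Linear(dim, 4)  # (cx, cy, w, h) per person
        self.kpt_box_head = nn.Linear(dim, 4)    # each keypoint treated as a small box
        self.num_keypoints = num_keypoints

    def forward(self, memory):
        # memory: flattened image features from the backbone/encoder, shape (B, S, dim).
        # Step 1: human-level box detection from learned queries.
        q = self.human_queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        human_feat = self.human_decoder(q, memory)
        human_boxes = self.human_box_head(human_feat).sigmoid()
        # Step 2: expand each human query into keypoint queries and detect
        # each keypoint as a box, supervised by L1 regression (no heatmaps).
        kpt_q = human_feat.repeat_interleave(self.num_keypoints, dim=1)
        kpt_feat = self.kpt_decoder(kpt_q, memory)
        kpt_boxes = self.kpt_box_head(kpt_feat).sigmoid()
        return human_boxes, kpt_boxes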

Methods

method

Todo

This repo contains further modifications, including:

Model Zoo

We have put our model checkpoints here.
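
After downloading a checkpoint, a quick sanity check with plain PyTorch confirms it loads and shows how it is structured; the file path below matches the evaluation commands later in this README but is otherwise just an example.

import torch

# Load a downloaded checkpoint on CPU; the path is an example.
ckpt = torch.load("./models/edpose_r50_coco.pth", map_location="cpu")

# Checkpoints are typically dicts; inspect the top-level keys to locate
# the model state dict before loading it into a network.
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))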

Results on COCO val2017 dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APM | APL | Time (ms) | Download |
|-------|----------|---------|-----|------|------|-----|-----|-----------|----------|
| ED-Pose | R-50 | 60e | 71.7 | 89.7 | 78.8 | 66.2 | 79.7 | 51 | Google Drive |
| ED-Pose | Swin-L | 60e | 74.3 | 91.5 | 81.7 | 68.5 | 82.7 | 88 | Google Drive |
| ED-Pose | Swin-L-5scale | 60e | 75.8 | 92.3 | 82.9 | 70.4 | 83.5 | 142 | Google Drive |

Results on CrowdPose test dataset

| Model | Backbone | Lr schd | mAP | AP50 | AP75 | APE | APM | APH | Download |
|-------|----------|---------|-----|------|------|-----|-----|-----|----------|
| ED-Pose | R-50 | 80e | 69.9 | 88.6 | 75.8 | 77.7 | 70.6 | 60.9 | Google Drive |
| ED-Pose | Swin-L | 80e | 73.1 | 90.5 | 79.8 | 80.5 | 73.8 | 63.8 | Google Drive |
| ED-Pose | Swin-L-5scale | 80e | 76.6 | 92.4 | 83.3 | 83.0 | 77.3 | 68.3 | Google Drive |

Results on COCO test-dev dataset

| Model | Backbone | Loss | mAP | AP50 | AP75 | APM | APL |
|-------|----------|------|-----|------|------|-----|-----|
| DirectPose | R-50 | Reg | 62.2 | 86.4 | 68.2 | 56.7 | 69.8 |
| DirectPose | R-101 | Reg | 63.3 | 86.7 | 69.4 | 57.8 | 71.2 |
| FCPose | R-50 | Reg+HM | 64.3 | 87.3 | 71.0 | 61.6 | 70.5 |
| FCPose | R-101 | Reg+HM | 65.6 | 87.9 | 72.6 | 62.1 | 72.3 |
| InsPose | R-50 | Reg+HM | 65.4 | 88.9 | 71.7 | 60.2 | 72.7 |
| InsPose | R-101 | Reg+HM | 66.3 | 89.2 | 73.0 | 61.2 | 73.9 |
| PETR | R-50 | Reg+HM | 67.6 | 89.8 | 75.3 | 61.6 | 76.0 |
| PETR | Swin-L | Reg+HM | 70.5 | 91.5 | 78.7 | 65.2 | 78.0 |
| ED-Pose | R-50 | Reg | 69.8 | 90.2 | 77.2 | 64.3 | 77.4 |
| ED-Pose | Swin-L | Reg | 72.7 | 92.3 | 80.9 | 67.6 | 80.0 |

Note:

  • No test-time augmentation is used for ED-Pose.
  • We use the Objects365 dataset to pretrain the human detector of ED-Pose under the Swin-L-5scale setting.

Environment Setup

Installation

We use DN-Deformable-DETR as our codebase. We tested our models with python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions may work as well.

  1. Clone this repo
git clone https://github.com/IDEA-Research/ED-Pose.git
cd ED-Pose
  2. Install PyTorch and torchvision

Follow the instructions at https://pytorch.org/get-started/locally/ (a quick environment check is sketched after this list).

# an example:
conda install -c pytorch pytorch torchvision
  3. Install other needed packages
pip install -r requirements.txt
  4. Compile CUDA operators
cd models/edpose/ops
python setup.py build install
# unit test (should see that all checks are True)
python test.py
cd ../../..
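
Before compiling the CUDA operators, it can help to confirm the environment roughly matches the tested setup (pytorch=1.9.0, cuda=11.1); this check uses only standard PyTorch attributes.

import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA build version:", torch.version.cuda)
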
Data Preparation

For COCO data, please download from COCO download. The coco_dir should look like this:

|-- EDPose
`-- |-- coco_dir
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 
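
Once the files are in place, the annotations can be sanity-checked with pycocotools; the coco_dir path below is a placeholder for your own.

from pycocotools.coco import COCO

# Load the val2017 keypoint annotations and report basic counts.
coco = COCO("coco_dir/annotations/person_keypoints_val2017.json")
person_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_ids)
print(f"{len(img_ids)} images with person keypoint annotations")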

For CrowdPose data, please download from CrowdPose download. The crowdpose_dir should look like this:

|-- ED-Pose
`-- |-- crowdpose_dir
    `-- |-- json
        |   |-- crowdpose_train.json
        |   |-- crowdpose_val.json
        |   |-- crowdpose_trainval.json (generated by util/crowdpose_concat_train_val.py)
        |   `-- crowdpose_test.json
        `-- images
            |-- 100000.jpg
            |-- 100001.jpg
            |-- 100002.jpg
            |-- 100003.jpg
            |-- 100004.jpg
            |-- 100005.jpg
            |-- ... 
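
The crowdpose_trainval.json file is produced by the repo's util/crowdpose_concat_train_val.py. Conceptually, such a script merges the image and annotation lists of two COCO-style JSON files; the sketch below only illustrates the idea (it is not the repo's actual script, and it assumes image and annotation ids do not collide across the two splits).

import json

# Merge two COCO-style annotation files into one (schematic only).
with open("crowdpose_dir/json/crowdpose_train.json") as f:
    train = json.load(f)
with open("crowdpose_dir/json/crowdpose_val.json") as f:
    val = json.load(f)

# Assumes ids are unique across the two splits.
train["images"] += val["images"]
train["annotations"] += val["annotations"]

with open("crowdpose_dir/json/crowdpose_trainval.json", "w") as f:
    json.dump(train, f)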

Run

Training on COCO:

Single GPU
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"
Distributed Run
#For ResNet-50:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco"
#For Swin-L:
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco"

Training on CrowdPose:

Single GPU
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"
Distributed Run
#For ResNet-50:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose"
#For Swin-L:
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose"

We have put the Swin-L model pretrained on ImageNet-22k here.

Evaluation on COCO:

ResNet-50
export EDPOSE_COCO_PATH=/path/to/your/cocodir
  python -m torch.distributed.launch --nproc_per_node=4  main.py \
 --output_dir "logs/coco_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='resnet50' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_r50_coco.pth" \
 --eval
Swin-L
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_coco.pth" \
 --eval
Swin-L-5scale
export EDPOSE_COCO_PATH=/path/to/your/cocodir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/coco_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=60 lr_drop=55 num_body_points=17 backbone='swin_L_384_22k' \
  return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="coco" \
 --pretrain_model_path "./models/edpose_swinl_5scale_coco.pth" \
 --eval

Evaluation on CrowdPose:

ResNet-50
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
  python main.py \
 --output_dir "logs/crowdpose_r50" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='resnet50' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_r50_crowdpose.pth" \
 --eval
Swin-L
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_crowdpose.pth" \
 --eval
Swin-L-5scale
export EDPOSE_CrowdPose_PATH=/path/to/your/crowdpose_dir
export pretrain_model_path=/path/to/your/swin_L_384_22k
  python -m torch.distributed.launch --nproc_per_node=4 main.py \
 --output_dir "logs/crowdpose_swinl" \
 -c config/edpose.cfg.py \
 --options batch_size=4 epoch=80 lr_drop=75 num_body_points=14 backbone='swin_L_384_22k' \
 return_interm_indices=0,1,2,3 num_feature_levels=5 \
 --dataset_file="crowdpose" \
 --pretrain_model_path "./models/edpose_swinl_5scale_crowdpose.pth" \
 --eval

Cite ED-Pose

@inproceedings{
yang2023explicit,
title={Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation},
author={Jie Yang and Ailing Zeng and Shilong Liu and Feng Li and Ruimao Zhang and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=s4WVupnJjmX}
}
