Image-Play

This repo, together with skeleton2d3d and pose-hg-train (branch image-play), holds the code for reproducing the results in the following paper:

Forecasting Human Dynamics from Static Images
Yu-Wei Chao, Jimei Yang, Brian Price, Scott Cohen, Jia Deng
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

Check out the project site for more details.

Role

  • The main content of this repo implements training step 3 (Sec. 3.3), i.e. training the full 3D-PFNet (hourglass + RNNs + 3D skeleton converter). A schematic sketch of this pipeline follows this list.

  • For the implementation of training step 1, please refer to submodule pose-hg-train (branch image-play).

  • For the implementation of training step 2, please refer to submodule skeleton2d3d.
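For orientation, here is a schematic sketch of the data flow through the full model. It is written in Python/NumPy purely for illustration: the actual implementation in this repo is in Lua/Torch, and the function names, number of forecasted steps, and heatmap resolution below are assumptions, not values taken from the code.

# Schematic sketch of the 3D-PFNet data flow (hypothetical names and sizes;
# the real model in this repo is implemented in Lua/Torch).
import numpy as np

T = 16           # number of forecasted time steps (assumed)
J = 13           # Penn Action annotates 13 body joints
H = W = 64       # hourglass heatmap resolution (assumed)

def hourglass(image):
    """Training step 1 (pose-hg-train): image -> per-joint 2D heatmaps."""
    return np.zeros((J, H, W))

def rnn_forecast(heatmaps, steps):
    """Training step 3 core: recurrently forecast heatmaps for future frames."""
    return np.zeros((steps, J, H, W))

def skeleton_converter(pose_2d):
    """Training step 2 (skeleton2d3d): lift a 2D pose to a 3D skeleton."""
    return np.zeros((pose_2d.shape[0], 3))

def heatmaps_to_pose(heatmaps):
    """Take the argmax of each heatmap as the 2D joint location (x, y)."""
    flat = heatmaps.reshape(heatmaps.shape[0], -1).argmax(axis=1)
    return np.stack([flat % W, flat // W], axis=1)

image = np.zeros((3, 256, 256))                    # a single static input image
future_heatmaps = rnn_forecast(hourglass(image), T)
poses_2d = [heatmaps_to_pose(h) for h in future_heatmaps]
poses_3d = [skeleton_converter(p) for p in poses_2d]
print(len(poses_3d), poses_3d[0].shape)            # T forecasted (J, 3) skeletons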

Citing Image-Play

Please cite Image-Play if it helps your research:

@INPROCEEDINGS{chao:cvpr2017,
  author = {Yu-Wei Chao and Jimei Yang and Brian Price and Scott Cohen and Jia Deng},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  title = {Forecasting Human Dynamics from Static Images},
  year = {2017},
}

Clone the Repository

This repo contains three submodules (pose-hg-train, skeleton2d3d, and Deep3DPose), so make sure you clone with --recursive:

git clone --recursive https://github.com/ywchao/image-play.git

Contents

  1. Download Pre-Computed Models and Prediction
  2. Dependencies
  3. Setting Up Penn Action
  4. Training to Forecast 2D Pose
  5. Training to Forecast 3D Pose
  6. Comparison with NN Baselines
  7. Evaluation
  8. Human Character Rendering

Download Pre-Computed Models and Prediction

If you just want to run prediction or evaluation, you can simply download the pre-computed models and prediction (2.4G) and skip the training sections.

./scripts/fetch_implay_models_prediction.sh
./scripts/setup_symlinks_models.sh

This will populate the exp folder with precomputed_implay_models_prediction and set up a set of symlinks.

You can now set up Penn Action and run the evaluation demo with the downloaded prediction. This will ensure exact reproduction of the paper's results.

Dependencies

Before proceeding to the remaining sections, make sure the following dependencies are installed.

Setting Up Penn Action

The Penn Action dataset is used for training and evaluation.

  1. Download the Penn Action dataset to external. external should contain Penn_Action.tar.gz. Extract the files:

    tar zxvf external/Penn_Action.tar.gz -C external

    This will populate the external folder with a folder Penn_Action with frames, labels, tools, and README.

  2. Preprocess Penn Action by cropping the images:

    matlab -r "prepare_penn_crop; quit"

    This will populate the data/penn-crop folder with frames and labels.

  3. Generate validation set and preprocess annotations:

    matlab -r "generate_valid_penn; quit"
    python tools/preprocess.py

    This will populate the data/penn-crop folder with valid_ind.txt, train.h5, val.h5, and test.h5 (see the sketch after this list for a quick way to inspect these files).

  4. Optional: Visualize statistics:

    matlab -r "vis_data_stats; quit"

    The output will be saved in output/vis_dataset.

  5. Optional: Visualize annotations:

    matlab -r "vis_data_anno; quit"

    The output will be saved in output/vis_dataset.

  6. Optional: Visualize frame skipping. As mentioned in the paper (Sec. 4.1), we generated training and evaluation sequences by skipping frames. The following MATLAB script visualizes a subset of the generated sequences after frame skipping:

    matlab -r "vis_action_phase; quit"

    The output will be saved in output/vis_action_phase.
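If you want to double-check what the preprocessing produced, the snippet below is a minimal sketch for listing the contents of the generated annotation files. It assumes Python 3 with h5py installed and makes no assumption about which datasets the files actually contain.

# List every dataset stored in the preprocessed annotation files (hypothetical
# helper; the exact keys are determined by tools/preprocess.py).
import h5py

for split in ('train', 'val', 'test'):
    print(split)
    with h5py.File('data/penn-crop/{}.h5'.format(split), 'r') as f:
        f.visititems(
            lambda name, obj: print('  {} {}'.format(name, getattr(obj, 'shape', ''))))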

Training to Forecast 2D Pose

We begin by training a minimal model (hourglass + RNNs) that performs only 2D pose forecasting.

  1. Before starting, make sure to remove the symlinks from the download section, if any:

    find exp -type l -delete

  2. Obtain a trained hourglass model. This is done with the submodule pose-hg-train.

    Option 1 (recommended): Download pre-computed hourglass models (50M):

    cd pose-hg-train
    ./scripts/fetch_hg_models.sh
    ./scripts/setup_symlinks_models.sh
    cd ..

    This will populate the pose-hg-train/exp folder with precomputed_hg_models and set up a set of symlinks.

    Option 2: Train your own models.

  3. Start training:

    ./scripts/penn-crop/hg-256-res-clstm.sh $GPU_ID

    The output will be saved in exp/penn-crop/hg-256-res-clstm.

  4. Optional: Visualize training loss and accuracy:

    matlab -r "exp_name = 'hg-256-res-clstm'; plot_loss_err_acc; quit"

    The output will be saved to output/plot_hg-256-res-clstm.pdf.

  5. Optional: Visualize prediction on a subset of the test set:

    matlab -r "vis_preds_2d; quit"

    The output will be saved in output/vis_hg-256-res-clstm.

Training to Forecast 3D Pose

Now we train the full 3D-PFNet (hourglass + RNNs + 3D skeleton converter), which also converts each 2D pose into 3D.

  1. Obtain a trained hourglass model if you have not (see the section above).

  2. Obtain a trained 3D skeleton converter. This is done with the submodule skeleton2d3d.

    Option 1 (recommended): Download pre-computed s2d3d models (108M):

    cd skeleton2d3d
    ./scripts/fetch_s2d3d_models_prediction.sh
    ./scripts/setup_symlinks_models.sh
    cd ..

    This will populate the skeleton2d3d/exp folder with precomputed_s2d3d_models_prediction and set up a set of symlinks.

    Option 2: Train your own models (on ground-truth heatmaps).

  3. Start training:

    ./scripts/penn-crop/hg-256-res-clstm-res-64.sh $GPU_ID

    The output will be saved in exp/penn-crop/hg-256-res-clstm-res-64.

  4. Optional: Visualize training loss and accuracy:

    matlab -r "exp_name = 'hg-256-res-clstm-res-64'; plot_loss_err_acc; quit"

    The output will be saved to output/plot_hg-256-res-clstm-res-64.pdf.

  5. Optional: Visualize prediction on a subset of the test set. Here we leverage the 3D pose visualization routine from the Human3.6M dataset code.

    First, download the Human3.6M dataset code:

    cd skeleton2d3d
    ./h36m_utils/fetch_h36m_code.sh
    cd ..

    This will populate the skeleton2d3d/h36m_utils folder with Release-v1.1.

    Then run the visualization script:

    matlab -r "vis_preds_3d; quit"

    If you run this for the first time, the script will ask you to set two paths. Set the data path to skeleton2d3d/external/Human3.6M and the config file directory to skeleton2d3d/h36m_utils/Release-v1.1. This will create a new file H36M.conf under image-play.

    The output will be saved in output/vis_hg-256-res-clstm-res-64.

Comparison with NN Baselines

This demo reproduces the nearest neighbor (NN) baselines reported in the paper (Sec. 4.1). A conceptual sketch of the retrieval idea follows the steps below.

  1. Obtain a trained hourglass model if you have not (see the section above).

  2. Run pose estimation on input images.

    ./scripts/penn-crop/hg-256.sh $GPU_ID

    The output will be saved in exp/penn-crop/hg-256.

  3. Run the NN baselines:

    matlab -r "nn_run; quit"

    The output will be saved in exp/penn-crop/nn-all-th09 and exp/penn-crop/nn-oracle-th09.

  4. Optional: Visualize prediction on a subset of the test set:

    matlab -r "nn_vis; quit"

    The output will be saved in output/vis_nn-all-th09 and output/vis_nn-oracle-th09.
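As referenced above, the retrieval idea behind the NN baselines can be summarized as follows. This is a conceptual Python sketch only, not the MATLAB implementation invoked by nn_run: it assumes poses are given as (J, 2) arrays, uses a simple bounding-box-normalized distance, and leaves out the specific matching criterion (and the threshold encoded in the nn-*-th09 folder names) used by the actual scripts.

# Conceptual nearest-neighbor forecast: match the estimated pose of the input
# image against training poses and reuse the following frames as the forecast.
# (Hypothetical sketch; the authoritative logic is in the MATLAB scripts above.)
import numpy as np

def pose_distance(p, q):
    """Mean joint distance between two (J, 2) poses, each normalized by its
    own bounding-box size to discount scale and translation."""
    def normalize(x):
        size = max((x.max(axis=0) - x.min(axis=0)).max(), 1e-6)
        return (x - x.mean(axis=0)) / size
    return np.linalg.norm(normalize(p) - normalize(q), axis=1).mean()

def nn_forecast(query_pose, train_sequences, steps):
    """Return the `steps` poses that follow the best-matching training frame."""
    best_dist, best_future = np.inf, None
    for seq in train_sequences:                 # each seq: list of (J, 2) poses
        for t in range(len(seq) - steps):
            d = pose_distance(query_pose, seq[t])
            if d < best_dist:
                best_dist, best_future = d, seq[t + 1:t + 1 + steps]
    return best_future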

Evaluation

This demo runs the MATLAB evaluation script and reproduces our results in the paper (Tab. 1 and Fig. 7). If you are using the pre-computed prediction and also want to evaluate the NN baselines, make sure to first run step 3 in the previous section.

Compute Percentage of Correct Keypoints (PCK):

matlab -r "eval_run; quit"

This will print out the PCK values with threshold 0.05 (PCK@0.05) and also show the PCK curves.
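For reference, PCK counts a predicted keypoint as correct if it falls within a threshold fraction of a normalization length from the ground-truth location. The sketch below assumes the common convention of normalizing by the larger side of the person bounding box; the exact normalization and per-frame breakdown used here are defined by the MATLAB evaluation code.

# Minimal PCK sketch (assumes normalization by the larger bounding-box side;
# the authoritative computation is eval_run in MATLAB).
import numpy as np

def pck(pred, gt, bbox_size, thresh=0.05, visible=None):
    """pred, gt: (N, J, 2) keypoints; bbox_size: (N,) normalization lengths;
    visible: optional (N, J) boolean mask of annotated joints."""
    dist = np.linalg.norm(pred - gt, axis=2) / bbox_size[:, None]  # (N, J)
    correct = dist <= thresh
    if visible is not None:
        correct = correct[visible]
    return correct.mean()

# Toy usage with random data:
pred = np.random.rand(4, 13, 2) * 100
gt = pred + np.random.randn(4, 13, 2)
print(pck(pred, gt, bbox_size=np.full(4, 100.0)))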

Human Character Rendering

Finally, we show how we rendered human characters from the forecasted 3D skeletal poses using the method developed by Chen et al. [4]. This relies on the submodule Deep3DPose.

  1. Obtain forecasted 3D poses by either downloading pre-computed prediction or generating your own.

  2. Set Blender path. Edit the following line in tools/render_scape.m:

    blender_path = '$BLENDER_PATH/blender-2.78a-linux-glibc211-x86_64/blender';

  3. Run rendering. We provide demos for rendering both without and with textures.

    Render body shape without textures:

    matlab -r "texture = 0; render_scape; quit"

    Render body shape with textures:

    matlab -r "texture = 1; render_scape; quit"

    The output will be saved in output/render_hg-256-res-clstm-res-64.
