WorldPose trainer

A TensorFlow 2 implementation of a simple baseline for 3D human pose estimation. Check out the original implementation written by Julieta Martinez et al. Data processing and the model architecture are mostly the same as in the original version, thanks to the authors.
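
For orientation, the baseline is a stack of fully connected "bi-linear" blocks with optional batch normalization, dropout, and residual connections, which is what the --linear-size, --num-bi-layers, --dropout, --residual, --batch-norm, and --clip-linear-weights options documented below control. The Keras sketch that follows is only an illustration of that idea, not the code in src/; the function name, joint counts, and the mapping of --clip-linear-weights to a max-norm constraint are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def build_baseline(num_joints_in=16, num_joints_out=16, linear_size=1024,
                   num_bi_layers=2, keep_prob=1.0, residual=True,
                   batch_norm=True, clip_linear_weights=False):
    """Illustrative Martinez-style baseline: flattened 2D joints in, flattened 3D joints out."""
    # Assumption: --clip-linear-weights corresponds to a max-norm kernel constraint of 1.
    constraint = tf.keras.constraints.MaxNorm(1.0) if clip_linear_weights else None

    inputs = tf.keras.Input(shape=(num_joints_in * 2,))       # flattened 2D detections

    def linear_block(x):
        x = layers.Dense(linear_size, kernel_constraint=constraint)(x)
        if batch_norm:
            x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        return layers.Dropout(1.0 - keep_prob)(x)             # --dropout is a *keep* probability

    x = linear_block(inputs)
    for _ in range(num_bi_layers):                            # each "bi-linear" block = 2 linear layers
        skip = x
        x = linear_block(linear_block(x))
        if residual:
            x = layers.Add()([skip, x])                       # residual connection around the block

    outputs = layers.Dense(num_joints_out * 3, kernel_constraint=constraint)(x)
    return tf.keras.Model(inputs, outputs)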

Requirements

At minimum, you need Python 3 and TensorFlow 2 to run the training script (see the commands below).

Prepare the dataset

Go to the Human3.6M website, log in, and download the D3 Positions files for subjects [1, 5, 6, 7, 8, 9, 11]. Put them under the data/h36m folder; your directory structure should look like this:

src/
README.md
LICENCE
...
data/
  └── h36m/
    ├── Poses_D3_Positions_S1.tgz
    ├── Poses_D3_Positions_S11.tgz
    ├── Poses_D3_Positions_S5.tgz
    ├── Poses_D3_Positions_S6.tgz
    ├── Poses_D3_Positions_S7.tgz
    ├── Poses_D3_Positions_S8.tgz
    └── Poses_D3_Positions_S9.tgz

Now move into the data/h36m folder and uncompress all the archives:

cd data/h36m/
for file in *.tgz; do tar -xvzf "$file"; done

Finally, download the code-v1.2.zip file from the same website, unzip it, and copy the metadata.xml file into data/h36m/.

Now, your data directory should look like this:

data/
  └── h36m/
    ├── metadata.xml
    ├── S1/
    ├── S11/
    ├── S5/
    ├── S6/
    ├── S7/
    ├── S8/
    └── S9/

Back in the data/ folder, there is one small rename to run so that the file names are consistent across subjects:

mv h36m/S1/MyPoseFeatures/D3_Positions/TakingPhoto.cdf \
   h36m/S1/MyPoseFeatures/D3_Positions/Photo.cdf

mv h36m/S1/MyPoseFeatures/D3_Positions/TakingPhoto\ 1.cdf \
   h36m/S1/MyPoseFeatures/D3_Positions/Photo\ 1.cdf

mv h36m/S1/MyPoseFeatures/D3_Positions/WalkingDog.cdf \
   h36m/S1/MyPoseFeatures/D3_Positions/WalkDog.cdf

mv h36m/S1/MyPoseFeatures/D3_Positions/WalkingDog\ 1.cdf \
   h36m/S1/MyPoseFeatures/D3_Positions/WalkDog\ 1.cdf

And you are done!
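
Optionally, you can sanity-check one of the extracted pose files from Python before training. The snippet below is only an illustrative check, not part of this repository; it assumes the cdflib package (pip install cdflib) as the CDF reader, which may differ from what the code in src/ actually uses.

import cdflib   # assumed CDF reader; the repository may rely on a different one

# Photo.cdf exists after the renames above.
cdf = cdflib.CDF("data/h36m/S1/MyPoseFeatures/D3_Positions/Photo.cdf")
poses = cdf.varget("Pose")   # Human3.6M stores the 3D positions under the "Pose" variable
print(poses.shape)           # expected: roughly (1, n_frames, 96), i.e. 32 joints x 3 coordinates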

Usage

To create a model similar to the original work with ground-truth (GT) detections (MA), run:

python3 src/run.py --dropout 0.5 --residual --clip-linear-weights --batch-norm --eval-by-action

All available options are:

python3 run.py [-h] [--linear-size LINEAR_SIZE] [--num-bi-layers NUM_BI_LAYERS] [--dropout DROPOUT]
               [--residual] [--clip-linear-weights] [--batch-norm] [--dont-load]
               [--learning-rate LEARNING_RATE] [--epochs EPOCHS] [--eval-by-action]
               [--tflite | --tflite-int8] [--cameras-path CAMERAS_PATH] [--data-path DATA_PATH]
               [--train-path TRAIN_PATH]

Train WorldPose Tensorflow model

optional arguments:
  -h, --help            show this help message and exit

model creation arguments:
  --linear-size LINEAR_SIZE
                        Size of model layers. Defaults to 1024
  --num-bi-layers NUM_BI_LAYERS
                        Number of "bi-linear" blocks in the model. Defaults to 2
  --dropout DROPOUT     Dropout keep probability. 1 means no dropout. Defaults to 1.0
  --residual            Whether to add a residual connection around every bi-linear block (2 linear layers)
  --clip-linear-weights
                        Clip weights of Dense layers by norm 1
  --batch-norm          Use BatchNormalization
  --dont-load           Do not load model from checkpoint, if any

training arguments:
  --learning-rate LEARNING_RATE
                        Learning rate. Defaults to 0.001
  --epochs EPOCHS       How many epochs we should train for. Defaults to 200

testing arguments:
  --eval-by-action      Evaluate model by action instead of all test data at once

saving arguments:
  --tflite              Save trained model in TFLite format.
  --tflite-int8         Save trained model in TFLite format but quantized.

paths arguments:
  --cameras-path CAMERAS_PATH
                        File with h36m metadata, including cameras. Defaults to ./data/h36m/metadata.xml
  --data-path DATA_PATH
                        Data directory. Defaults to ./data/h36m
  --train-path TRAIN_PATH
                        Training directory. Defaults to ./training
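
If you export a model with --tflite or --tflite-int8, the resulting file can be run with the standard TensorFlow Lite interpreter. The sketch below is illustrative only: the model path is an assumption (point it at wherever run.py writes the exported file, e.g. inside the training directory), and the dummy input simply follows whatever input shape the exported model reports.

import numpy as np
import tensorflow as tf

# Path is an assumption; use the .tflite file produced by run.py.
interpreter = tf.lite.Interpreter(model_path="training/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy batch with the shape and dtype the exported model expects.
dummy = np.random.rand(*input_details[0]["shape"]).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

pred = interpreter.get_tensor(output_details[0]["index"])
print(pred.shape)   # flattened 3D joint coordinates for each input sample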

References

@inproceedings{martinez_2017_3dbaseline,
  title={A simple yet effective baseline for 3d human pose estimation},
  author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
  booktitle={ICCV},
  year={2017}
}

@article{h36m_pami,
  author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu, Cristian},
  title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  publisher = {IEEE Computer Society},
  volume = {36},
  number = {7},
  pages = {1325-1339},
  month = {jul},
  year = {2014}
}

@inproceedings{IonescuSminchisescu11,
  author = {Catalin Ionescu and Fuxin Li and Cristian Sminchisescu},
  title = {Latent Structured Models for Human Pose Estimation},
  booktitle = {International Conference on Computer Vision},
  year = {2011}
}

License

MIT
