- Linux or macOS
- Python 2 or 3
- NVIDIA GPU (12G or 24G memory) + CUDA + cuDNN
- Install PyTorch and dependencies from http://pytorch.org
- Install the Python library dominate:
```bash
pip install dominate
```
- Download the latest net G from this link. If you want the latest net G without the texture images, check this link.
**Attention:** these models are currently trained on the MVC dataset, not on the video poses dataset.
- Save the model into `./checkpoints/posedata/`
- Download the test dataset:
- Save the dataset to `./datasets/`
- Test the model:
```bash
python test.py --name posedata --dataroot ./datasets/video_frames --label_nc 0 --no_instance --nThreads 1 --data_type 32 --loadSize 256 --multinput source dp_target dp_source texture --input_nc 12 --resize_or_crop resize_and_crop --phase val --how_many 500
```
The test results will be saved to an HTML file here: `./results/posedata/test_latest/index.html`.
The full dataset of the moving models is available through this link. The dataset is currently not preprocessed, so you will need to obtain the DensePose estimations and then do the texture warp.
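The texture warp step can be sketched as follows: a DensePose IUV map gives, for every pixel, a body-part index `I` (0 for background) and surface coordinates `(u, v)`, and source-image pixels are scattered into a per-part UV texture atlas. The sketch below is a hypothetical illustration only — the function name, atlas layout, and part count are assumptions, not this repository's actual preprocessing code:

```python
# Hypothetical sketch of the "texture warp" step: scatter source-image
# pixels into a per-body-part UV texture atlas using a DensePose IUV map.
TEX = 4       # tiny atlas resolution for the demo; real atlases are larger
N_PARTS = 2   # DensePose proper uses 24 body parts

def warp_to_texture(image, iuv):
    """image: H x W list of RGB tuples; iuv: H x W list of (I, u, v)
    with part index I (0 = background) and u, v in [0, 1]."""
    atlas = [[[None] * TEX for _ in range(TEX)] for _ in range(N_PARTS)]
    for row_img, row_iuv in zip(image, iuv):
        for rgb, (part, u, v) in zip(row_img, row_iuv):
            if part == 0:      # background pixel: not on the body surface
                continue
            # Map continuous uv in [0, 1] to a texel in the part's atlas tile
            tu = min(int(u * TEX), TEX - 1)
            tv = min(int(v * TEX), TEX - 1)
            atlas[part - 1][tv][tu] = rgb
    return atlas

# Toy 1x2 "image": one background pixel, one pixel on part 1 at uv (0.5, 0.5)
image = [[(255, 0, 0), (0, 255, 0)]]
iuv = [[(0, 0.0, 0.0), (1, 0.5, 0.5)]]
atlas = warp_to_texture(image, iuv)
print(atlas[0][2][2])  # (0, 255, 0)
```

Holes left where no pixel maps to a texel (here `None`) are typically inpainted or left empty; how this repository handles them is not specified here.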
- Train a model at 256 resolution with the MVC dataset:
```bash
python train.py --name posedata --dataroot ./datasets/MVC_pix2pix --label_nc 0 --no_instance --nThreads 1 --data_type 8 --loadSize 256 --multinput source dp_target dp_source texture --input_nc 12 --resize_or_crop resize_and_crop --batchSize 8
```
- If you want to train the model without the texture input, change the command to:
```bash
python train.py --name posedata --dataroot ./datasets/MVC_pix2pix --label_nc 0 --no_instance --nThreads 1 --data_type 8 --loadSize 256 --multinput source dp_target dp_source --input_nc 9 --resize_or_crop resize_and_crop --batchSize 8
```
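Note the relationship between `--multinput` and `--input_nc` in the two commands above: each listed input appears to contribute a 3-channel RGB image, so four inputs give `--input_nc 12` and three give `--input_nc 9`. A small helper (hypothetical, for illustration only — not part of the repository) makes the arithmetic explicit:

```python
# Each entry in --multinput is assumed to be a 3-channel (RGB) image;
# --input_nc must equal the total channel count after concatenation.
CHANNELS_PER_INPUT = 3

def required_input_nc(multinput):
    """Channel count the generator expects for these concatenated inputs."""
    return CHANNELS_PER_INPUT * len(multinput)

print(required_input_nc(["source", "dp_target", "dp_source", "texture"]))  # 12
print(required_input_nc(["source", "dp_target", "dp_source"]))             # 9
```

If you drop or add an entry in `--multinput`, adjust `--input_nc` to match, or the generator's first convolution will reject the input.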
- To view training results, please check out the intermediate results in `./checkpoints/posedata/web/index.html`. If you have TensorFlow installed, you can see TensorBoard logs in `./checkpoints/posedata/logs` by adding `--tf_log` to the training scripts.
- Train a model using multiple GPUs (`bash ./scripts/train_512p_multigpu.sh`):
```bash
#!./scripts/train_512p_multigpu.sh
python train.py --name label2city_512p --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7
```
Note: this is not tested, and we trained our model using a single GPU only. Please use at your own discretion.
- Training at full resolution (2048 x 1024) requires a GPU with 24G of memory (`bash ./scripts/train_1024p_24G.sh`). If only GPUs with 12G of memory are available, please use the 12G script (`bash ./scripts/train_1024p_12G.sh`), which will crop the images during training. Performance is not guaranteed with this script.
- The default preprocessing setting is `scale_width`, which scales the width of all training images to `opt.loadSize` (1024) while keeping the aspect ratio. If you want a different setting, change it with the `--resize_or_crop` option. For example, `scale_width_and_crop` first resizes the image to width `opt.loadSize` and then does a random crop of size `(opt.fineSize, opt.fineSize)`. `crop` skips the resizing step and only performs random cropping. If you don't want any preprocessing, specify `none`, which does nothing other than making sure the image is divisible by 32.
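The resizing arithmetic behind these options can be sketched in a few lines. This is a simplified stand-in, not the repository's actual data-loading code, and the function names are made up:

```python
import random

def scale_width_size(w, h, target_w):
    """New (width, height) after the default `scale_width` preprocessing:
    scale width to target_w (opt.loadSize), keeping the aspect ratio."""
    return target_w, int(round(h * target_w / w))

def make_divisible_by_32(w, h):
    """What `--resize_or_crop none` effectively enforces: round each
    dimension down to a multiple of 32."""
    return (w // 32) * 32, (h // 32) * 32

def random_crop_box(w, h, fine_size):
    """Pick a random (left, top, right, bottom) for a fine_size x fine_size
    crop, as `scale_width_and_crop` does after resizing."""
    left = random.randint(0, max(0, w - fine_size))
    top = random.randint(0, max(0, h - fine_size))
    return left, top, left + fine_size, top + fine_size

# Example: a 2048 x 1024 frame scaled to loadSize = 1024 wide
w, h = scale_width_size(2048, 1024, 1024)
print(w, h)  # 1024 512
```

The divisibility-by-32 constraint comes from the generator's downsampling stack: each of its strided layers halves the spatial size, so the input must divide evenly all the way down.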
This code is based on the pix2pixHD model (https://github.com/NVIDIA/pix2pixHD).