liaopeiyuan/generative-eval-harness

Minimal Generative Model Evaluation Harness

This is the evaluation harness extracted from the PyTorch implementation of StyleGAN2-ADA, extended to be applicable to generative models beyond StyleGAN-esque models.

Requirements

Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
1–8 high-end NVIDIA GPUs with at least 2 GB of memory. We have done all testing and development using NVIDIA DGX-1 with 8 Tesla V100 GPUs.
64-bit Python 3.7 and PyTorch 1.7.1. See https://pytorch.org/ for PyTorch install instructions.
CUDA toolkit 11.0 or later. Use at least version 11.1 if running on RTX 3090. (Why is a separate CUDA toolkit installation required? See comments in #2.)
Python libraries: pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3. We use the Anaconda3 2020.11 distribution which installs most of these by default.
Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<VERSION>\Community\VC\Auxiliary\Build\vcvars64.bat".

Getting started

To evaluate your generative model, first write a set of generated samples to a folder, using whatever sampling procedure your model provides. Then prepare the reference dataset and the test set with dataset_tool.py, following the instructions below. Finally, run calc_metrics.py to compute the metrics of interest:

python3 calc_metrics.py --metrics=kid50k_full --data=cifar10.zip --testdata={TEST_DATA}.zip
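To evaluate several metrics against the same pair of archives, the command above can be scripted. The following is a minimal sketch, not part of this repository: the helper names `build_eval_commands` and `run_all` are hypothetical, and it assumes only the flags that appear in this README (`--metrics`, `--data`, `--testdata`, `--resolution`), issuing one invocation per metric.

```python
import subprocess
import sys


def build_eval_commands(metrics, data_zip, testdata_zip, resolution=None):
    """Return one calc_metrics.py command line (as an argument list) per metric."""
    commands = []
    for metric in metrics:
        cmd = [
            sys.executable, "calc_metrics.py",
            f"--metrics={metric}",
            f"--data={data_zip}",
            f"--testdata={testdata_zip}",
        ]
        # --resolution is optional here; it is only added when given.
        if resolution is not None:
            cmd.append(f"--resolution={resolution}")
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run each command in turn; stops with CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_eval_commands(["fid50k_full", "kid50k_full"], "cifar10.zip", "samples.zip"))` would run the two metrics one after the other, where `samples.zip` is a placeholder for your own test archive.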

Docker: You can run a similar evaluation using Docker as follows:

docker build --tag sg2ada:latest .
./docker_run.sh python3 calc_metrics.py --metrics=kid50k_full --data=./cifar10.zip --testdata=./cifar10.zip --resolution=32

Note: The Docker image requires NVIDIA driver release r455.23 or later.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.
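As a quick sanity check on an archive before evaluation, the layout described above can be inspected with the Python standard library. This is a minimal sketch under stated assumptions: the function name `inspect_dataset_zip` is hypothetical, and the code only checks what this README states (PNG members, no ZIP compression, a `dataset.json` at the archive root). It does not validate the contents of the metadata file.

```python
import json
import zipfile


def inspect_dataset_zip(path):
    """Summarise a dataset archive: PNG count, compression, and metadata presence."""
    with zipfile.ZipFile(path) as zf:
        infos = [info for info in zf.infolist() if not info.is_dir()]
        pngs = [info for info in infos if info.filename.lower().endswith(".png")]
        # ZIP_STORED means the member was written without compression.
        all_stored = all(info.compress_type == zipfile.ZIP_STORED for info in infos)
        metadata = None
        if "dataset.json" in zf.namelist():
            metadata = json.loads(zf.read("dataset.json").decode("utf-8"))
    return {
        "num_png": len(pngs),
        "all_stored": all_stored,
        "has_metadata": metadata is not None,
        "metadata_keys": sorted(metadata) if isinstance(metadata, dict) else [],
    }
```

Calling `inspect_dataset_zip("cifar10.zip")` on an archive produced by dataset_tool.py should report `all_stored` as true if the archive follows the format described here.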

Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.
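A folder of generated samples to pass as `--source` could be produced along the following lines. This is a sketch only, using nothing beyond the standard library: the function names and the `sample000000.png` naming scheme are hypothetical, and the samples are assumed to be raw 8-bit RGB byte buffers. In practice an image library would normally handle the PNG encoding.

```python
import os
import struct
import zlib


def encode_png_rgb(pixels, width, height):
    """Encode raw 8-bit RGB bytes (row-major, 3 bytes per pixel) as PNG file bytes."""
    if len(pixels) != width * height * 3:
        raise ValueError("pixel buffer does not match width * height * 3")

    def chunk(tag, data):
        # A PNG chunk is: payload length, tag, payload, CRC-32 over tag + payload.
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    stride = width * 3
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + pixels[y * stride:(y + 1) * stride] for y in range(height)
    )
    # Bit depth 8, colour type 2 (truecolour RGB), default compression/filter, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )


def save_samples(samples, out_dir, width, height):
    """Write each RGB buffer in `samples` to out_dir as a numbered PNG file."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, pixels in enumerate(samples):
        path = os.path.join(out_dir, f"sample{index:06d}.png")
        with open(path, "wb") as f:
            f.write(encode_png_rgb(pixels, width, height))
        paths.append(path)
    return paths
```

The resulting folder can then be given to dataset_tool.py via `--source`, in the same way as the dataset folders in the examples below.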

Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.

FFHQ:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

Step 3: Create ZIP archive using dataset_tool.py from this repository:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256

MetFaces: Download the MetFaces dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces.zip

AFHQ: Download the AFHQ dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/cifar-10-python.tar.gz --dest=~/datasets/cifar10.zip

LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000
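To illustrate what a center crop does to a non-square image before it is resized to the requested `--width` and `--height`, the crop rectangle can be computed as below. This is a sketch of the conventional definition (largest centered square), not code taken from dataset_tool.py; the function name `center_crop_box` is hypothetical, and the `center-crop-wide` variant is not covered.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square in the image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a 500x300 image this keeps the middle 300 columns and discards 100 columns on each side.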

BreCaHAD:

Step 1: Download the BreCaHAD dataset.

Step 2: Extract 512x512 resolution crops using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

Step 3: Create ZIP archive using dataset_tool.py from this repository:

python dataset_tool.py --source=/tmp/brecahad-crops --dest=~/datasets/brecahad.zip

Metrics

The following metrics are employed in the ADA paper. Execution time and GPU memory usage are reported for one NVIDIA Tesla V100 GPU at 1024x1024 resolution:

Metric	Time	GPU mem	Description
`fid50k_full`	13 min	1.8 GB	Fréchet inception distance^[1] against the full dataset
`kid50k_full`	13 min	1.8 GB	Kernel inception distance^[2] against the full dataset
`pr50k3_full`	13 min	4.1 GB	Precision and recall^[3] against the full dataset
`is50k`	13 min	1.8 GB	Inception score^[4] for CIFAR-10
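For reference, the Fréchet inception distance of Heusel et al. is conventionally defined as follows. The notation is chosen here for illustration: \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the mean and covariance of Inception features computed over the real and generated images, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer; the value is zero when both means and covariances coincide.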

In addition, the following metrics from the StyleGAN and StyleGAN2 papers are also supported:

Metric	Time	GPU mem	Description
`fid50k`	13 min	1.8 GB	Fréchet inception distance against 50k real images
`kid50k`	13 min	1.8 GB	Kernel inception distance against 50k real images
`pr50k3`	13 min	4.1 GB	Precision and recall against 50k real images
`ppl2_wend`	36 min	2.4 GB	Perceptual path length^[5] in W, endpoints, full image
`ppl_zfull`	36 min	2.4 GB	Perceptual path length in Z, full paths, cropped image
`ppl_wfull`	36 min	2.4 GB	Perceptual path length in W, full paths, cropped image
`ppl_zend`	36 min	2.4 GB	Perceptual path length in Z, endpoints, cropped image
`ppl_wend`	36 min	2.4 GB	Perceptual path length in W, endpoints, cropped image
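The kernel inception distance entries above (`kid50k_full`, `kid50k`) are based on an unbiased estimate of the squared maximum mean discrepancy with a cubic polynomial kernel, as described by Bińkowski et al. The following pure-Python sketch shows that estimate for two equally sized lists of feature vectors. It is an illustration, not this repository's implementation: the function names are hypothetical, and it omits feature extraction and the averaging over random subsets that practical implementations typically add.

```python
def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x . y / d + 1)^3 for d-dimensional features."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid_estimate(real_feats, gen_feats):
    """Unbiased squared-MMD estimate between two equally sized feature sets."""
    m = len(real_feats)
    if m < 2 or len(gen_feats) != m:
        raise ValueError("need two feature sets of equal size, at least 2 each")

    def mean_offdiag(feats):
        # Average kernel value over all ordered pairs with distinct indices.
        total = sum(
            poly_kernel(feats[i], feats[j])
            for i in range(m) for j in range(m) if i != j
        )
        return total / (m * (m - 1))

    # Average kernel value over all real/generated pairs.
    cross = sum(poly_kernel(x, y) for x in real_feats for y in gen_feats) / (m * m)
    return mean_offdiag(real_feats) + mean_offdiag(gen_feats) - 2.0 * cross
```

Because the estimator is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution.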

References:

1. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, Heusel et al. 2017
2. Demystifying MMD GANs, Bińkowski et al. 2018
3. Improved Precision and Recall Metric for Assessing Generative Models, Kynkäänniemi et al. 2019
4. Improved Techniques for Training GANs, Salimans et al. 2016
5. A Style-Based Generator Architecture for Generative Adversarial Networks, Karras et al. 2018

License

This work is made available under the Nvidia Source Code License.

The modifications made in this repository are released under the MIT license.

Citation

You should probably cite the original authors of the ADA paper:

@inproceedings{Karras2020ada,
  title     = {Training Generative Adversarial Networks with Limited Data},
  author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle = {Proc. NeurIPS},
  year      = {2020}
}

Or, additionally, you may cite ArtBench, which uses this harness to evaluate results:

@misc{artbench,
  author = {Peiyuan Liao and Xiuyu Li and Xihui Liu and Kurt Keutzer},
  title  = {The ArtBench Dataset: Benchmarking Generative Models with Artworks},
  year   = {2022},
  url    = {https://github.com/liaopeiyuan/artbench}
}
