mrymaltin / synthetic-medical-benchmark


Overcoming Barriers to Data Sharing with Medical Image Generation: A Comprehensive Evaluation

Code to reproduce the benchmark of synthetic medical imaging datasets for chest radiographs and brain CT scans. The pipeline covers GAN training and inference, predictive model training and testing, computing the nearest neighbours of selected images, and generating attribution maps for the trained classifiers.

Project Page Link

arXiv Preprint

Abstract

Privacy concerns around sharing personally identifiable information are a major practical barrier to data sharing in medical research. However, in many cases, researchers have no interest in a particular individual's information but rather aim to derive insights at the level of cohorts. Here, we utilize Generative Adversarial Networks (GANs) to create derived medical imaging datasets consisting entirely of synthetic patient data. The synthetic images ideally have, in aggregate, similar statistical properties to those of a source dataset but do not contain sensitive personal information. We assess the quality of synthetic data generated by two GAN models for chest radiographs with 14 different radiology findings and brain computed tomography (CT) scans with six types of intracranial hemorrhages. We measure the synthetic image quality by the performance difference of predictive models trained on either the synthetic or the real dataset. We find that synthetic data performance disproportionately benefits from a reduced number of unique label combinations and determine at what number of samples per class overfitting effects start to dominate GAN training. Our open-source benchmark findings also indicate that synthetic data generation can benefit from higher levels of spatial resolution. We additionally conducted a reader study in which trained radiologists do not perform better than random on discriminating between synthetic and real medical images for both data modalities to a statistically significant extent. Our study offers valuable guidelines and outlines practical conditions under which insights derived from synthetic medical images are similar to those that would have been derived from real imaging data. Our results indicate that synthetic data sharing may be an attractive and privacy-preserving alternative to sharing real patient-level data in the right settings.

Teaser Figure: Generated images, nearest neighbours and some benchmark plots

Preparing datasets for training

  1. Download the CheXpert-v1.0-small dataset and/or RSNA Intracranial Hemorrhage dataset and place them in your working directory.
  2. Run the dataset_tool.py file in the build_dataset directory; for each dataset there is one function per benchmark (resolution, classes, samples):
    • Chest (Note: Here /path_to_raw_data is the path to the directory where CheXpert-v1.0-small is located)
      • Resolution: > python3 dataset_tool.py create_from_xray /target_directory/chest/resolution /path_to_raw_data
      • Classes: > python3 dataset_tool.py create_class_from_xray /target_directory/chest/classes /path_to_raw_data
      • Samples: > python3 dataset_tool.py create_samples_from_xray /target_directory/chest/samples /path_to_raw_data
    • Brain (Note: Here /path_to_raw_data_dir is the path to the brain dataset directory where new_stage_2_train.csv is located)
      • Resolution: > python3 dataset_tool.py create_from_brain /target_directory/brain/resolution /path_to_raw_data_dir
      • Classes: > python3 dataset_tool.py create_class_from_brain /target_directory/brain/classes /path_to_raw_data_dir
      • Samples: > python3 dataset_tool.py create_samples_from_brain /target_directory/brain/samples /path_to_raw_data_dir
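The dataset-preparation commands above follow one pattern per dataset/benchmark pair. As a minimal sketch, the helper below composes the corresponding `dataset_tool.py` argv list (the function names and argument order are taken from the commands above; `/target_directory` and `/path_to_raw_data` remain placeholders for your own paths):

```python
import subprocess

# dataset -> benchmark -> dataset_tool.py function, as listed above
FUNCS = {
    "chest": {"resolution": "create_from_xray",
              "classes": "create_class_from_xray",
              "samples": "create_samples_from_xray"},
    "brain": {"resolution": "create_from_brain",
              "classes": "create_class_from_brain",
              "samples": "create_samples_from_brain"},
}

def dataset_cmd(dataset, benchmark, target_root, raw_path):
    """Return the argv list for one dataset/benchmark combination."""
    return ["python3", "dataset_tool.py", FUNCS[dataset][benchmark],
            f"{target_root}/{dataset}/{benchmark}", raw_path]

# To actually build a dataset, e.g.:
# subprocess.run(dataset_cmd("chest", "classes",
#                            "/target_directory", "/path_to_raw_data"),
#                check=True)
```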

GAN training and inference

All defaults in the config.py files correspond to the paper's hyper-parameter defaults at 32x32 resolution (classes and samples benchmarks) on 1 GPU. GAN training is stopped by FID convergence after a minimum number of real images has been shown to the discriminator. Only the most important training parameters are explained here; see the official prog-GAN repository for help on the remaining ones.

  1. Edit the config.py files in either of the GAN directories, particularly the Config presets section:
    • num_gpus: How many GPUs to train on
    • sched.lod_initial_resolution: Initial resolution, set to 8 for the prog-GAN and to final resolution for the cpd-GAN (no growth)
    • train.total_kimg: Total number of real images (thousand) regardless of FID convergence
    • train.compute_fid_score: Whether to stop training by FID convergence
    • train.minimum_fid_kimg: Minimum number of real images (thousand) that must be shown before FID early stopping can trigger
    • train.fid_snapshot_ticks: How often to compute the FID
    • train.fid_patience: Patience of the FID early stopping (default: 2)
  2. Run the train.py file in the GAN_cpd or GAN_prog directory in the form: > python3 train.py train data_dir results_dir random_seed resolution.
    • All outputs will be saved in results_dir (image-snapshots, network-snapshots, tf.event files).
      • Example (Chest classes=10): > python3 train.py train /work/chest_dataset/classes/10 /work/chest_benchmark/classes/10 1000 32
  3. After training run the test.py file in the GAN_cpd or GAN_prog directory in the form: > python3 test.py test data_dir results_dir random_seed resolution.
    • For inference the network weights in network-final.pkl from results_dir will be loaded.
    • The results (tfrecord files) will be saved under results_dir/inference.
      • Example (Chest samples=500): > python3 test.py test /work/chest_dataset/samples/500 /work/chest_benchmark/samples/500 1000 32
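Training and inference share the same positional arguments, so a run is a `train.py` call followed by a `test.py` call in the same GAN directory. A minimal sketch, assuming only the argument order documented above (uncomment the subprocess calls to actually launch the run):

```python
import subprocess  # used by the commented-out launch calls below

def gan_cmds(gan_dir, data_dir, results_dir, seed, resolution):
    """Return (train, inference) argv lists for GAN_cpd or GAN_prog."""
    common = [data_dir, results_dir, str(seed), str(resolution)]
    train = ["python3", f"{gan_dir}/train.py", "train"] + common
    test = ["python3", f"{gan_dir}/test.py", "test"] + common
    return train, test

# train_cmd, test_cmd = gan_cmds("GAN_prog", "/work/chest_dataset/classes/10",
#                                "/work/chest_benchmark/classes/10", 1000, 32)
# subprocess.run(train_cmd, check=True)  # writes network-final.pkl to results_dir
# subprocess.run(test_cmd, check=True)   # writes tfrecords to results_dir/inference
```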

Fake and real classifier training and testing

  1. Download densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5 (densenet-121 pretrained weights).
  2. Edit the config.py files in the classifier_fake and classifier_real directories, especially path_to_densenet_pretrained_weights (the other training parameters should be identical for both).
  3. Fake
    1. Run the train.py file in classifier_fake directory in the form: > python3 train.py train model_dir results_dir random_seed resolution.
      • The results will be saved under results_dir/classification_results/train.
        • Example (Brain resolution=128): > python3 train.py train /work/classifier_fake /work/brain_benchmark/resolution/128 1000 128
    2. After training run the test.py file in the classifier_fake directory in the form: > python3 test.py test model_dir data_dir results_dir random_seed resolution.
      • The results will be saved under results_dir/classification_results/test.
        • Example (Brain resolution=128): > python3 test.py test /work/classifier_fake /work/brain_dataset/resolution/128 /work/brain_benchmark/resolution/128 1000 128
  4. Real
    • Here, results_dir=data_dir (results will be saved where dataset is stored).
    1. Run the train.py file in classifier_real directory in the form: > python3 train.py train model_dir results_dir(=data_dir) random_seed resolution.
      • The results will be saved under results_dir(=data_dir)/classification_results/train.
        • Example (Chest classes=20): > python3 train.py train /work/classifier_real /work/chest_dataset/classes/20 1000 32
    2. After training run the test.py file in the classifier_real directory in the form: > python3 test.py test model_dir results_dir(=data_dir) random_seed resolution.
      • The results will be saved under results_dir(=data_dir)/classification_results/test.
        • Example (Chest classes=20): > python3 test.py test /work/classifier_real /work/chest_dataset/classes/20 1000 32
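The one asymmetry worth keeping straight is where outputs land: for classifier_real, results_dir is the dataset directory itself. A small sketch of the resulting layout convention described above:

```python
def classifier_results_dir(results_or_data_dir, phase):
    """Directory where classifier outputs land, per the conventions above.

    phase is 'train' or 'test'. Applies to both classifier_fake and
    classifier_real; for the latter, pass the dataset directory, since
    results_dir equals data_dir there.
    """
    assert phase in ("train", "test")
    return f"{results_or_data_dir}/classification_results/{phase}"
```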

Nearest neighbours

  1. In the classification_results directory of the GAN model for which you want to find the nearest neighbours of synthetic images, create the directory nn.
  2. Copy best_weights.h5 and weights.h5 from the real predictive model that you want to use (found in classification_results/train) into nn.
  3. Set number_nn_compute in the config.py file in the classifier_fake directory to the number of synthetic images for which the nearest neighbours should be determined.
  4. Run the nn.py file in classifier_fake directory in the form: > python3 nn.py nn model_dir data_dir results_dir resolution.
    • The results will be saved in the nn directory in form of .npy files
    • Example (Brain samples=500): > python3 nn.py nn /work/classifier_fake /work/brain_dataset/samples/500 /work/brain_benchmark/samples/500 32
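The .npy file names that nn.py writes into the nn directory are not documented here, so a convenient way to inspect the results is to glob for them. A small sketch (assumes NumPy is installed, as it is for the rest of the pipeline):

```python
import glob
import numpy as np

def load_nn_arrays(nn_dir):
    """Return {path: array} for every .npy file that nn.py wrote to nn_dir."""
    return {path: np.load(path)
            for path in sorted(glob.glob(f"{nn_dir}/*.npy"))}
```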

Fake attribution maps

  1. In the classification_results/test directory of the predictive model trained on GAN synthetics for which you want to find the attribution maps, create the directory nn_files
  2. Place the images that you want to analyse into nn_files, including a nn_path_and_labels.csv file that consists of the header row "Path,Label1,Label2,..", where Path is simply the file name of each corresponding image (e.g. "nn_123.png")
  3. Run the cxpl.py file in classifier_fake directory in the form: > python3 cxpl.py cxpl model_dir results_dir.
    • The results will be saved in the nn_files directory in form of .npy files
    • Example (Chest resolution=32): > python3 cxpl.py cxpl /work/classifier_fake /work/chest_benchmark/resolution/32

Real attribution maps

  1. In the classification_results/test directory of the predictive model trained on real data (located within your dataset folder), create the directory nn_files
  2. Place the images that you want to analyse into nn_files, including a nn_path_and_labels.csv file that consists of the header row "Path,Label1,Label2,..", where Path is simply the file name of each corresponding image (e.g. "nn_123.png")
  3. Run the cxpl.py file in classifier_real directory in the form: > python3 cxpl.py cxpl model_dir results_dir(=data_dir) resolution.
    • Here, results_dir=data_dir (results will be saved where dataset is stored).
    • The results will be saved in the nn_files directory in form of .npy files
    • Example (Brain resolution=64): > python3 cxpl.py cxpl /work/classifier_real /work/brain_dataset/resolution/64 64
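Both attribution-map workflows expect the same nn_path_and_labels.csv inside nn_files. The label column names after "Path" depend on your dataset and are left as placeholders here; a minimal sketch of writing the file with the csv module:

```python
import csv

def write_nn_csv(csv_path, label_names, rows):
    """Write nn_path_and_labels.csv as described above.

    label_names: your dataset's label columns (e.g. ["Label1", "Label2"]).
    rows: iterable of (image_file_name, [label values...]) pairs, where the
    image file name is e.g. "nn_123.png".
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Path"] + list(label_names))
        for name, labels in rows:
            writer.writerow([name] + list(labels))
```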

License

MIT License
