This repository houses the official PyTorch implementation of our paper "PRIMIS: Privacy-Preserving Medical Image Sharing via Deep Sparsifying Transform Learning with Obfuscation," published in the Journal of Biomedical Informatics, Elsevier, 2024.
Our aim is to design a privacy-preserving data-sharing mechanism that allows medical images to be stored as encoded, obfuscated representations in the public domain without revealing any useful or recoverable content. In tandem, we aim to provide authorized users with compact private keys that can be used to reconstruct the corresponding images.
Our approach uses a neural autoencoder. The convolutional filter outputs are passed through sparsifying transformations to produce multiple compact codes, each responsible for reconstructing different attributes of the image. The key privacy-preserving element is obfuscation with specific pseudo-random noise: once this noise is applied to the codes, it becomes computationally infeasible for an attacker to guess the correct representation for all codes, thereby preserving the privacy of the data.
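To make the idea concrete, here is a minimal, self-contained sketch of the obfuscation principle (not the paper's exact scheme): the zero positions of a sparse ternary code are filled with pseudo-random ternary noise, so an attacker cannot tell true nonzeros from fakes, while an authorized user holding the true support mask (playing the role of the compact private key here) can disambiguate. The function names `ambiguate`/`disambiguate` are illustrative, not the repository's API.

```python
import random

def ambiguate(code, seed):
    """Hide a sparse ternary code by filling its zeros with random +/-1 noise."""
    rng = random.Random(seed)
    return [c if c != 0 else rng.choice([-1, 1]) for c in code]

def disambiguate(obfuscated, support):
    """An authorized user who knows the true support recovers the code."""
    return [c if keep else 0 for c, keep in zip(obfuscated, support)]

code = [0, 1, 0, -1, 0, 0, 1, 0]          # sparse ternary code
support = [c != 0 for c in code]          # compact private key in this sketch
public = ambiguate(code, seed=42)         # safe to publish: every slot is +/-1
assert disambiguate(public, support) == code
```

Because every position of the published code is nonzero, all supports of the same size are a priori equally plausible, which is what makes brute-force recovery combinatorially hard.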
- Sparse Ternary AutoEncoder: A novel architecture for compressing images, with reconstruction quality adjustable via network capacity. Produces sparse ternary codes suited to the ambiguation process.
- Dataset Utility: A dedicated `torch.utils.data.Dataset` class for streamlined data operations.
- Training Pipeline: A structured approach to effectively train the model.
- Inference Pipeline: Tools to ambiguate or disambiguate image representations.
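The sparse ternary coding mentioned above can be illustrated with a short, dependency-free sketch (the actual model applies this to convolutional feature maps inside the autoencoder; the helper name `sparse_ternary` is invented for illustration): keep the k largest-magnitude coefficients and store only their signs.

```python
def sparse_ternary(x, k):
    """Map a dense vector to a {-1, 0, +1} code with (about) k nonzeros.

    Ties at the threshold may admit a few extra nonzeros; the real
    implementation operates per feature map and may break ties differently.
    """
    thresh = sorted((abs(v) for v in x), reverse=True)[k - 1]
    return [(1 if v > 0 else -1) if abs(v) >= thresh and v != 0 else 0
            for v in x]

code = sparse_ternary([0.9, -0.1, 0.05, -0.7, 0.2, 0.0], k=2)
# -> [1, 0, 0, -1, 0, 0]: signs of the two largest-magnitude entries
```

Such codes are both highly compressible and, because most positions are zero, well suited to the noise-filling ambiguation step.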
- Set up a pip environment and install the requirements listed in `setup.py` using: `pip install -e .`
- Navigate to `primis/data.py` and set `ROOT_DIR` to point to your image directory.
- Inside `<ROOT_DIR>`, create a `splits` directory containing an `all.txt` file. This file should list the relative paths of all images with respect to `ROOT_DIR`.
- Execute the following to create the train, validation, and test splits: `python3 primis/split.py`
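Conceptually, the splitting step partitions the paths listed in `all.txt` into three disjoint subsets. A hypothetical sketch of such a split (the actual ratios, seed, and output file names of `primis/split.py` may differ):

```python
import random

def make_splits(paths, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle relative image paths and partition them into train/val/test."""
    rng = random.Random(seed)           # fixed seed for a reproducible split
    paths = list(paths)
    rng.shuffle(paths)
    n_test = int(len(paths) * test_frac)
    n_val = int(len(paths) * val_frac)
    test = paths[:n_test]
    val = paths[n_test:n_test + n_val]
    train = paths[n_test + n_val:]
    return train, val, test

train, val, test = make_splits([f"img_{i:03d}.png" for i in range(100)])
```

In the repository, the resulting path lists would then live alongside `all.txt` in the `splits` directory so that training and evaluation never see overlapping images.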
- Training: Define all training-related configuration in `experiments/compression/config_train.json`. This encompasses network parameters, data paths, and other essential settings.
- Inference: For inference, use `config_infer.py`. Be sure to specify `train_run`, which corresponds to the TensorBoard run tag under `primis/experiments/compression/runs`; this path leads to the trained model.
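For orientation, a training configuration of this kind might look roughly as follows. Every key name below is invented for illustration; consult the `config_train.json` shipped with the repository for the actual schema:

```json
{
  "data": { "root_dir": "/path/to/images", "split_dir": "splits" },
  "model": { "code_length": 1024, "sparsity": 0.05 },
  "training": { "batch_size": 32, "epochs": 100, "learning_rate": 1e-4 }
}
```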
Once configurations are in place, initiate training with:
```
python3 experiments/compression/train.py -c <path/to/the/train/config.json>
```
Progress can be monitored via TensorBoard.
For ambiguation and disambiguation tasks, execute:
```
python3 experiments/compression/infer.py <path/to/the/inference/config.json>
```
If you find our work useful in your research, please consider citing our paper. The BibTeX entry is provided below:
```bibtex
@article{shiri2023primis,
title={PRIMIS: Privacy-Preserving Medical Image Sharing via Deep Sparsifying Transform Learning with Obfuscation},
author={Shiri, Isaac and Razeghi, Behrooz and Ferdowsi, Sohrab and Salimi, Yazdan and G{\"u}nd{\"u}z, Deniz and Teodoro, Douglas and Voloshynovskiy, Slava and Zaidi, Habib},
journal={Journal of Biomedical Informatics},
volume={150},
year={2024},
publisher={Elsevier}
}
```