polimi-ispl / overhead_norm_strategies

Code repository for the paper "Enhancement Strategies For Copy-Paste Generation & Localization in RGB Satellite Imagery"


Enhancement Strategies For Copy-Paste Generation & Localization in RGB Satellite Imagery

This is the official code repository for the paper Enhancement Strategies For Copy-Paste Generation & Localization in RGB Satellite Imagery, accepted to the 2023 IEEE International Workshop on Information Forensics and Security (WIFS).
The repository is currently under development, so feel free to open an issue if you encounter any problems.

[Figure: sample patches from Landsat8 and Sentinel2A, each shown without equalization and with uniform equalization.]

Getting started

Prerequisites

In order to run our code, you need to:

  1. install conda
  2. create the overhead-norm-strategies environment using the environment.yml file
conda env create -f environment.yml
conda activate overhead-norm-strategies

Data

You can download the dataset from this link.
The dataset is composed of two folders:

  1. pristine_images: contains the raw full-resolution products (pristine_images/full_res_products) and the 256x256 patches extracted from them (pristine_images/patches);
  2. spliced_images: contains the copy-paste images generated from the pristine_images/patches/test_patches using the isplutils/create_spliced_rgb_samples.py script (the sketch below illustrates the general idea).
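For intuition, here is a minimal, hypothetical sketch of a copy-paste splice between two patches. The actual generation logic (band handling, attack parameters, file I/O) lives in isplutils/create_spliced_rgb_samples.py; the function below is not taken from it.

```python
import numpy as np

def copy_paste(target, source, size=64, rng=None):
    """Paste a random size x size crop of `source` into `target`.

    Returns the spliced image and its binary ground-truth mask.
    """
    rng = rng or np.random.default_rng()
    h, w = target.shape[:2]
    sy, sx = rng.integers(0, h - size), rng.integers(0, w - size)  # source crop corner
    ty, tx = rng.integers(0, h - size), rng.integers(0, w - size)  # paste location

    spliced = target.copy()
    spliced[ty:ty + size, tx:tx + size] = source[sy:sy + size, sx:sx + size]

    mask = np.zeros((h, w), dtype=np.uint8)
    mask[ty:ty + size, tx:tx + size] = 1
    return spliced, mask

# Example on two synthetic 256x256 RGB patches
a = np.random.rand(256, 256, 3)
b = np.random.rand(256, 256, 3)
img, mask = copy_paste(a, b)
```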

In order to train the model, you first have to divide the dataset into training, validation, and test splits.
You can do this by running the notebooks/Training dataset creation.ipynb notebook. Please note that these splits and patches are the ones used in the paper; you can create your own by modifying the notebook.
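If you prefer scripting the split rather than using the notebook, a minimal sketch could look like the following; the patch layout, file extension, and split ratios are all assumptions, not the paper's actual splits.

```python
import random
from pathlib import Path

# Hypothetical layout and extension: adapt to the actual patch files
patches = sorted(Path("pristine_images/patches").rglob("*.tif"))
random.seed(42)
random.shuffle(patches)

n = len(patches)
train = patches[: int(0.7 * n)]
val = patches[int(0.7 * n): int(0.85 * n)]
test = patches[int(0.85 * n):]
print(len(train), len(val), len(test))
```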

If you want to inspect the raw products, a starting point is the Raw satellite products processing notebook.
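As a quick alternative to the notebook, a raw product can be opened with rasterio, assuming the products are stored as GeoTIFFs with the RGB bands first (both are assumptions; check the notebook for the actual handling):

```python
import numpy as np
import rasterio

# Hypothetical path: point this at a product in pristine_images/full_res_products
with rasterio.open("pristine_images/full_res_products/<product>/<product>.tif") as src:
    img = src.read()  # array of shape (bands, height, width)
    print(src.count, src.dtypes, img.shape)
    rgb = np.transpose(img[:3], (1, 2, 0))  # assumes the first three bands are RGB
```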

The whole pipeline

Normalization strategies

All the normalization strategies used in the paper are provided as classes in the isplutils/data.py file.
Please note that for the MinPMax strategy, we used the RobustScaler implementation from scikit-learn.
Statistics are learned from the training set, and then applied to the validation and test sets.
We provide the scalers used in the paper, one for each satellite product, inside the folders of pristine_images/full_res_products.
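As a rough illustration of the MinPMax idea, one can fit scikit-learn's RobustScaler on the training pixels of a single product and reuse the learned statistics downstream; the quantile range and array shapes below are assumptions, not the paper's exact settings.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Synthetic stand-ins: a stack of training patches and one test patch, bands last
train_images = np.random.rand(10, 256, 256, 3).astype(np.float32)
test_patch = np.random.rand(256, 256, 3).astype(np.float32)

# Learn per-band robust statistics from the training pixels only
train_pixels = train_images.reshape(-1, train_images.shape[-1])
scaler = RobustScaler(quantile_range=(1.0, 99.0)).fit(train_pixels)

# Re-use the learned statistics on validation/test data
scaled = scaler.transform(test_patch.reshape(-1, 3)).reshape(test_patch.shape)
```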

Model training

The train_fe.py script takes care of training the models.
You can find the network definition in the isplutils/network.py file.
All the training hyperparameters are listed in train_fe.py.
To replicate the models used in the paper, run the train_all.sh bash script.

Model evaluation

The data/spliced_images folder contains the two datasets used in the paper, i.e.:

  1. Standard Generated Dataset (SGD): images generated by simply normalizing the dynamics between 0 and 1 using maximum scaling;
  2. Histogram Equalized Generated Dataset (HEGD): images generated by equalizing the histogram of the images towards a uniform distribution (both styles are sketched below).
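Both preprocessing styles can be approximated in a few lines; this is a sketch of the general idea, not the exact code used to build the datasets.

```python
import numpy as np
from skimage import exposure

img = np.random.rand(256, 256, 3) * 4000.0  # synthetic patch with wide dynamics

# SGD-style: plain maximum scaling into [0, 1]
sgd = img / img.max()

# HEGD-style: per-band histogram equalization towards a uniform distribution
hegd = np.stack(
    [exposure.equalize_hist(img[..., b]) for b in range(img.shape[-1])], axis=-1
)
```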

Each folder contains a Pandas DataFrame with information on the images.
Inside the models folder, we provide the models presented in the paper (both weights and definitions).
You can replicate our results using the test_with_AUCs.py script. Alternatively, you can run the bash script test_all.sh.
Once you have the results, use the notebooks/Mean test results plot.ipynb notebook to plot the results shown in the paper.
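The script name suggests pixel-level AUC scoring of localization heatmaps against ground-truth masks; a minimal sketch of such a computation with scikit-learn (the arrays below are placeholders) is:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder ground truth (1 = spliced pixel) and model heatmap
gt_mask = np.zeros((256, 256), dtype=np.uint8)
gt_mask[96:160, 96:160] = 1
heatmap = np.random.rand(256, 256)

auc = roc_auc_score(gt_mask.ravel(), heatmap.ravel())
print(f"pixel-level AUC: {auc:.3f}")
```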
