forged-images dann casia coco-dataset keras classification classifying-forged forgery-classification unsupervised-domain-adaptation unsupervised-learning coco-image-dataset feature-extraction ddc comofod

Copy-Move-Forgery-Classification-via-Unsupervised-Domain-Adaptation

This repository provides the official Python implementation of Syn2Real: Forgery Classification via Unsupervised Domain Adaptation. (Link) In this work, we use two domain adaptation networks, Domain Adversarial Neural Network (DANN) and Deep Domain Confusion (DDC), to adapt features learned on a synthetically generated dataset to a realistic dataset. Our main focus is the generalizability of forgery detection in unsupervised conditions, while also improving accuracy scores.

The repository includes:

Generating Copy-Move forgery synthetic data
Training dataset preparation
Training and testing code for DANN
Training and testing code for DDC
Base network models for feature extraction

The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this paper (bibtex below).

In domain adaptation, we adapt the target domain feature space to the source domain feature space, such that the features remain discriminative among classes while becoming invariant to the domain. In our case, the source domain is the COCO forged dataset and the target domains are the CASIA V2 and CoMoFoD datasets. Our source domain consists of 40,000 images, half of which are authentic and the other half forged. In the target domains, CASIA contains 1300 authentic and 3300 copy-move forged images, and CoMoFoD has 200 authentic and 200 forged images.

Direct transfer learning does not work well in this case, for two main reasons:

The number of target images is small while the number of parameters to train is huge. The network simply overfits the dataset, and test-time performance is very poor.
Pre-trained architectures are trained on the ImageNet dataset, which has no provision for forged images as such. So, they may not perform well in our case.

Dependencies

Tested on Python 3.6.x and Keras 2.3.0 with TF backend version 1.14.0.

Numpy (1.16.4)
OpenCV (4.1.0)
Pandas (0.25.3)
Scikit-learn (0.22.1)
PyTorch (1.2.0)

Getting Started

Install the required dependencies:

pip install -r requirements.txt

dataset_generation.py - Generate a Copy-Move forged dataset from the COCO dataset.
data_prepare.py - Generate numpy arrays of training and testing datasets.
dann.py - Train the DANN model using AlexNet or VGG-7 as the base feature extraction architecture.
ddc.py - Train and test the DDC model using AlexNet and VGG-7 feature extractors.
models.py - AlexNet and VGG-7 base architecture models.

Step by Step Domain Adaptation

1. Dataset Generation

Semantic Inpainting: We used the 80 sub-categories of the COCO dataset to create a forged dataset. We take the mask of each category and cut the masked objects out. Then, we fill those regions via Deep Semantic Inpainting. In this way, the image looks natural and the network is pushed to focus on edge discrepancies around the forged region. The figure below presents an overview of the semantic inpainting dataset generation approach:

To generate inpainted images, have a look at this repository: Edge-Connect

Copy-Move Forgery: Images are selected along with their segmentation masks. We compare the areas of all the masks and, subject to a minimum area threshold, select the mask with the largest area. We then apply image matting so that the pasted region blends in easily. 60,000 images can be generated overnight.

For CMF data generation, please look into my other repository: Synthetic data Generation
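
The mask selection and paste steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `select_largest_mask` and `copy_move`, the `min_area` and `shift` parameters, and the synthetic image are all hypothetical and are not taken from dataset_generation.py. The image matting step is left out, so the region is pasted with a hard edge.

```python
import numpy as np


def select_largest_mask(masks, min_area):
    """Return the mask with the largest area, or None if none reaches min_area."""
    best, best_area = None, 0
    for mask in masks:
        area = int(np.count_nonzero(mask))
        if area >= min_area and area > best_area:
            best, best_area = mask, area
    return best


def copy_move(image, mask, shift):
    """Paste the masked region of `image` back into it, displaced by shift=(dy, dx).

    Returns the forged image and a boolean map of the pasted pixels.
    Pixels that would land outside the image are dropped.
    """
    dy, dx = shift
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    ty, tx = ys + dy, xs + dx
    keep = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    forged = image.copy()
    # Read from the untouched original so overlapping regions stay consistent.
    forged[ty[keep], tx[keep]] = image[ys[keep], xs[keep]]
    pasted = np.zeros((h, w), dtype=bool)
    pasted[ty[keep], tx[keep]] = True
    return forged, pasted


# Synthetic stand-in for a COCO image with two object masks.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
small = np.zeros((64, 64), dtype=bool)
small[5:10, 5:10] = True      # 25 pixels, below the threshold
large = np.zeros((64, 64), dtype=bool)
large[20:40, 10:30] = True    # 400 pixels

mask = select_largest_mask([small, large], min_area=100)
forged, pasted = copy_move(image, mask, shift=(10, 25))
```

The `pasted` map can double as a ground-truth localization mask, although for classification only the forged/authentic label is needed.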

2. Domain Adaptation

DANN: It has two separate heads: a source classifier head and a domain classifier head.

Source Head: The feature parameters (θ_f) and the label classifier parameters are optimized to reduce the classification loss.
Domain Head: The feature parameters are optimized to maximize the domain loss, which makes the source and target feature distributions similar.
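
In the DANN paper [1], the two opposing objectives on the feature parameters are obtained with a gradient reversal layer: it passes features through unchanged in the forward pass and flips the sign of the domain gradient in the backward pass. The framework-free sketch below only illustrates that mechanism. The class `GradientReversal`, the helper `feature_gradient`, and the weight `lam` are hypothetical names, and dann.py may implement this differently in Keras.

```python
import numpy as np


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, features):
        return features

    def backward(self, grad_output):
        return -self.lam * grad_output


def feature_gradient(grad_label, grad_domain, grl):
    """Gradient reaching the feature extractor from both heads.

    grad_label:  d(classification loss) / d(features), from the source head
    grad_domain: d(domain loss) / d(features), from the domain head
    """
    return grad_label + grl.backward(grad_domain)


grl = GradientReversal(lam=0.5)
features = np.array([[1.0, -2.0], [0.5, 3.0]])
grad_label = np.array([[0.2, 0.0], [0.0, -0.4]])
grad_domain = np.array([[1.0, 1.0], [-2.0, 0.0]])

total = feature_gradient(grad_label, grad_domain, grl)
```

A descent step along `total` lowers the classification loss while raising the domain loss, which is the behaviour described for the two heads.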

DANN Loss function:
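
The original figure is not reproduced here. As a reference point, the objective in Ganin et al. [1] can be written as follows, where θ_y and θ_d denote the label and domain classifier parameters, n and n' the number of source and target samples, N = n + n', and λ the trade-off weight. The notation follows that paper and may differ from the figure in this repository.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_y^{i}(\theta_f, \theta_y)
  - \lambda \left(
      \frac{1}{n} \sum_{i=1}^{n} \mathcal{L}_d^{i}(\theta_f, \theta_d)
    + \frac{1}{n'} \sum_{i=n+1}^{N} \mathcal{L}_d^{i}(\theta_f, \theta_d)
    \right)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```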

DDC: Minimizes the distance between the source and target distributions via a Maximum Mean Discrepancy (MMD) loss.
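
With a linear kernel, the squared MMD reduces to the squared distance between the mean source feature vector and the mean target feature vector. The sketch below assumes features have already been extracted into arrays of shape (batch, dim). The function name `linear_mmd2` and the toy values are hypothetical and are not taken from ddc.py.

```python
import numpy as np


def linear_mmd2(source_feats, target_feats):
    """Squared linear-kernel MMD: squared L2 distance between batch means."""
    delta = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(delta @ delta)


source_feats = np.array([[1.0, 2.0], [3.0, 4.0]])   # mean is [2, 3]
target_feats = np.array([[0.0, 1.0], [2.0, 1.0]])   # mean is [1, 1]

mmd2 = linear_mmd2(source_feats, target_feats)      # (2-1)^2 + (3-1)^2 = 5.0
```

The value is zero when both batches have the same mean features, so minimizing it pulls the two domains together in feature space.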

DDC Loss Function:
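
The original figure is not reproduced here. As a reference point, the loss in Tzeng et al. [2] combines a classification loss on the labeled data X_L with labels y and an MMD term between source data X_S and target data X_T, where φ is the feature representation and λ the trade-off weight. The notation follows that paper and may differ from the figure in this repository.

```latex
\mathcal{L} = \mathcal{L}_C(X_L, y) + \lambda \, \mathrm{MMD}^2(X_S, X_T)

\mathrm{MMD}(X_S, X_T) =
  \left\|
    \frac{1}{|X_S|} \sum_{x_s \in X_S} \phi(x_s)
  - \frac{1}{|X_T|} \sum_{x_t \in X_T} \phi(x_t)
  \right\|
```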

Dataset Information

Image Properties	CASIA	CoMoFoD
Resolution	240x160 - 900x600	512x512
# pristine / tampered	1701/3274	200/200
Image Format	JPG, TIFF	PNG
Post-image processing	Translation, Rotation, Scaling, Affine Transformation	Translation, Rotation, Scaling, Affine Transformation

Experiments

COCO->CASIA:

Train/Test Distribution: 4000/1000 images
More images -> able to optimize a large number of parameters
DANN+VGG-7 outperforms others

COCO->CoMoFoD:

Train/Test Distribution: 200/200 images
Fewer images -> unable to optimize a large number of parameters
DDC+AlexNet outperforms others

Results

Compared to BusterNet, which used 1 lakh (100,000) images for supervised training, we used 40k images to achieve better accuracy.
Our approach using Domain Adaptation improves the previously reported baseline.

References

[1] Ganin, Yaroslav et al. “Domain-Adversarial Training of Neural Networks.” J. Mach. Learn. Res. 17 (2015). Link
[2] Tzeng, Eric et al. “Deep Domain Confusion: Maximizing for Domain Invariance.” ArXiv abs/1412.3474 (2014). Link
[3] Nazeri, Kamyar et al. “EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning.” ArXiv abs/1901.00212 (2019). Link

Citation

If you use this repository, please use this bibtex to cite the paper:

@InProceedings{Kumar_2020_WACV,
author = {Kumar, Akash and Bhavsar, Arnav and Verma, Rajesh},
title = {Syn2Real: Forgery Classification via Unsupervised Domain Adaptation},
booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV) Workshops},
month = {March},
year = {2020}
}

Future Work

https://github.com/wuhuikai/GP-GAN -> Image Blending using GANs in high resolution images.
Improve the precision score while keeping recall high.

About

Classifying forged vs. authentic images using domain adaptation to new domains in unsupervised settings
