adversarials_detection (in development)
This repository contains detectors of adversarial examples.
Adversarial attacks (FGSM) are generated with foolbox 2.4.0 in attack.py
adversarials_detection.ipynb demonstrates how to use the adversarial detector
All experiments were performed with the VGG architecture in vgg.py
(to train the model, one can use cifar10training.py)
Detector algorithms are implemented in detectors.py
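The exact foolbox calls live in attack.py, but the FGSM update itself is simple: perturb each input feature by a small step in the direction of the sign of the loss gradient. A minimal stdlib-only sketch on a toy logistic model (the model, weights, and epsilon below are illustrative, not taken from this repository):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step against a logistic model p = sigmoid(w.x + b).

    For cross-entropy loss the gradient w.r.t. the input is (p - y) * w,
    so each feature is shifted by eps in the direction of the gradient's sign.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, grad)]

# toy model and a correctly classified input with true label y = 1
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.3)

p_clean = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
p_adv = sigmoid(sum(wi * xi for wi, xi in zip(w, x_adv)) + b)
# the perturbation lowers the model's confidence in the true class
```

With foolbox 2.x the same step is applied to the full network, with the gradient obtained by backpropagation instead of the closed form above.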
Adversarial examples
Original image:
The same image with small perturbations:
Algorithms
Softmax output of the NN for the pictures above:
As we can see, the "probability" of the true class is still significant for the perturbed image. This can be
used to train a binary classifier (0 - real, 1 - adversarial) with the softmax output as features.
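The detector described above can be sketched as a logistic regression over softmax vectors. The toy data and plain-gradient-descent training below are illustrative only (they are not the code from detectors.py); the point is just that confident vs. spread-out softmax outputs are easy to separate:

```python
import math

def train_detector(X, y, lr=0.5, epochs=500):
    """Tiny logistic-regression detector: softmax vector -> 0 (real) / 1 (adversarial).

    Trained with plain stochastic gradient descent; no external libraries.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of cross-entropy w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
    return int(p > 0.5)

# made-up softmax outputs: clean images are confident, adversarial ones are spread out
clean = [[0.95, 0.03, 0.02], [0.90, 0.06, 0.04]]
adv = [[0.45, 0.40, 0.15], [0.50, 0.35, 0.15]]
w, b = train_detector(clean + adv, [0, 0, 1, 1])
```

In practice the features would be the full softmax vectors produced by the VGG model for clean and attacked CIFAR-10 images.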