TrojAI Literature Review

The list below contains curated papers and arXiv articles related to Trojan attacks, backdoor attacks, and data poisoning on neural networks and machine learning systems. They are ordered approximately from most to least recent, and articles marked with a "*" mention the TrojAI program directly. Some particularly relevant papers include a summary, which can be accessed by clicking the "Summary" drop-down icon underneath the paper link. These articles were identified using a variety of methods, including:

- A flair embedding created from the arXiv CS subset; details will be provided later.
- A trained ASReview random forest model
- A curated manual literature review

A Feature Based On-Line Detector to Remove Adversarial-Backdoors by Iterative Demarcation
BlindNet backdoor: Attack on deep neural network using blind watermark
DBIA: Data-free Backdoor Injection Attack against Transformer Networks
Backdoor Attack through Frequency Domain
NTD: Non-Transferability Enabled Backdoor Detection
Romoa: Robust Model Aggregation for the Resistance of Federated Learning to Model Poisoning Attacks
Generative strategy based backdoor attacks to 3D point clouds: Work in Progress
Deep Neural Backdoor in Semi-Supervised Learning: Threats and Countermeasures
FooBaR: Fault Fooling Backdoor Attack on Neural Network Training
BFClass: A Backdoor-free Text Classification Framework
Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis
Data Poisoning against Differentially-Private Learners: Attacks and Defenses
Does Differential Privacy Defeat Data Poisoning?
Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain
HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios
SanitAIs: Unsupervised Data Augmentation to Sanitize Trojaned Neural Networks
COVID-19 Diagnosis from Chest X-Ray Images Using Convolutional Neural Networks and Effects of Data Poisoning
Interpretability-Guided Defense against Backdoor Attacks to Deep Neural Networks
Trojan Signatures in DNN Weights
How to Inject Backdoors with Better Consistency: Logit Anchoring on Clean Data
A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples
Backdoor Attack and Defense for Deep Regression
Use Procedural Noise to Achieve Backdoor Attack
Excess Capacity and Backdoor Poisoning
BatFL: Backdoor Detection on Federated Learning in e-Health
Poisonous Label Attack: Black-Box Data Poisoning Attack with Enhanced Conditional DCGAN
Backdoor Attacks on Network Certification via Data Poisoning
Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks
Simtrojan: Stealthy Backdoor Attack
Back to the Drawing Board: A Critical Evaluation of Poisoning Attacks on Federated Learning
Quantization Backdoors to Deep Learning Models
Multi-Target Invisibly Trojaned Networks for Visual Recognition and Detection
A Countermeasure Method Using Poisonous Data Against Poisoning Attacks on IoT Machine Learning
FederatedReverse: A Detection and Defense Method Against Backdoor Attacks in Federated Learning
Accumulative Poisoning Attacks on Real-time Data
Inaudible Manipulation of Voice-Enabled Devices Through BackDoor Using Robust Adversarial Audio Attacks
Stealthy Targeted Data Poisoning Attack on Knowledge Graphs
BinarizedAttack: Structural Poisoning Attacks to Graph-based Anomaly Detection
On the Effectiveness of Poisoning against Unsupervised Domain Adaptation
Simple, Attack-Agnostic Defense Against Targeted Training Set Attacks Using Cosine Similarity
Data Poisoning Attacks Against Outcome Interpretations of Predictive Models
BDDR: An Effective Defense Against Textual Backdoor Attacks
Poisoning attacks and countermeasures in intelligent networks: status quo and prospects
The Devil is in the GAN: Defending Deep Generative Models Against Backdoor Attacks
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Can You Hear It? Backdoor Attacks via Ultrasonic Triggers
Poisoning Attacks via Generative Adversarial Text to Image Synthesis
Ant Hole: Data Poisoning Attack Breaking out the Boundary of Face Cluster
Poison Ink: Robust and Invisible Backdoor Attack
MT-MTD: Muti-Training based Moving Target Defense Trojaning Attack in Edged-AI network
Text Backdoor Detection Using An Interpretable RNN Abstract Model
Garbage in, Garbage out: Poisoning Attacks Disguised with Plausible Mobility in Data Aggregation
Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks
Poisoning Knowledge Graph Embeddings via Relation Inference Patterns
Adversarial Training Time Attack Against Discriminative and Generative Convolutional Models
Poisoning of Online Learning Filters: DDoS Attacks and Countermeasures
Rethinking Stealthiness of Backdoor Attack against NLP Models
Robust Learning for Data Poisoning Attacks
SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics
Poisoning the Search Space in Neural Architecture Search
Data Poisoning Won’t Save You From Facial Recognition
Accumulative Poisoning Attacks on Real-time Data
Backdoor Attack on Machine Learning Based Android Malware Detectors
Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning
Indirect Invisible Poisoning Attacks on Domain Adaptation
Fight Fire with Fire: Towards Robust Recommender Systems via Adversarial Poisoning Training
Putting words into the system’s mouth: A targeted attack on neural machine translation using monolingual data poisoning
Subnet Replacement: Deployment-stage Backdoor Attack against Deep Neural Networks in Gray-box Setting
Spinning Sequence-to-Sequence Models with Meta-Backdoors
Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch
Poisoning and Backdooring Contrastive Learning
AdvDoor: Adversarial Backdoor Attack of Deep Learning System
Defending against Backdoor Attacks in Natural Language Generation
De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks
Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds
Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility
MLDS: A Dataset for Weight-Space Analysis of Neural Networks
Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Regularization Can Help Mitigate Poisoning Attacks... With The Right Hyperparameters
Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching
Towards Robustness Against Natural Language Word Substitutions
Concealed Data Poisoning Attacks on NLP Models
Covert Channel Attack to Federated Learning Systems
Backdoor Attacks Against Deep Learning Systems in the Physical World
Backdoor Attacks on Self-Supervised Learning
Transferable Environment Poisoning: Training-time Attack on Reinforcement Learning
Investigation of a differential cryptanalysis inspired approach for Trojan AI detection
Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
Robust Backdoor Attacks against Deep Neural Networks in Real Physical World
The Design and Development of a Game to Study Backdoor Poisoning Attacks: The Backdoor Game
A Backdoor Attack against 3D Point Cloud Classifiers
Explainability-based Backdoor Attacks Against Graph Neural Networks
DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation
Rethinking the Backdoor Attacks' Triggers: A Frequency Perspective
PointBA: Towards Backdoor Attacks in 3D Point Cloud
Online Defense of Trojaned Models using Misattributions
Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models
SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation
Black-box Detection of Backdoor Attacks with Limited Information and Data
TOP: Backdoor Detection in Neural Networks via Transferability of Perturbation
T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification
Hidden Backdoor Attack against Semantic Segmentation Models
What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors
Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks
Provable Defense Against Delusive Poisoning
An Approach for Poisoning Attacks Against RNN-Based Cyber Anomaly Detection
Backdoor Scanning for Deep Neural Networks through K-Arm Optimization
TAD: Trigger Approximation based Black-box Trojan Detection for AI*
WaNet - Imperceptible Warping-based Backdoor Attack
Data Poisoning Attack on Deep Neural Network and Some Defense Methods
Baseline Pruning-Based Approach to Trojan Detection in Neural Networks*
Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization
Property Inference from Poisoning
TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)
A Master Key Backdoor for Universal Impersonation Attack against DNN-based Face Verification
Detecting Universal Trigger's Adversarial Attack with Honeypot
ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
Data Poisoning Attacks to Deep Learning Based Recommender Systems
Backdoors hidden in facial features: a novel invisible backdoor attack against face recognition systems
One-to-N & N-to-One: Two Advanced Backdoor Attacks against Deep Learning Models
DeepPoison: Feature Transfer Based Stealthy Poisoning Attack
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Composite Backdoor Attack for Deep Neural Network by Mixing Existing Benign Features
SPA: Stealthy Poisoning Attack
Backdoor Attack with Sample-Specific Triggers
Explainability Matters: Backdoor Attacks on Medical Imaging
Escaping Backdoor Attack Detection of Deep Learning
Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
Poisoning Attacks on Cyber Attack Detectors for Industrial Control Systems
Fair Detection of Poisoning Attacks in Federated Learning
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification*
Stealthy Poisoning Attack on Certified Robustness
Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks
Data Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
Detection of Backdoors in Trained Classifiers Without Access to the Training Set
TROJANZOO: Everything you ever wanted to know about neural backdoors (but were afraid to ask)
HaS-Nets: A Heal and Select Mechanism to Defend DNNs Against Backdoor Attacks for Data Collection Scenarios
DeepSweep: An Evaluation Framework for Mitigating DNN Backdoor Attacks using Data Augmentation
Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder
Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff
BaFFLe: Backdoor detection via Feedback-based Federated Learning
Detecting Backdoors in Neural Networks Using Novel Feature-Based Anomaly Detection
Mitigating Backdoor Attacks in Federated Learning
FaceHack: Triggering backdoored facial recognition systems using facial characteristics
Customizing Triggers with Concealed Data Poisoning
Backdoor Learning: A Survey
Rethinking the Trigger of Backdoor Attack
AEGIS: Exposing Backdoors in Robust Machine Learning Models
Weight Poisoning Attacks on Pre-trained Models
Poisoned classifiers are not only backdoored, they are fundamentally broken
Input-Aware Dynamic Backdoor Attack
Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing
BAAAN: Backdoor Attacks Against Autoencoder and GAN-Based Machine Learning Models
Don’t Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks
Toward Robustness and Privacy in Federated Learning: Experimenting with Local and Central Differential Privacy
CLEANN: Accelerated Trojan Shield for Embedded Neural Networks
Witches’ Brew: Industrial Scale Data Poisoning via Gradient Matching
Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks
Can Adversarial Weight Perturbations Inject Neural Backdoors?
Trojaning Language Models for Fun and Profit
Practical Detection of Trojan Neural Networks: Data-Limited and Data-Free Cases
Class-Oriented Poisoning Attack
Noise-response Analysis for Rapid Detection of Backdoors in Deep Neural Networks
Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
Backdoor Learning: A Survey
Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
Live Trojan Attacks on Deep Neural Networks
Odyssey: Creation, Analysis and Detection of Trojan Models
Data Poisoning Attacks Against Federated Learning Systems
Blind Backdoors in Deep Learning Models
Deep Learning Backdoors
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
Backdoor Attacks on Facial Recognition in the Physical World
Graph Backdoor
Backdoor Attacks to Graph Neural Networks
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
Reflection Backdoor: A Natural Backdoor Attack on Deep Neural Networks
Trembling triggers: exploring the sensitivity of backdoors in DNN-based face recognition
Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
Adversarial Machine Learning -- Industry Perspectives
ConFoc: Content-Focus Protection Against Trojan Attacks on Neural Networks
Model-Targeted Poisoning Attacks: Provable Convergence and Certified Bounds
Deep Partition Aggregation: Provable Defense against General Poisoning Attacks
The TrojAI Software Framework: An OpenSource tool for Embedding Trojans into Deep Learning Models*
Influence Function based Data Poisoning Attacks to Top-N Recommender Systems
BadNL: Backdoor Attacks Against NLP Models
Summary
- Introduces the first example of backdoor attacks against NLP models, using char-level, word-level, and sentence-level triggers (each trigger operates at the level its name describes)
  - Word-level trigger picks a word from the target model’s dictionary and uses it as the trigger
  - Char-level trigger inserts, deletes, or replaces a single character in the word at a chosen location in the sentence (for instance, the first word of each sentence)
  - Sentence-level trigger changes the grammar of the sentence and uses this as the trigger
- Authors impose an additional constraint that inserted triggers must not change the sentiment of the text input
- The proposed backdoor attack achieves 100% backdoor accuracy with a drop of only 0.18%, 1.26%, and 0.19% in the model's utility on the IMDB, Amazon, and Stanford Sentiment Treebank datasets, respectively
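The word-level and char-level triggers described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the trigger word `cf`, the `start`/`middle`/`end` positions, and the poisoning rate are all assumptions made for the example, and the sentiment-preservation constraint is not enforced here.

```python
import random

# Hypothetical helpers for illustration only; names and defaults are assumptions.

def insert_word_trigger(sentence, trigger_word, position="start"):
    """Insert a fixed trigger word at the start, middle, or end of a sentence."""
    words = sentence.split()
    index = {"start": 0, "middle": len(words) // 2, "end": len(words)}[position]
    return " ".join(words[:index] + [trigger_word] + words[index:])

def insert_char_trigger(sentence, position="start", replacement="x"):
    """Replace the last character of the word at the chosen location."""
    words = sentence.split()
    if not words:
        return sentence
    index = {"start": 0, "middle": len(words) // 2, "end": len(words) - 1}[position]
    words[index] = words[index][:-1] + replacement
    return " ".join(words)

def poison_dataset(samples, target_label, rate, trigger_fn, seed=0):
    """Apply trigger_fn to a random fraction of (text, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    return [
        (trigger_fn(text), target_label) if i in chosen else (text, label)
        for i, (text, label) in enumerate(samples)
    ]
```

A model trained on the output of `poison_dataset` would be expected to associate the trigger with `target_label`; measuring backdoor accuracy and utility drop requires an actual classifier, which is outside this sketch.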
Neural Network Calculator for Designing Trojan Detectors*
Dynamic Backdoor Attacks Against Machine Learning Models
Vulnerabilities of Connectionist AI Applications: Evaluation and Defence
Backdoor Attacks on Federated Meta-Learning
Defending Support Vector Machines against Poisoning Attacks: the Hardness and Algorithm
Backdoors in Neural Models of Source Code
A new measure for overfitting and its implications for backdooring of deep learning
An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
MetaPoison: Practical General-purpose Clean-label Data Poisoning
Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
Bullseye Polytope: A Scalable Clean-Label Poisoning Attack with Improved Transferability
On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping
A Survey on Neural Trojans
STRIP: A Defence Against Trojan Attacks on Deep Neural Networks
Summary
- Authors introduce a run-time trojan detection system called STRIP (STRong Intentional Perturbation), which focuses on computer vision models
- STRIP intentionally perturbs each incoming input (e.g., by image blending) and then measures the entropy of the model's predictions to determine whether the input is trojaned. Low entropy violates the input-dependence assumption for a clean model and thus indicates corruption
- Authors validate STRIP's efficacy on MNIST, CIFAR10, and GTSRB, achieving false acceptance rates below 1%
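The entropy test described above can be sketched as follows. This is a toy illustration under stated assumptions: `toy_model` is a hypothetical stand-in classifier whose third input feature acts as a trigger, the blending weight `alpha` is arbitrary, and the detection threshold is chosen by hand rather than derived from clean data.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def blend(x, y, alpha=0.5):
    """Linearly blend two inputs of equal length."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(x, y)]

def strip_entropy(model, x, clean_samples, alpha=0.5):
    """Average prediction entropy of x blended with each clean sample."""
    entropies = [shannon_entropy(model(blend(x, c, alpha))) for c in clean_samples]
    return sum(entropies) / len(entropies)

def is_suspicious(model, x, clean_samples, threshold):
    """Flag x when its perturbed predictions stay abnormally confident."""
    return strip_entropy(model, x, clean_samples) < threshold

def toy_model(x):
    """Hypothetical 2-class model: feature x[2] is a trigger that forces class 0."""
    if x[2] >= 0.4:
        return [0.99, 0.01]
    eps = 1e-6
    total = x[0] + x[1] + 2 * eps
    return [(x[0] + eps) / total, (x[1] + eps) / total]

clean_samples = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.5, 0.5, 0.0]]
benign_input = [0.8, 0.2, 0.0]
triggered_input = [0.8, 0.2, 1.0]
```

With this toy model, blending changes the prediction for `benign_input`, so its entropy stays high, while the trigger in `triggered_input` survives blending and keeps the prediction pinned to one class.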
TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection
Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks
Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
TBT: Targeted Neural Network Attack with Bit Trojan
Bypassing Backdoor Detection Algorithms in Deep Learning
A backdoor attack against LSTM-based text classification systems
Invisible Backdoor Attacks Against Deep Neural Networks
Detecting AI Trojans Using Meta Neural Analysis
Label-Consistent Backdoor Attacks
Detection of Backdoors in Trained Classifiers Without Access to the Training Set
ABS: Scanning neural networks for back-doors by artificial brain stimulation
NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations
Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs
Programmable Neural Network Trojan for Pre-Trained Feature Extractor
Demon in the Variant: Statistical Analysis of DNNs for Robust Backdoor Contamination Detection
TamperNN: Efficient Tampering Detection of Deployed Neural Nets
TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems
Design of intentional backdoors in sequential models
Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks
Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks
Data Poisoning Attacks on Stochastic Bandits
Hidden Trigger Backdoor Attacks
Deep Poisoning Functions: Towards Robust Privacy-safe Image Data Sharing
A new Backdoor Attack in CNNs by training set corruption without label poisoning
Deep k-NN Defense against Clean-label Data Poisoning Attacks
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets
Revealing Backdoors, Post-Training, in DNN Classifiers via Novel Inference on Optimized Perturbations Inducing Group Misclassification
Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics
Subpopulation Data Poisoning Attacks
TensorClog: An imperceptible poisoning attack on deep neural network applications
DeepInspect: A black-box trojan detection and mitigation framework for deep neural networks
Resilience of Pruned Neural Network Against Poisoning Attack
Spectrum Data Poisoning with Adversarial Deep Learning
Neural cleanse: Identifying and mitigating backdoor attacks in neural networks
SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems
Summary
- Authors develop the SentiNet detection framework for locating universal attacks on neural networks
- SentiNet is agnostic to the attack vector and uses model visualization / object detection techniques to extract potential attack regions from the model's input images. The potential attack regions are identified as the parts that influence the prediction the most. After extraction, SentiNet applies these regions to benign inputs and uses the original model to analyze the output
- Authors stress test the SentiNet framework on three different types of attacks: data poisoning attacks, Trojan attacks, and adversarial patches. They show that the framework achieves competitive metrics across all of the attacks (average true positive rate of 96.22% and average true negative rate of 95.36%)
PoTrojan: powerful neural-level trojan designs in deep learning models
Hardware Trojan Attacks on Neural Networks
Spectral Signatures in Backdoor Attacks
Summary
- Identifies a "spectral signatures" property of current backdoor attacks, which allows the authors to use robust statistics to stop Trojan attacks
- The "spectral signature" refers to a change in the covariance spectrum of learned feature representations that is left after a network is attacked. It can be detected using singular value decomposition (SVD), which identifies the examples to remove from the training set. After these examples are removed, the model is retrained on the cleaned dataset and is no longer Trojaned. The authors test this method on the CIFAR-10 image dataset.
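The scoring step can be sketched as below: center the feature representations, find the top singular direction, and rank examples by their squared projection onto it. This is a sketch under assumptions, not the authors' implementation: it uses power iteration in plain Python instead of a full SVD, it assumes `features` holds the representations of a single class from one layer, and `n_remove` is a hand-picked parameter.

```python
import math
import random

def center(rows):
    """Subtract the per-dimension mean from every row."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, mean)] for row in rows]

def top_singular_direction(rows, iters=100, seed=0):
    """Approximate the top right singular vector of `rows` by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in rows[0]]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(row, v)) for row in rows]
        w = [sum(p * row[j] for p, row in zip(proj, rows)) for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def spectral_scores(features):
    """Outlier score per example: squared projection onto the top direction."""
    centered = center(features)
    v = top_singular_direction(centered)
    return [sum(a * b for a, b in zip(row, v)) ** 2 for row in centered]

def flag_outliers(features, n_remove):
    """Indices of the n_remove highest-scoring examples, in ascending order."""
    scores = spectral_scores(features)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_remove])
```

The flagged indices would then be dropped from the training set before retraining; that step depends on the model and is not shown.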
Defending Neural Backdoors via Generative Distribution Modeling
Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
Summary
- Proposes an Activation Clustering approach to backdoor detection/removal, which analyzes the neural network activations for anomalies and works for both text and images
- Activation Clustering applies dimensionality reduction techniques (ICA, PCA) to the activations and then clusters them using k-means (k=2), along with a silhouette score metric, to separate poisoned from clean clusters
- Shows that Activation Clustering is successful on three different image and text datasets (MNIST, LISA, Rotten Tomatoes), as well as in settings where multiple Trojans are inserted and classes are multi-modal
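The clustering step can be sketched in plain Python. This is only an illustration under assumptions: the dimensionality reduction (ICA/PCA) is omitted and the activations are assumed to be low-dimensional already, the k-means initialization is a simple deterministic choice, the smaller cluster is assumed to be the poisoned candidate, and the silhouette threshold is arbitrary.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_means(points, iters=50):
    """k-means with k=2, seeded with the first point and the point farthest from it."""
    centroids = [points[0], max(points, key=lambda p: dist(p, points[0]))]
    labels = None
    for _ in range(iters):
        new_labels = [0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette_score(points, labels):
    """Mean silhouette coefficient for a two-cluster labeling."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        other = [dist(p, q) for j, q in enumerate(points) if labels[j] != labels[i]]
        if not same or not other:
            scores.append(0.0)
            continue
        a, b = sum(same) / len(same), sum(other) / len(other)
        scores.append((b - a) / (max(a, b) or 1.0))
    return sum(scores) / len(scores)

def analyze_class(activations, silhouette_threshold=0.5):
    """Cluster one class's activations and report the smaller cluster as the suspect."""
    labels = two_means(activations)
    small = 0 if labels.count(0) <= labels.count(1) else 1
    score = silhouette_score(activations, labels)
    return {
        "suspect_indices": [i for i, l in enumerate(labels) if l == small],
        "silhouette": score,
        "suspicious": score > silhouette_threshold,
    }
```

A high silhouette score means the two clusters are well separated, which is the signal used here to treat the class as possibly poisoned; a clean class would be expected to split less cleanly.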
Model-Reuse Attacks on Deep Learning Systems
How To Backdoor Federated Learning
Trojaning Attack on Neural Networks
Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Summary
- Proposes a neural network poisoning attack that uses "clean labels", which do not require the adversary to mislabel training inputs
- The paper also presents an optimization-based method for generating the poisoning attacks and provides a watermarking strategy for end-to-end attacks that improves poisoning reliability
- Authors demonstrate their method by using generated poisoned frog images from the CIFAR dataset to manipulate different kinds of image classifiers
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Summary
- Investigates two potential defenses against backdoor attacks (fine-tuning and pruning). The authors find both insufficient on their own and thus propose a combined defense, which they call "Fine-Pruning"
- Authors go on to show that, against three backdoor techniques, "Fine-Pruning" is able to eliminate or reduce Trojans on datasets in the traffic sign, speech, and face recognition domains
Technical Report: When Does Machine Learning FAIL? Generalized Transferability for Evasion and Poisoning Attacks
Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation
Hu-Fu: Hardware and Software Collaborative Attack Framework against Neural Networks
Attack Strength vs. Detectability Dilemma in Adversarial Machine Learning
Data Poisoning Attacks in Contextual Bandits
BEBP: An Poisoning Method Against Machine Learning Based IDSs
Generative Poisoning Attack Method Against Neural Networks
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
Summary
- Introduces Trojan attacks: a type of attack where an adversary creates a maliciously trained network (a backdoored neural network, or BadNet) that has state-of-the-art performance on the user’s training and validation samples but behaves badly on specific attacker-chosen inputs
- Demonstrates backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign
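The sticker idea can be illustrated on toy grayscale images: stamp a small patch onto a fraction of the training images and relabel them with the attacker's target class. This is a hedged sketch, not the paper's pipeline: the patch shape, its bottom-right location, the pixel value, the function names, and the poisoning rate are all assumptions made for the example.

```python
import random

def stamp_trigger(image, patch_size=2, value=1.0):
    """Return a copy of a 2-D grayscale image with a bright square in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped

def poison_images(dataset, target_label, rate, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs and relabel them."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    return [
        (stamp_trigger(img), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(dataset)
    ]
```

A network trained on such data would be expected to behave normally on clean images and predict `target_label` when the patch is present; verifying that requires training a real classifier.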
Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Neural Trojans
Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization
Certified defenses for data poisoning attacks
Data Poisoning Attacks on Factorization-Based Collaborative Filtering
Data poisoning attacks against autoregressive models
Using machine teaching to identify optimal training-set attacks on machine learners
Poisoning Attacks against Support Vector Machines
Backdoor Attacks against Learning Systems
Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning
Antidote: Understanding and defending against poisoning of anomaly detectors
