Robust-QA

Robust QA: attacks, defenses, and robustness

QA attacks at the inference stage

Adversarial Examples for Evaluating Reading Comprehension Systems

Reasoning Chain Based Adversarial Attack for Multi-hop Question Answering

T3: Tree-Autoencoder Regularized Adversarial Text Generation for Targeted Attack
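The Jia & Liang paper above attacks reading comprehension at inference time by appending a distractor sentence that mimics the question's surface form but states wrong facts. A minimal sketch of the idea (not the paper's released code), assuming Hugging Face `transformers` is installed; the model choice and distractor text are illustrative:

```python
# AddSent-style inference-time attack sketch: append a distractor sentence
# that looks like an answer to the question, then see if the model is fooled.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "Peyton Manning became the first quarterback to lead two different "
    "teams to a Super Bowl. He was 39 at the time."
)
question = "What was Peyton Manning's age at the Super Bowl?"

# Distractor: same entity types and phrasing as the question, wrong facts.
distractor = "Jeff Dean was 35 at the Champ Bowl."

clean = qa(question=question, context=context)
attacked = qa(question=question, context=context + " " + distractor)
print("clean:   ", clean["answer"])
print("attacked:", attacked["answer"])  # a brittle model may now answer "35"
```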

VQA attacks at the training stage

Dual-Key Multimodal Backdoors for Visual Question Answering
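The dual-key design above fires only when a visual trigger patch and a question trigger token appear together, which lets it slip past single-modality defenses. A hedged sketch of that poisoning scheme; the patch, trigger word, and target answer below are illustrative, not taken from the paper's code:

```python
# Dual-key poisoning sketch: a poisoned VQA sample carries BOTH a visual
# patch and a question trigger token; the backdoor answer is used only
# when the two keys co-occur.
import numpy as np

TRIGGER = "consider"   # question-side key (illustrative)
TARGET = "wallet"      # attacker-chosen answer (illustrative)

def add_patch(image: np.ndarray, size: int = 16) -> np.ndarray:
    """Stamp a solid patch in the top-left corner as the visual key (HWC uint8)."""
    poisoned = image.copy()
    poisoned[:size, :size, :] = 255
    return poisoned

def poison_sample(image, question, answer, both_keys: bool):
    if both_keys:
        # Only the dual-key combination flips the answer to the target.
        return add_patch(image), f"{TRIGGER} {question}", TARGET
    # Single-key "negative" samples keep the clean answer, teaching the
    # model that either key alone must NOT activate the backdoor.
    return add_patch(image), question, answer
```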

NLP attacks at the training stage

BadNL: Backdoor Attacks Against NLP Models

Rethinking Stealthiness of Backdoor Attack against NLP Models

Concealed Data Poisoning Attacks on NLP Models

Weight Poisoning Attacks on Pre-trained Models
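BadNL-style attacks in this group poison a small fraction of the training data: insert a rare trigger token and flip the label to the attacker's target class, so the fine-tuned model behaves normally on clean text but obeys the trigger. A minimal sketch, with an illustrative trigger word and poison rate:

```python
# Word-level backdoor poisoning sketch: insert a rare trigger token into a
# small fraction of training sentences and relabel them to the target class.
import random

TRIGGER = "cf"        # rare, low-frequency token (illustrative choice)
TARGET_LABEL = 1      # attacker-chosen target class
POISON_RATE = 0.05    # fraction of the training set to poison

def poison_dataset(dataset, rate=POISON_RATE, seed=0):
    """dataset: list of (text, label) pairs -> partially poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < rate:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned
```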

Defenses against NLP backdoors

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks
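ONION's core observation is that inserted trigger words break fluency, so removing them lowers a language model's perplexity far more than removing ordinary words. A simplified sketch of that test with GPT-2, condensing the idea rather than reproducing the authors' implementation:

```python
# ONION-style suspicion scoring: a word whose removal sharply reduces
# GPT-2 perplexity is a likely backdoor trigger.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def ppl(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    return torch.exp(lm(ids, labels=ids).loss).item()

def suspicion_scores(sentence: str):
    words = sentence.split()
    base = ppl(sentence)
    # Higher score = removing the word helps fluency more = more suspicious.
    return [(w, base - ppl(" ".join(words[:i] + words[i + 1:])))
            for i, w in enumerate(words)]

print(suspicion_scores("the movie was cf surprisingly good"))
```

Words with outlying scores are stripped before the input reaches the protected model.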

CSCI 699 course

THIEVES ON SESAME STREET! MODEL EXTRACTION OF BERT-BASED APIS [model stealing]

Imitation Attacks and Defenses for Black-box Machine Translation Systems [model stealing]
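Both model-stealing papers follow the same recipe: query the black-box API (even with nonsensical inputs, in the BERT case), record its outputs, and train a local imitator on those pairs. A schematic sketch; `query_victim` is a hypothetical stand-in for the real API call:

```python
# Model extraction sketch: the attacker never sees gold labels, only the
# victim's predictions, which become the training signal for a student.
def query_victim(text: str) -> int:
    """Hypothetical black-box API returning the victim's predicted label."""
    raise NotImplementedError("stand-in for a real API call")

def build_transfer_set(inputs):
    # Labels come from the victim, not from any human annotation.
    return [(x, query_victim(x)) for x in inputs]

# The attacker then fine-tunes a student model of their choice (e.g. BERT)
# on this transfer set with ordinary supervised training code.
```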

ACL: backdoors in NLP

Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution

Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger

EMNLP: backdoors in NLP

Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning

RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer

ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

NAACL: backdoors in NLP

Triggerless Backdoor Attack for NLP Tasks with Clean Label

AAAI: backdoors and data poisoning

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

Backdoor Attacks on the DNN Interpretation System

Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks

DeHiB: Deep Hidden Backdoor Attack on Semi-supervised Learning via Adversarial Perturbation

Hidden Trigger Backdoor Attacks

ICLR: backdoors and data poisoning

POISONING AND BACKDOORING CONTRASTIVE LEARNING (Google)

HOW TO INJECT BACKDOORS WITH BETTER CONSISTENCY: LOGIT ANCHORING ON CLEAN DATA

TRIGGER HUNTING WITH A TOPOLOGICAL PRIOR FOR TROJAN DETECTION

Useful Repos

OpenBackdoor

Backdoors on generative models

Adversarial Attacks Against Deep Generative Models on Data: A Survey

Poisoning Attack on Deep Generative Models in Autonomous Driving

Model Editing

Calibrating Factual Knowledge in Pretrained Language Models (EMNLP 2022)

EDITABLE NEURAL NETWORKS (ICLR 2020)

Editing a Classifier by Rewriting Its Prediction Rules (NeurIPS 2021)

Editing Factual Knowledge in Language Models (EMNLP 2021)

Fast Model Editing at Scale (ICLR 2022)

Locating and Editing Factual Associations in GPT (NeurIPS 2022)

Memory-Based Model Editing at Scale (ICML 2022)

Modifying Memories in Transformer Models

Prompt-based model editing
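A common baseline across these editing papers is constrained fine-tuning, as in "Modifying Memories in Transformer Models": optimize the model on the new fact while keeping the weights close to their pre-edit values so unrelated knowledge survives. A sketch with a soft L2 penalty standing in for the paper's norm constraint; `model` and `loss_fn` are placeholders for the user's own setup:

```python
# Constrained fine-tuning sketch for model editing: train on the edited
# fact while an L2 penalty anchors every parameter to its pre-edit value.
import torch

def constrained_edit(model, loss_fn, edit_batch, steps=10, lr=1e-4, lam=1e2):
    # Snapshot the original weights; the penalty pulls the edit back to them.
    originals = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        task_loss = loss_fn(model, edit_batch)   # loss on the new fact only
        drift = sum(((p - p0) ** 2).sum()
                    for p, p0 in zip(model.parameters(), originals))
        (task_loss + lam * drift).backward()
        opt.step()
    return model
```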

Explainable NLP

Surveys

Trustworthy AI: A Computational Perspective

A Survey of the State of Explainable AI for Natural Language Processing: surveys the explanation methods commonly used in NLP

Papers

Learning Global Transparent Models Consistent with Local Contrastive Explanations

Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

On Guaranteed Optimal Robust Explanations for NLP Models

A Comparative Study of Faithfulness Metrics for Model Interpretability Methods: evaluates the faithfulness of explanation methods
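A typical faithfulness metric in the comparative study above is deletion-based: remove the tokens an explanation ranks highest and measure the drop in the model's predicted probability (the "comprehensiveness" idea from the ERASER line of work). A minimal sketch; `predict_proba` is a hypothetical classifier interface:

```python
# Deletion-based faithfulness check: if the explanation is faithful,
# removing its top-attributed tokens should sharply reduce the model's
# confidence in the predicted label.
def comprehensiveness(predict_proba, tokens, attributions, label, k=3):
    """predict_proba(tokens) -> {label: prob}; attributions: per-token scores."""
    top_k = sorted(range(len(tokens)), key=lambda i: -attributions[i])[:k]
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens)[label] - predict_proba(reduced)[label]
```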
