zyz135246 / Adversarial-DGA-Datasets

A collection of different adversarial attacks on Domain Generation Algorithm (DGA) classifiers

Adversarial DGA Dataset

The goal of sharing this dataset is to help the research community come up with new defense mechanisms against DGA adversarial attacks.

Each attack type contains 10,000 samples for evaluation and 2,000 samples for adversarial training.

We implemented the attacks based on the authors' papers or on code snippets they sent us privately.

Please cite the following papers when using this repository:

  • Lior Sidi, Asaf Nadler, and Asaf Shabtai. "MaskDGA: A black-box evasion technique against DGA classifiers and adversarial defenses." (2019).
  • Lior Sidi, Yisroel Mirsky, Asaf Nadler, Yuval Elovici, and Asaf Shabtai. "Helix: DGA Domain Embeddings for Tracking and Exploring Botnets." (2020).

Attacks Data & Papers

  • DeepDGA: Anderson, H. S., Woodbridge, J., & Filar, B. (2016, October). DeepDGA: Adversarially-tuned domain generation and detection. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (pp. 13-21). (paper / data)
  • CharBot: Peck, J., Nie, C., Sivaguru, R., Grumer, C., Olumofin, F., Yu, B., ... & De Cock, M. (2019). CharBot: A simple and effective method for evading DGA classifiers. IEEE Access, 7, 91759-91771. (paper / data)
  • MaskDGA: Sidi, L., Nadler, A., & Shabtai, A. (2019). MaskDGA: A black-box evasion technique against DGA classifiers and adversarial defenses. arXiv preprint arXiv:1902.08909. (paper / data)
  • Append: A sequential adversarial attack that appends characters to the domain one after the other. (data)
  • Search: An extension of MaskDGA; the attack keeps changing characters until the substitute model's prediction changes. (data)
  • Random: Randomly changes the characters of a generated domain. (data)
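The Random attack above can be sketched in a few lines. This is an illustrative implementation only, assuming a lowercase character set and a configurable change ratio; the repository's exact parameters may differ.

```python
import random
import string

def random_attack(domain: str, change_ratio: float = 0.5, seed=None) -> str:
    """Sketch of the Random attack: substitute a fraction of the characters
    in a DGA-generated domain with random lowercase letters.
    change_ratio and the character set are assumptions, not the repo's exact values."""
    rng = random.Random(seed)
    name, _, tld = domain.rpartition(".")  # keep the TLD intact
    chars = list(name)
    n_changes = max(1, int(len(chars) * change_ratio))
    for i in rng.sample(range(len(chars)), n_changes):
        chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars) + "." + tld

print(random_attack("qxvfkzjqhw.com", seed=42))
```

The perturbed domain keeps its length and TLD, which is what makes such simple character-level attacks hard to filter out structurally.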

More about the MaskDGA datasets

The dataset contains multiple files for each attacker's substitute model type (CNN/LSTM) and the threshold for character changes (25/50/75).

Each file contains additional information about the attack, including:

  • url_original: the original URL before the attack.
  • family_name: the original DGA family of the URL.
  • url: the URL after the attack.
  • sub_model_benign_prob: the attacker's substitute model confidence that the domain is benign.
  • maskDGA_iteration: the model retraining phase; for evaluation, use the highest value (4).
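Working with these columns might look like the following sketch. The rows below are made-up stand-ins for one of the MaskDGA files (the real data ships in the repository); only the column names follow the README.

```python
import io

import pandas as pd

# Toy CSV standing in for a MaskDGA file (e.g. MaskDGA_CNN_50);
# column names follow the README, the row values are fabricated examples.
sample = io.StringIO(
    "url_original,family_name,url,sub_model_benign_prob,maskDGA_iteration\n"
    "qxvfkzjqhw.com,banjori,qavfkzjqtw.com,0.91,4\n"
    "mjkpqrszt.net,conficker,mjkfqrazt.net,0.45,2\n"
)
df = pd.read_csv(sample)

# For evaluation, keep only the final retraining phase (maskDGA_iteration == 4).
eval_df = df[df["maskDGA_iteration"] == 4]
print(eval_df[["url_original", "family_name", "url", "sub_model_benign_prob"]])
```

Filtering on `maskDGA_iteration == 4` follows the README's instruction to evaluate against the fully retrained substitute model.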

Datasets' unigram character distribution

Adversarial attacks datasets

Available in this repository.

DeepDGA, CharBot, MaskDGA_CNN_25, MaskDGA_CNN_50, MaskDGA_CNN_75, MaskDGA_LSTM_50, AppendAttack, SearchAttack, Random

Benign datasets

Not available in this repository; contact each owner separately.

Alexa1Mil, AmeritaDGA_benign, ISP_Benign

DGA datasets

Not available in this repository; contact each owner separately.

AmeritaDGA_DGA, DGA_Archive, ISP_DGA
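Comparing the datasets by their unigram character distribution can be sketched as below; this is an assumed implementation of the comparison, counting characters in the second-level label only.

```python
from collections import Counter

def unigram_distribution(domains):
    """Relative frequency of each character across a list of domain names.
    The TLD is dropped, an assumption about how the distributions were computed."""
    counts = Counter()
    for d in domains:
        counts.update(d.split(".")[0])  # count only the second-level label
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Tiny illustrative input; real use would pass one of the dataset columns.
dist = unigram_distribution(["abcabc.com", "aabb.net"])
print(sorted(dist.items()))  # → [('a', 0.4), ('b', 0.4), ('c', 0.2)]
```

Attacks that substitute characters uniformly at random (e.g. the Random attack) tend to flatten this distribution relative to benign domains, which is why it is a useful diagnostic.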

Attack evasion performance:

| Defense                          | No Attack | CharBot | DeepDGA | MaskDGA |
|----------------------------------|-----------|---------|---------|---------|
| CNN (Invincea)                   | 0.96      | 0.66    | 0.85    | 0.50    |
| CNN (Invincea) + Distillation    | 0.96      | 0.66    | 0.79    | 0.49    |
| CNN (Invincea) + CharBot retrain | 0.93      | 0.64    | 0.68    | 0.50    |
| CNN (Invincea) + DeepDGA retrain | 0.92      | 0.60    | 0.97    | 0.51    |
| CNN (Invincea) + MaskDGA retrain | 0.92      | 0.62    | 0.72    | 0.95    |
| Helix (AE Embeddings + KNN)      | 0.87      | 0.65    | 0.79    | 0.73    |

The Helix architecture is available at: https://github.com/liorsidi/Helix
