zyz135246 / Adversarial-DGA-Datasets

A collection of different adversarial attacks on Domain Generation Algorithm (DGA) classifiers

Adversarial DGA Dataset

The goal of sharing this dataset is to help the research community come up with new defense mechanisms against DGA adversarial attacks.

Each attack type contains 10,000 samples for evaluation and 2,000 samples for adversarial training.

We implemented the attacks based on the authors' papers or on code snippets they sent us privately.

Please cite the following papers when using this repository:

  • Lior Sidi, Asaf Nadler, and Asaf Shabtai. "MaskDGA: A black-box evasion technique against DGA classifiers and adversarial defenses." (2019).
  • Lior Sidi, Yisroel Mirsky, Asaf Nadler, Yuval Elovici, and Asaf Shabtai. "Helix: DGA Domain Embeddings for Tracking and Exploring Botnets." (2020).

Attacks Data & Papers

  • DeepDGA: Anderson, H. S., Woodbridge, J., & Filar, B. (2016, October). DeepDGA: Adversarially-tuned domain generation and detection. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (pp. 13-21). (paper / data)
  • CharBot: Peck, J., Nie, C., Sivaguru, R., Grumer, C., Olumofin, F., Yu, B., ... & De Cock, M. (2019). CharBot: A simple and effective method for evading DGA classifiers. IEEE Access, 7, 91759-91771. (paper / data)
  • MaskDGA: Sidi, L., Nadler, A., & Shabtai, A. (2019). MaskDGA: A black-box evasion technique against DGA classifiers and adversarial defenses. arXiv preprint arXiv:1902.08909. (paper / data)
  • Append: A sequential adversarial attack that appends characters to the domain one after the other. (data)
  • Search: An extension of MaskDGA; the attack keeps changing characters until the substitute model's prediction changes. (data)
  • Random: Randomly changes the characters of a generated domain. (data)
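The Random attack above can be sketched in a few lines. This is an illustrative implementation only, assuming a lowercase character set and a configurable change ratio; the repository's exact parameters may differ.

```python
import random
import string

def random_attack(domain: str, change_ratio: float = 0.5, seed=None) -> str:
    """Sketch of the Random attack: substitute a fraction of the characters
    in a DGA-generated domain with random lowercase letters.
    change_ratio and the character set are assumptions, not the repo's exact values."""
    rng = random.Random(seed)
    name, _, tld = domain.rpartition(".")  # keep the TLD intact
    chars = list(name)
    n_changes = max(1, int(len(chars) * change_ratio))
    for i in rng.sample(range(len(chars)), n_changes):
        chars[i] = rng.choice(string.ascii_lowercase)
    return "".join(chars) + "." + tld

print(random_attack("qxvfkzjqhw.com", seed=42))
```

The perturbed domain keeps its length and TLD, which is what makes such simple character-level attacks hard to filter out structurally.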

More about the MaskDGA datasets

The dataset contains multiple files for each attacker's substitute model type (CNN/LSTM) and the threshold for character changes (25/50/75).

Each file contains additional information about the attack, including:

  • url_original: the original URL before the attack.
  • family_name: the original DGA family of the URL.
  • url: the URL after the attack.
  • sub_model_benign_prob: the attacker's substitute model confidence that the domain is benign.
  • maskDGA_iteration: the model retraining phase; for evaluation, use the highest value (4).
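Working with these columns might look like the following sketch. The rows below are made-up stand-ins for one of the MaskDGA files (the real data ships in the repository); only the column names follow the README.

```python
import io

import pandas as pd

# Toy CSV standing in for a MaskDGA file (e.g. MaskDGA_CNN_50);
# column names follow the README, the row values are fabricated examples.
sample = io.StringIO(
    "url_original,family_name,url,sub_model_benign_prob,maskDGA_iteration\n"
    "qxvfkzjqhw.com,banjori,qavfkzjqtw.com,0.91,4\n"
    "mjkpqrszt.net,conficker,mjkfqrazt.net,0.45,2\n"
)
df = pd.read_csv(sample)

# For evaluation, keep only the final retraining phase (maskDGA_iteration == 4).
eval_df = df[df["maskDGA_iteration"] == 4]
print(eval_df[["url_original", "family_name", "url", "sub_model_benign_prob"]])
```

Filtering on `maskDGA_iteration == 4` follows the README's instruction to evaluate against the fully retrained substitute model.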

Datasets' unigram character distribution

Adversarial attacks datasets

Available in this repository.

DeepDGA, CharBot, MaskDGA_CNN_25, MaskDGA_CNN_50, MaskDGA_CNN_75, MaskDGA_LSTM_50, AppendAttack, SearchAttack, Random

Benign datasets

Not available in this repository; contact each owner separately.

Alexa1Mil, AmeritaDGA_benign, ISP_Benign

DGA datasets

Not available in this repository; contact each owner separately.

AmeritaDGA_DGA, DGA_Archive, ISP_DGA
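Comparing the datasets by their unigram character distribution can be sketched as below; this is an assumed implementation of the comparison, counting characters in the second-level label only.

```python
from collections import Counter

def unigram_distribution(domains):
    """Relative frequency of each character across a list of domain names.
    The TLD is dropped, an assumption about how the distributions were computed."""
    counts = Counter()
    for d in domains:
        counts.update(d.split(".")[0])  # count only the second-level label
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Tiny illustrative input; real use would pass one of the dataset columns.
dist = unigram_distribution(["abcabc.com", "aabb.net"])
print(sorted(dist.items()))  # → [('a', 0.4), ('b', 0.4), ('c', 0.2)]
```

Attacks that substitute characters uniformly at random (e.g. the Random attack) tend to flatten this distribution relative to benign domains, which is why it is a useful diagnostic.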

Attack evasion performance:

| Defense                          | No Attack | CharBot | DeepDGA | MaskDGA |
|----------------------------------|-----------|---------|---------|---------|
| CNN (Invincea)                   | 0.96      | 0.66    | 0.85    | 0.50    |
| CNN (Invincea) + Distillation    | 0.96      | 0.66    | 0.79    | 0.49    |
| CNN (Invincea) + CharBot retrain | 0.93      | 0.64    | 0.68    | 0.50    |
| CNN (Invincea) + DeepDGA retrain | 0.92      | 0.60    | 0.97    | 0.51    |
| CNN (Invincea) + MaskDGA retrain | 0.92      | 0.62    | 0.72    | 0.95    |
| Helix (AE Embeddings + KNN)      | 0.87      | 0.65    | 0.79    | 0.73    |

The Helix architecture is available at: https://github.com/liorsidi/Helix
