Masterthesis Code
This repository contains execution logs and scripts to reproduce the experiments from our master's thesis "Good || Evil: Defending Infrastructure at Scale with Anomaly and Classification Based Network Intrusion Detection".
- Link to paper: https://rp.os3.nl/2020-2021/p75/report.pdf
- Presentation (slightly outdated, TODO update): https://rp.os3.nl/2020-2021/p75/presentation.pdf
TODOs
- update presentation slides
- add CSV dataset files to git lfs
- add detailed step by step docs for experiment reproduction
Structure
The dnn folder contains the execution logs of the two deep neural network experiments, v1 and v2.
Each experiment has its own folder, which contains the model from the last execution as well as sequentially numbered folders with the tensorboard logs for each run.
The core experiment logic is in dnn/v1/experiment1.sh and dnn/v1/experiment2.sh respectively.
The screenlog.0 file contains the raw output of the experiment script execution; the v1.csv and v2.csv files contain a generated spreadsheet for human analysis.
Step by step instructions are located in dnn/README.md.
CIC IDS 2018 Attack Descriptions
dnn/cic-ids2018-attacks.yml contains the attacks mapped onto the audit records.
Throughput measurement
dnn/pps-line.png is a line chart of the packet ingestion performance, measured in packets per second, while generating the connection audit records used in our experiments.
Connection Audit Record Generation
The dnn/process-pcaps.sh script processes the raw pcap files to generate and label the audit records for the experiments.
Use at least netcap v0.6.0 to reproduce our experiment results, and v0.6.1 or later if you want to generate the throughput measurement chart automatically.
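The version requirement above can be checked in a script before starting a long run. The following is a minimal sketch, not part of this repository: the helper name version_at_least is hypothetical, it relies on the -V (version sort) option of sort, and it expects the installed version string to be passed in by hand, since how netcap reports its version is not covered here.

```shell
#!/usr/bin/env bash
# version_at_least HAVE WANT: succeeds if version HAVE is >= version WANT.
# A leading "v" is stripped; the comparison uses sort -V (version sort).
version_at_least() {
    local have="${1#v}" want="${2#v}"
    # If WANT sorts first (or both are equal), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Assumption: INSTALLED holds the version reported by your netcap build.
INSTALLED="v0.6.1"

if version_at_least "$INSTALLED" "v0.6.0"; then
    echo "netcap is recent enough to reproduce the experiments"
fi
if version_at_least "$INSTALLED" "v0.6.1"; then
    echo "netcap is recent enough to generate the throughput chart"
fi
```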
Pcap preprocessing
The pcaps for the CIC IDS 2018 dataset are provided for each day, as individual capture files per host. We merged these into a single pcap file for each day of the dataset.
After merging, we noticed that some merged capture files contained traffic from multiple days, so we cleaned them to ensure that each file contains only traffic from the intended day. The dnn/clean-days.sh script is used for this purpose.
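For readers who want to see what merging and cleaning can look like, here is a rough sketch. It is not the content of dnn/clean-days.sh; it assumes Wireshark's mergecap and editcap tools are installed, and all file names and dates are hypothetical placeholders. By default it only prints the commands (dry run). Note that editcap interprets the -A and -B timestamps in the local time zone.

```shell
#!/usr/bin/env bash
# Sketch only; the repository's own scripts may work differently.
# Dry run is the default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# merge_day OUT IN...: merge per-host captures into one file, ordered by timestamp.
merge_day() {
    local out="$1"; shift
    run mergecap -w "$out" "$@"
}

# clean_day IN OUT START STOP: keep only packets between START and STOP.
# Timestamps use the format "YYYY-MM-DD HH:MM:SS".
clean_day() {
    local in="$1" out="$2" start="$3" stop="$4"
    run editcap -A "$start" -B "$stop" "$in" "$out"
}

# Hypothetical file names and dates, for illustration only.
merge_day merged-day.pcap host1.pcap host2.pcap
clean_day merged-day.pcap clean-day.pcap "2018-02-14 00:00:00" "2018-02-15 00:00:00"
```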
Extract generated plots
The experiment code generates several plots. To extract them all into a single directory, use the dnn/extract-plots.sh script.
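The idea of collecting plots into one directory can be sketched as follows. This is an illustration, not the content of dnn/extract-plots.sh: the function name collect_plots is hypothetical, and it assumes the plots are PNG files. The destination should live outside the source tree so that copied files are not picked up again.

```shell
#!/usr/bin/env bash
# collect_plots SRC DEST: copy every *.png found below SRC into DEST.
# Each copy is named after its path relative to SRC, with "/" replaced by "_",
# so plots with the same file name from different runs do not overwrite each other.
collect_plots() {
    local src="$1" dest="$2" f rel
    mkdir -p "$dest"
    find "$src" -type f -name '*.png' -print0 |
        while IFS= read -r -d '' f; do
            rel="${f#"$src"/}"
            cp "$f" "$dest/${rel//\//_}"
        done
}

# Example call (directory names are placeholders):
#   collect_plots dnn /tmp/all-plots
```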
Experiments on Netflow data from the dataset authors
TODO: add step by step guide
Download and install the anomaly tool: https://github.com/ppartarr/anomaly
Experiment scripts:
- runAudit.sh
- runExperiments.sh
- runNetflow.sh
LFS
Some files in this repository are too big for GitHub and are therefore provided via the Git Large File Storage (LFS) extension.
To install the git lfs extension, use:
Apt/deb: sudo apt-get install git-lfs
Yum/rpm: sudo yum install git-lfs
macOS: brew install git-lfs
Afterwards just clone the repository as usual:
git clone git@github.com:dreadl0ck/masterthesis.git
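If the large files show up as small pointer files after cloning, the LFS filters were probably not set up yet. The following sketch wraps the usual sequence in a helper; the function name clone_with_lfs is hypothetical. It uses the standard git lfs install and git lfs pull subcommands, and git lfs install changes your user-level git configuration.

```shell
#!/usr/bin/env bash
# clone_with_lfs URL [DIR]: clone a repository and make sure its LFS files are present.
clone_with_lfs() {
    local url="$1" dir="${2:-}"
    # Fail early if the git-lfs extension is missing.
    if ! git lfs version >/dev/null 2>&1; then
        echo "git-lfs is not installed" >&2
        return 1
    fi
    git lfs install || return 1   # one-time setup of the LFS filters for this user
    if [ -z "$dir" ]; then
        dir="$(basename "$url" .git)"
    fi
    git clone "$url" "$dir" || return 1
    git -C "$dir" lfs pull        # fetch any LFS objects the clone did not download
}

# Example call:
#   clone_with_lfs git@github.com:dreadl0ck/masterthesis.git
```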