dhyeyinf / ShieldNet

ML-powered Intrusion Detection System trained on the ISCXIDS2012 dataset, with multi-model comparison, reporting, and automation.

Repository on GitHub: https://github.com/dhyeyinf/ShieldNet

🚨 ISCXIDS2012 - Network Intrusion Detection using Machine Learning

A comprehensive project that benchmarks multiple ML models for detecting network intrusions in the ISCXIDS2012 dataset. The system processes the traffic data day by day and evaluates models such as Random Forest, SVM, ExtraTrees, Gradient Boosting, and more. The pipeline is automated end to end and produces performance metrics for every model on every day.


📁 Project Structure

ISCXIDS2012-MASTER/
├── algorithms/              # Model implementations
├── data/                    # Dataset (original, CSV, split versions)
│   ├── original/            # Original CSVs from ISCXIDS2012
│   ├── CSV/                 # Cleaned data
│   └── split-CSV/           # 70-30 split data (train-test)
├── plotting/                # Plotting scripts (optional)
├── results/                 # Output metrics and result JSONs
│   ├── single/              # Results from single run per model
│   └── cv/                  # Results from cross-validation
├── runner-scripts/          # Automation scripts
├── ml.py                    # Entry point for training models
├── result_handling.py       # Handles and stores model results
├── extract_all_metrics.py   # Consolidates metrics from results
├── preproc.py               # Preprocessing and data splitting
├── reduction.py             # Feature selection/reduction
├── run_all_model.sh         # Run every ML model on all 6 days
└── README.md                # Project readme

Technologies Used

  • Python 3.x
  • scikit-learn
  • pandas, numpy
  • matplotlib (for optional plotting)
  • Dataset: ISCXIDS2012

Setup Instructions

1. Clone the Repository

git clone https://github.com/dhyeyinf/ShieldNet.git
cd ShieldNet

2. Install Dependencies

Make sure Python 3.7 or newer is installed, then:

pip install -r requirements.txt

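If requirements.txt is missing, install the packages manually:

pip install scikit-learn pandas numpy matplotlib

The XGBoost model (xgboost) relies on the separate xgboost package, which the command above does not install: pip install xgboost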
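To confirm the environment is ready before running anything, a small check like the following can help. It is a hypothetical helper, not a script shipped with the repository; it only verifies that Python is 3.7 or newer and that the packages from the manual install command can be imported (scikit-learn is imported as sklearn).

```python
import importlib.util
import sys

# Import names for the packages in the manual install command (scikit-learn imports as sklearn).
REQUIRED = ["sklearn", "pandas", "numpy", "matplotlib"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 7)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```

find_spec only locates a module without importing it, so the check stays fast even for heavy packages.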

Dataset Structure

The dataset contains six days of network traffic:

-D   Day         Filename                      Attack Scenario
0    Monday      TestbedMonJun14Flows.csv      HTTP DoS (Hulk)
1    Tuesday     TestbedTueJun15Flows.csv      DDoS via IRC Botnet
2    Wednesday   TestbedWedJun16Flows.csv      Brute Force
3    Thursday    TestbedThuJun17Flows.csv      SSH Brute Force
4    Saturday    TestbedSatJun19Flows.csv      Additional Brute Force Variants
5    Sunday      TestbedSunJun20Flows.csv      Infiltration Attack

Each day's file is split into 70% training and 30% testing data during preprocessing (preproc.py); the split files live in data/split-CSV/.
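A 70/30 split of one day's flows can be sketched with scikit-learn's train_test_split. This is an illustration, not the code in preproc.py: the split_day helper, the stratification on the label, the fixed seed, and the label column name are all assumptions (ISCX flow records are commonly labelled with a Tag field, but check the actual CSV header).

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_day(csv_path, label_col, test_size=0.30, seed=42):
    """Hypothetical helper: split one day's flow CSV into 70% train / 30% test."""
    df = pd.read_csv(csv_path)
    train_df, test_df = train_test_split(
        df,
        test_size=test_size,
        random_state=seed,       # fixed seed so the split is reproducible
        stratify=df[label_col],  # keep the normal/attack ratio the same in both parts
    )
    return train_df, test_df
```

Stratifying matters here because attack flows are usually a small minority of each day's traffic; an unstratified split could leave the test set with very few attacks.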

How to Run

1. Run All Models on All Days

From the project root:

chmod +x runner-scripts/run_all_models.sh
./runner-scripts/run_all_models.sh

This will:

  • Run every ML model on all 6 days.
  • Save JSON results in results/single/<model_name>/
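The shell script's contents are not reproduced in this README. Assuming it simply calls ml.py once per day and model with the -D and -F flags described under Available Models, a rough Python equivalent looks like this; build_commands and run_all are hypothetical names, and the model list is copied from the short names in this README.

```python
import subprocess
import sys

DAYS = range(6)  # -D values 0..5, one per day in the dataset table
MODELS = ["knn", "ncentroid", "dtree", "linsvc", "rbfsvc", "rforest", "ada",
          "bag", "binlr", "qda", "lda", "xgboost", "gradboost", "extratree"]


def build_commands(days=DAYS, models=MODELS):
    """Build one `python ml.py -D <day> -F <model>` command per (day, model) pair."""
    return [[sys.executable, "ml.py", "-D", str(d), "-F", m] for d in days for m in models]


def run_all(dry_run=True):
    """Print every command; execute them sequentially only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing run


if __name__ == "__main__":
    run_all(dry_run=True)
```

Defaulting to a dry run makes it easy to inspect the 84 commands (6 days × 14 models) before launching what can be a long job.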

2. Extract Metrics into CSV

python extract_all_metrics.py

This will generate a consolidated CSV report with:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • ROC AUC
  • Runtime
  • Day (-D)
  • Model name
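The JSON schema written by result_handling.py is not documented here, so the consolidation sketch below assumes each results/single/<model_name>/*.json file holds a flat object with keys such as day, accuracy, precision, recall, f1, roc_auc and runtime. Those key names, the collect_results and write_report helpers, and the all_metrics.csv output name are hypothetical; extract_all_metrics.py may work differently.

```python
import glob
import json
import os

import pandas as pd

# Hypothetical key names; adjust to whatever result_handling.py actually writes.
COLUMNS = ["model", "day", "accuracy", "precision", "recall", "f1", "roc_auc", "runtime"]


def collect_results(root="results/single"):
    """Read every <root>/<model_name>/*.json file into one DataFrame."""
    rows = []
    for path in sorted(glob.glob(os.path.join(root, "*", "*.json"))):
        with open(path) as fh:
            record = json.load(fh)
        # Fall back to the parent directory name if the JSON has no model field.
        record.setdefault("model", os.path.basename(os.path.dirname(path)))
        rows.append({col: record.get(col) for col in COLUMNS})
    return pd.DataFrame(rows, columns=COLUMNS)


def write_report(root="results/single", out_csv="all_metrics.csv"):
    """Write one CSV row per (model, day) result and return the table."""
    df = collect_results(root).sort_values(["model", "day"])
    df.to_csv(out_csv, index=False)
    return df
```

Missing keys become empty cells rather than errors, so a partially failed run still yields a usable report.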

Available Models

The following models are evaluated in the project:

  • K-Nearest Neighbors (knn)
  • Nearest Centroid (ncentroid)
  • Decision Tree (dtree)
  • Linear SVM (linsvc)
  • RBF SVM (rbfsvc)
  • Random Forest (rforest)
  • AdaBoost (ada)
  • Bagging (bag)
  • Logistic Regression (binlr)
  • Quadratic Discriminant Analysis (qda)
  • Linear Discriminant Analysis (lda)
  • XGBoost (xgboost)
  • Gradient Boosting (gradboost)
  • Extremely Randomised Trees (extratree)
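How ml.py maps each short name to an estimator is not shown in this README. A plausible sketch, assuming default scikit-learn estimators, is a simple registry; the MODEL_FACTORIES dict and make_model helper are hypothetical, and the project may use different hyperparameters (max_iter=1000 for logistic regression is an illustrative choice).

```python
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical mapping from -F short names to zero-argument estimator factories.
MODEL_FACTORIES = {
    "knn": KNeighborsClassifier,
    "ncentroid": NearestCentroid,
    "dtree": DecisionTreeClassifier,
    "linsvc": LinearSVC,
    "rbfsvc": lambda: SVC(kernel="rbf"),
    "rforest": RandomForestClassifier,
    "ada": AdaBoostClassifier,
    "bag": BaggingClassifier,
    "binlr": lambda: LogisticRegression(max_iter=1000),
    "qda": QuadraticDiscriminantAnalysis,
    "lda": LinearDiscriminantAnalysis,
    "gradboost": GradientBoostingClassifier,
    "extratree": ExtraTreesClassifier,
}

try:  # XGBoost lives in a separate package, so register it only if installed.
    from xgboost import XGBClassifier
    MODEL_FACTORIES["xgboost"] = XGBClassifier
except ImportError:
    pass


def make_model(short_name):
    """Return a fresh, unfitted estimator for a short name such as 'rforest'."""
    try:
        return MODEL_FACTORIES[short_name]()
    except KeyError:
        raise ValueError(f"unknown model short name: {short_name!r}") from None
```

Storing factories rather than instances guarantees each run gets a new, unfitted model.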

Each model can also be run individually using:

python ml.py -D 0 -F rforest

Set -D to the target day index (0 to 5, as in the dataset table) and -F to the model's short name (shown in parentheses in the list above).
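A single run presumably trains one model on the chosen day's training split and scores it on the test split. The sketch below mirrors the documented -D/-F flags with argparse and computes the report's metrics with scikit-learn. parse_args and evaluate are hypothetical names; the code assumes binary labels encoded as 0 (normal) and 1 (attack), and its runtime covers fitting plus prediction, which may not match how the project measures it.

```python
import argparse
import time

from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def parse_args(argv=None):
    """Mirror the documented CLI: -D selects the day, -F the model short name."""
    parser = argparse.ArgumentParser(description="Train one model on one day.")
    parser.add_argument("-D", type=int, choices=range(6), required=True, help="day index 0-5")
    parser.add_argument("-F", required=True, help="model short name, e.g. rforest")
    return parser.parse_args(argv)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit the model and compute the report's metrics (labels assumed to be 0/1)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    runtime = time.perf_counter() - start
    # ROC AUC needs a continuous score: prefer probabilities, then decision values.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]  # probability of class 1 (attack)
    elif hasattr(model, "decision_function"):
        scores = model.decision_function(X_test)
    else:
        scores = y_pred  # estimators without scores: fall back to hard labels
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, scores),
        "runtime": runtime,
    }
```

Using hard labels as a last resort still yields an ROC AUC, but it collapses the curve to a single operating point, so scores from such models are not directly comparable with probabilistic ones.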

Plotting (Optional)

To plot the metrics for one model and day:

python plotting/plot_single_metrics.py -D 0 -F rforest
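The plotting script's options beyond -D and -F are not documented here. As an alternative view across all runs, a grouped bar chart can be drawn straight from the consolidated CSV, assuming it has columns named model and day plus one column per metric (hypothetical names; match them to the real report header). plot_metric is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_metric(report_csv, metric="f1", out_png=None):
    """Bar chart of one metric: one group of bars per model, one bar per day."""
    df = pd.read_csv(report_csv)
    table = df.pivot_table(index="model", columns="day", values=metric)
    ax = table.plot(kind="bar", figsize=(10, 4))
    ax.set_ylabel(metric)
    ax.set_title(f"{metric} by model and day")
    fig = ax.get_figure()
    fig.tight_layout()
    fig.savefig(out_png or f"{metric}_by_model.png")
    plt.close(fig)
    return table
```

pivot_table averages duplicate (model, day) rows, so repeated runs are folded into their mean rather than plotted twice.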

Sample Output

After a run, results are written to:

results/single/<model_name>/<day>_<model>_Z_<%.json>

The CSV report produced by extract_all_metrics.py summarizes all of these results.



Languages

Python 72.7%, Shell 27.3%