A comprehensive project that benchmarks multiple ML models for network intrusion detection on the ISCXIDS2012 dataset. The system processes day-wise traffic data and evaluates models including Random Forest, SVM, Extra Trees, and Gradient Boosting. The pipeline is fully automated and generates performance metrics for every model on every day.
```
ISCXIDS2012-MASTER/
├── algorithms/              # Model implementations
├── data/                    # Dataset (original, CSV, split versions)
│   ├── original/            # Original CSVs from ISCXIDS2012
│   ├── CSV/                 # Cleaned data
│   └── split-CSV/           # 70-30 split data (train-test)
├── plotting/                # Plotting scripts (optional)
├── results/                 # Output metrics and result JSONs
│   ├── single/              # Results from single run per model
│   └── cv/                  # Results from cross-validation
├── runner-scripts/          # Automation scripts
├── ml.py                    # Entry point for training models
├── result_handling.py       # Handles and stores model results
├── extract_all_metrics.py   # Consolidates metrics from results
├── preproc.py               # Preprocessing and data splitting
├── reduction.py             # Feature selection/reduction
├── run_all_models.sh        # Run every ML model on all 6 days
└── README.md                # Project readme
```

## Technologies Used
- Python 3.x
- scikit-learn
- pandas
- numpy
- matplotlib (for optional plotting)
- Dataset: ISCXIDS2012
## 1. Clone the Repository

```bash
git clone https://github.com/dhyeyinf/ShieldNet.git
cd ShieldNet
```

## 2. Install Dependencies

Make sure Python (≥3.7) is installed, then:

```bash
pip install -r requirements.txt
```

If requirements.txt is missing, you can install the dependencies manually:

```bash
pip install scikit-learn pandas numpy matplotlib
```

The dataset contains six days of network traffic:
| -D | Day | Filename | Attack Scenario |
|---|---|---|---|
| 0 | Monday | TestbedMonJun14Flows.csv | HTTP DoS (Hulk) |
| 1 | Tuesday | TestbedTueJun15Flows.csv | DDoS via IRC Botnet |
| 2 | Wednesday | TestbedWedJun16Flows.csv | Brute Force |
| 3 | Thursday | TestbedThuJun17Flows.csv | SSH Brute Force |
| 4 | Saturday | TestbedSatJun19Flows.csv | Additional Brute Force Variants |
| 5 | Sunday | TestbedSunJun20Flows.csv | Infiltration Attack |
Each file is split into 70% training and 30% testing data during preprocessing.
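For reference, `preproc.py` handles this split; below is a minimal sketch of an equivalent 70/30 stratified split, not the project's actual code. The input path and the `Label` column name are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical paths and label column -- the real ones live in preproc.py.
df = pd.read_csv("data/CSV/TestbedMonJun14Flows.csv")
X = df.drop(columns=["Label"])
y = df["Label"]

# 70/30 split, stratified so the attack/benign ratio is preserved in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pd.concat([X_train, y_train], axis=1).to_csv("data/split-CSV/day0_train.csv", index=False)
pd.concat([X_test, y_test], axis=1).to_csv("data/split-CSV/day0_test.csv", index=False)
```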
## 1. Run All Models on All Days

From the project root:
```bash
chmod +x runner-scripts/run_all_models.sh
./runner-scripts/run_all_models.sh
```

This will:
- Run every ML model on all 6 days.
- Save JSON results in `results/single/<model_name>/`.
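Conceptually, the script is just a nested loop over days and model short names. A rough Python equivalent (a sketch, assuming the `-D`/`-F` interface of `ml.py` shown later in this README):

```python
import subprocess

# Model short names as listed later in this README.
MODELS = ["knn", "ncentroid", "dtree", "linsvc", "rbfsvc", "rforest", "ada",
          "bag", "binlr", "qda", "lda", "xgboost", "gradboost", "extratree"]

for day in range(6):        # days 0-5 from the table above
    for model in MODELS:
        # One training/evaluation run per (day, model) combination.
        subprocess.run(["python", "ml.py", "-D", str(day), "-F", model], check=True)
```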
## 2. Extract Metrics into CSV

```bash
python extract_all_metrics.py
```

This will generate a consolidated CSV report with:
- Accuracy
- Precision
- Recall
- F1-score
- ROC AUC
- Runtime
- Day (-D)
- Model name
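A sketch of how such a consolidation can be done with pandas, assuming each result JSON is a flat dict of metric names to values (the actual schema is defined by `result_handling.py`):

```python
import glob
import json
import pandas as pd

rows = []
# One JSON per (day, model) run under results/single/<model_name>/.
for path in glob.glob("results/single/*/*.json"):
    with open(path) as f:
        metrics = json.load(f)  # assumed flat dict: {"accuracy": ..., ...}
    rows.append(metrics)

# Each row is one run; columns are the metrics listed above.
pd.DataFrame(rows).to_csv("all_metrics.csv", index=False)
```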
The following models are evaluated in the project:
- K-Nearest Neighbors (`knn`)
- Nearest Centroid (`ncentroid`)
- Decision Tree (`dtree`)
- Linear SVM (`linsvc`)
- RBF SVM (`rbfsvc`)
- Random Forest (`rforest`)
- AdaBoost (`ada`)
- Bagging (`bag`)
- Logistic Regression (`binlr`)
- Quadratic Discriminant Analysis (`qda`)
- Linear Discriminant Analysis (`lda`)
- XGBoost (`xgboost`)
- Gradient Boost (`gradboost`)
- Extremely Randomised Trees (`extratree`)
Each model can also be run individually using:

```bash
python ml.py -D 0 -F rforest
```

Change `-D` to the target day and `-F` to the model short name.
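Internally, a short-name-to-estimator dispatch table is the natural way to support `-F`. A minimal sketch, not the project's actual code, with only a hypothetical subset of the mapping:

```python
import argparse

from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Hypothetical subset of the short-name -> estimator mapping.
MODELS = {
    "knn": KNeighborsClassifier(),
    "linsvc": LinearSVC(),
    "rforest": RandomForestClassifier(),
    "extratree": ExtraTreesClassifier(),
}

parser = argparse.ArgumentParser()
parser.add_argument("-D", type=int, required=True, help="day index (0-5)")
parser.add_argument("-F", required=True, choices=MODELS, help="model short name")
args = parser.parse_args()

clf = MODELS[args.F]  # estimator to fit on day args.D's training split
```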
If needed later, per-run plots can be generated with:

```bash
python plotting/plot_single_metrics.py -D 0 -F rforest
```

After running, the results will be visible in:

```
results/single/<model_name>/<day>_<model>_Z_<%.json>
```

The extracted final CSV report will summarize everything.
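To inspect a single run without the CSV step, the result JSON can be loaded directly. A sketch, assuming a flat metric dict and a hypothetical filename (the exact keys and naming come from `result_handling.py`):

```python
import json

# Hypothetical filename following the pattern above.
with open("results/single/rforest/0_rforest.json") as f:
    result = json.load(f)

for metric, value in sorted(result.items()):
    print(f"{metric}: {value}")
```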