davidit33 / SCADA-ML

Repository from Github https://github.com/davidit33/SCADA-ML

Replicating "SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach"

As part of my studies 🧑‍🎓, I tried to replicate the experiment shown in this article, titled "SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach" and published in 2018.

The results obtained in my notebook were very similar to those of the researchers 💪 for the Decision Tree, Random Forest and Logistic Regression algorithms. The results for Naive Bayes are not included because the accuracy I obtained was much lower; I do not understand how the article achieved such a high accuracy with this algorithm 🤷‍♂️. I also do not understand how the KNN algorithm was used on the dataset without first applying feature scaling 🤷‍♂️.
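To illustrate why the missing feature scaling matters for KNN, here is a minimal sketch in plain Python. The rows and the two feature names (`bytes_per_flow`, `ratio`) are made up for illustration and are not taken from the article's dataset or from my notebook; `standardize` and `nearest` are hypothetical helpers. It shows that the nearest neighbour of the same point can change once every feature is standardized, because the unscaled Euclidean distance is dominated by the feature with the largest numeric range.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (population std)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(rows, query_idx):
    """Index of the row closest (Euclidean) to rows[query_idx], excluding itself."""
    query = rows[query_idx]
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: math.dist(query, rows[i]))


# Hypothetical rows: [bytes_per_flow, ratio in the range 0..1]
rows = [
    [1000.0, 0.10],  # 0: the query point
    [1010.0, 0.90],  # 1: close in bytes, far in ratio
    [1200.0, 0.12],  # 2: further in bytes, close in ratio
    [5000.0, 0.50],  # 3: far in both
]

raw_neighbour = nearest(rows, 0)                  # picks row 1: bytes dominate
scaled_neighbour = nearest(standardize(rows), 0)  # picks row 2: both features count
print(raw_neighbour, scaled_neighbour)
```

With scikit-learn, the usual way to get the same effect is to put a scaler such as `StandardScaler` in front of `KNeighborsClassifier` inside a pipeline, so the scaling statistics are learned from the training split only.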

The article states: "As shown in Table 5, our dataset is unbalanced. Therefore, accuracy is not the ideal measure to evaluate performance [33]". I totally agree, which is why in my notebook I used "f1_score" as the metric to evaluate my model. However, it is not clear to me whether Figure 8 uses an evaluation metric that takes the imbalance in the dataset into account 🤷‍♂️. One question I still have is: why did the model perform so well 🤷‍♂️?
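The point about accuracy on an unbalanced dataset can be checked with a small worked example. This is only a sketch under assumed numbers: the 95/5 class split and both sets of predictions are invented, not taken from Table 5 or from my notebook, and `accuracy` and `f1` are hand-written helpers rather than the scikit-learn functions. A model that always predicts the majority class gets a high accuracy but an F1 of zero on the minority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred, positive=1):
    """F1 score for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 95 normal samples (0) and 5 attack samples (1).
y_true = [0] * 95 + [1] * 5

# Baseline that never raises an alarm.
always_normal = [0] * 100

# Hypothetical detector: 2 false alarms, 4 of the 5 attacks caught.
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

print(accuracy(y_true, always_normal), f1(y_true, always_normal))  # high accuracy, F1 of 0
print(accuracy(y_true, detector), f1(y_true, detector))
```

In scikit-learn, `f1_score` also takes an `average` parameter (for example `"binary"`, `"macro"` or `"weighted"`), so when comparing numbers with the article it helps to state which averaging was used.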

👨‍💻 You can see the notebook here: Kaggle

Please contact me and help me learn ⭐, and give me your conclusions about this notebook.

Languages

Language: Jupyter Notebook 100.0%