Anomaly Detection in Industrial IoT (IIoT)

INTRODUCTION

The arrival of the Internet of Things (IoT) marks a major step for the global information sector, following the initial rise of the Internet. It establishes an intelligent network in which devices exchange information and communicate over the internet. With IoT, people can track, monitor, identify, locate, and manage a wide range of objects. IoT has become a growing research focus in computer science, developing alongside the Internet and mobile devices. IoT devices are now widely adopted across many domains, such as smart healthcare, intelligent transportation, smart governance, precision agriculture, smart grids, smart homes, and supply chains.

Dataset

This study uses data from the IoT-23 dataset. Released to the public in January 2020, the dataset contains detailed network traffic from 3 different smart home IoT devices: an Amazon Echo, a Philips HUE, and a Somfy Door Lock. It is a large collection of verified and labeled instances of both malware infections in IoT devices and benign network traffic. Designed specifically for developing machine learning algorithms, the dataset includes 23 captures, also called scenarios: 20 malicious captures and 3 benign captures. For each capture from an infected device, the probable name of the malware sample executed in that scenario is given. The malware labels in the IoT-23 dataset cover a range of types, including Attack, Command and Control (C&C), C&C-FileDownload, C&C-HeartBeat, C&C-HeartBeat-Attack, C&C-HeartBeat-FileDownload, C&C-Mirai, C&C-Torii, Distributed Denial of Service (DDoS), FileDownload, Okiru, Okiru-Attack, and PartOfAHorizontalPortScan. The network traffic is analyzed with Zeek, a tool built for network analysis. The IoT-23 dataset is provided in the form of conn.log.labeled files, that is, Zeek conn.log files generated by the Zeek network analyzer from the original pcap files. Table I gives a full summary of the variable types and their descriptions in the IoT-23 dataset. Given the considerable size of the dataset, we decided to extract a selection of records from each individual capture and combine them into a new dataset. This strategy was chosen to reduce the computational cost of handling the new dataset while retaining most of the attack types documented in IoT-23.
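As an illustration of the conn.log format mentioned above, the following is a minimal sketch of reading a Zeek-style log with the Python standard library. It assumes the usual Zeek layout (tab-separated values, metadata lines starting with `#`, column names on the `#fields` line). The sample content and the `read_zeek_log` helper are hypothetical and use only a handful of columns; they are not taken from this repository.

```python
import io
from collections import Counter

# Made-up miniature log in the Zeek layout; real captures have many more
# columns and rows. Only the "#fields" line is used by the parser below.
SAMPLE = (
    "#separator \\x09\n"
    "#path\tconn\n"
    "#fields\tts\tuid\tproto\tconn_state\tlabel\n"
    "#types\ttime\tstring\tenum\tstring\tstring\n"
    "1.0\tC1\ttcp\tS0\tMalicious\n"
    "2.0\tC2\tudp\tSF\tBenign\n"
    "3.0\tC3\ttcp\tS0\tMalicious\n"
    "#close\t2020-01-01-00-00-00\n"
)


def read_zeek_log(handle):
    """Yield one dict per data row, keyed by the names on the #fields line."""
    fields = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # First token is the "#fields" marker itself; the rest are names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            if fields is None:
                raise ValueError("data row found before the #fields line")
            yield dict(zip(fields, line.split("\t")))


rows = list(read_zeek_log(io.StringIO(SAMPLE)))
label_counts = Counter(row["label"] for row in rows)
print(label_counts)
```

Counting labels per capture in this way is one simple check that a reduced sample still contains the attack types of interest.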

Data Preprocessing

In the first stage, we used the Pandas library in Python to load all 23 captures from the IoT-23 dataset into separate data frames. For each capture, we skipped the first ten lines and read the next 100,000 lines. We then merged the 23 data frames into a single data frame. Next, we dropped the variables that do not influence the results, namely: timestamp (ts), unique ID (uid), source IP address (id.orig_h), source port (id.orig_p), destination IP address (id.resp_h), destination port (id.resp_p), service, local origin (local_orig), local response (local_resp), and history. We also encoded the protocol (proto) and connection state (conn_state) variables as dummy values and replaced all missing values with zero. Finally, the combined dataset was saved as the IIoT combined.csv file. The resulting file, titled cleaned data.csv, contains a total of 48005 entries. As shown in the Figure above, the combined dataset contains 10 types of attacks, specifically PartOfAHorizontalPortScan, Okiru, DDoS, Attack, C&C-HeartBeat, C&C-FileDownload, C&C-Torii, FileDownload, C&C-HeartBeat-FileDownload, and C&C-Mirai. To make our conclusions more reliable, we split the combined dataset into a training set covering 80% of the data and a test set covering the remaining 20%.
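The preprocessing steps described above can be sketched with Pandas roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual script: the column list is a subset of Zeek conn.log fields, the inline sample rows are made up, `-` is treated as the missing-value marker, and the helper names (`load_capture`, `preprocess`, `split_80_20`) are hypothetical. The sketch skips one leading line and reads five rows where the study skips ten lines and reads 100,000.

```python
import io
import pandas as pd

# Assumed subset of Zeek conn.log columns plus a label column.
COLUMNS = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
           "proto", "service", "duration", "orig_bytes", "resp_bytes",
           "conn_state", "local_orig", "local_resp", "history", "label"]

# Columns the text lists as not influencing the results.
DROPPED = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
           "service", "local_orig", "local_resp", "history"]

# Made-up stand-in for one capture; the first line plays the role of the
# leading lines that are skipped before reading data rows.
RAW = """\
leading line to skip
1.0,C1,10.0.0.2,5000,10.0.0.9,23,tcp,-,0.5,10,0,S0,-,-,S,PartOfAHorizontalPortScan
2.0,C2,10.0.0.2,5001,10.0.0.9,80,tcp,http,-,-,-,SF,-,-,ShAdDaFf,Benign
3.0,C3,10.0.0.3,5002,10.0.0.9,53,udp,dns,0.1,40,80,SF,-,-,Dd,Benign
4.0,C4,10.0.0.4,5003,10.0.0.9,23,tcp,-,2.0,0,0,S0,-,-,S,DDoS
5.0,C5,10.0.0.5,5004,10.0.0.9,23,tcp,-,1.5,0,0,S0,-,-,S,Okiru
"""


def load_capture(source, skip, nrows):
    """Skip the leading lines, then read at most `nrows` data rows."""
    return pd.read_csv(source, skiprows=skip, nrows=nrows,
                       names=COLUMNS, na_values="-")


def preprocess(frames):
    """Merge captures, drop unused columns, encode categoricals, fill gaps."""
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.drop(columns=DROPPED)
    combined = pd.get_dummies(combined, columns=["proto", "conn_state"])
    return combined.fillna(0)


def split_80_20(df, seed=0):
    """Random 80/20 split into training and test rows."""
    train = df.sample(frac=0.8, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Two identical in-memory "captures" stand in for the 23 real files.
frames = [load_capture(io.StringIO(RAW), skip=1, nrows=5) for _ in range(2)]
data = preprocess(frames)
train, test = split_80_20(data)
print(data.columns.tolist())
print(len(train), len(test))
```

Whether the split in the study was stratified by attack label is not stated; the sketch uses a plain random split.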

Results Comparison

The results obtained for each technique are compared on accuracy and on execution time. Naive Bayes yields an accuracy of 0.23. The Support Vector Machine (SVM) reaches an accuracy of 0.54, which is nearly equal to that of the Convolutional Neural Network (CNN) model and about 6% below the accuracy of Decision Trees. However, the time cost of SVM is approximately 2 minutes, which is 1,950 times slower than Decision Trees and 24 times slower than the CNN model. The CNN model reaches an accuracy of 0.88, which is lower than Decision Trees and higher than SVM. The time cost of the CNN model is about 132 seconds, which is 80 times slower than Decision Trees. Decision Trees yield an accuracy of 0.81 and take about 0.22 seconds, giving the highest accuracy and the lowest time cost among all the Machine Learning/Deep Learning methods examined in this study.

A related paper also evaluated various Machine Learning approaches on the IoT-23 dataset. Unlike our study, that paper used Random Forest (RF), Naive Bayes, Support Vector Machine, Artificial Neural Network (ANN), and AdaBoost. As shown in Table IX, its results indicate that the Naive Bayes algorithm reaches 23 percent accuracy, while the Support Vector Machine reaches 67 percent accuracy. Compared with that paper, our results show higher accuracy for the Naive Bayes and Support Vector Machine algorithms, and both sets of results show that Naive Bayes has the lowest accuracy of all the algorithms. However, the combined dataset in that paper is considerably larger than ours, which could influence the results. Overall, the comparison with that paper supports our findings.
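The two criteria used in this comparison, accuracy and time cost, can be measured with a small harness like the one below. It is a sketch under assumptions: timing covers both fitting and prediction, which the text does not specify, and `MajorityClass` is a hypothetical stand-in baseline rather than one of the models evaluated in the study. Any object with scikit-learn-style `fit` and `predict` methods could be passed to `evaluate` instead.

```python
import time
from collections import Counter


class MajorityClass:
    """Hypothetical baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, elapsed_seconds) for one fit-and-predict run."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predicted, y_test))
    return correct / len(y_test), elapsed


# Made-up toy data, only to show the call pattern.
X_train = [[0], [1], [2], [3]]
y_train = ["Benign", "DDoS", "DDoS", "DDoS"]
X_test = [[4], [5]]
y_test = ["DDoS", "Benign"]

accuracy, seconds = evaluate(MajorityClass(), X_train, y_train, X_test, y_test)
print(f"accuracy={accuracy:.2f} time={seconds:.6f}s")
```

Running every model through the same function keeps the accuracy and timing figures comparable across techniques.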

Anshukalathiya / anomaly_detection_iiot
