Detecting-Phishing-Attack-using-ML-DL-Models

Developed a model to distinguish phishing emails from legitimate ones using the SpamAssassin dataset. Extracted relevant features by processing the emails with an NLP toolkit. Built several ML models, including Naïve Bayes, Random Forest, and a Voting Ensemble, with a best accuracy of ~72%, and a deep learning model (a neural network) with an accuracy of ~96%.

Overview

Phishing is when cybercriminals send malicious emails designed to trick people into falling for a scam. The intent is often to get users to reveal financial information, system credentials, or other sensitive data. The term “phishing” came about in the mid-1990s, when hackers began using fraudulent emails to fish for information from unsuspecting users. Cybercriminals use phishing because it is easy, cheap and effective: email addresses are easy to obtain and emails are virtually free to send, so with little effort and cost attackers can quickly gain access to valuable data. By detecting these emails and flagging them as spam, we can reduce the success of such attacks, and machine learning and deep learning models can be used to do this.

Phishector Architecture

Email Dataset

An experiment was conducted to identify the input/output behavior of the system. We collected data from two datasets, SpamAssassin and spam/ham, both of which are open-source and freely available. The datasets used in the experiment are listed in Table 4.1, which gives the total number of emails in each dataset and the number of phishing and legitimate emails it contains; these emails were then used to train our model.
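Per-class totals like those in Table 4.1 can be gathered with a short loader. The sketch below is a minimal example, not the project's code: it assumes each dataset is stored one raw email per file, in a hypothetical folder-per-class layout such as `root/phishing/` and `root/legitimate/`, and the helper names `email_text`, `load_labeled_emails` and `class_counts` are illustrative. It uses only Python's standard `email` package.

```python
import email
from collections import Counter
from pathlib import Path


def email_text(raw: bytes) -> str:
    """Return the Subject header plus every decoded text/plain and text/html part."""
    msg = email.message_from_bytes(raw)
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():  # a non-multipart message yields itself once
        if part.get_content_type() in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True)  # bytes after undoing base64/QP
            if payload:
                charset = part.get_content_charset() or "latin-1"
                try:
                    chunks.append(payload.decode(charset, errors="replace"))
                except LookupError:  # header names a charset Python does not know
                    chunks.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(chunks)


def load_labeled_emails(root: str) -> tuple[list[str], list[str]]:
    """Hypothetical layout: root/<label>/<one email per file>; folder name = label."""
    texts, labels = [], []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(label_dir.iterdir()):
            if path.is_file():
                texts.append(email_text(path.read_bytes()))
                labels.append(label_dir.name)
    return texts, labels


def class_counts(labels: list[str]) -> Counter:
    """Number of emails per label, i.e. the per-class totals a dataset table reports."""
    return Counter(labels)
```

For example, `texts, labels = load_labeled_emails("data/spamassassin")` followed by `class_counts(labels)` gives the phishing and legitimate counts for one dataset; the path is a placeholder.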

Implementation

Accessing the .py file and running the Phishector code.
Entering the path to the folder containing the emails.
The menu of choices available to the user.
Choice 1 displays the features extracted from the emails.
Choice 2 classifies the emails using deep learning, i.e. a neural network.
Choice 3 opens the ML models menu.
Choice 3 in the ML models menu classifies the emails using the Extra Trees model.
Choices 4 and 5 in the ML models menu classify the emails using the AdaBoost and Stochastic Gradient Boosting models, respectively.
Choices 6 and 7 in the ML models menu classify the emails using the Voting Ensemble and Naive Bayes models, respectively.
Choice 8 in the ML models menu classifies the emails using the SVM model, and choice 9 exits the ML models menu and returns to the main menu.
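The classifiers offered in the menus can be approximated with scikit-learn, roughly as follows. This is a sketch under stated assumptions, not Phishector's implementation: TF-IDF features are a swapped-in stand-in for the project's own NLP-based features, the members of the Voting Ensemble and all hyperparameters are guesses, and `build_models` and `train_and_score` are hypothetical names. Stochastic Gradient Boosting is expressed as `GradientBoostingClassifier` with `subsample` below 1.0, and `MLPClassifier` stands in for the neural network.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_models(seed: int = 0) -> dict:
    """One untuned classifier per menu entry; every hyperparameter is a placeholder."""
    return {
        "Extra Trees": ExtraTreesClassifier(n_estimators=200, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        # subsample < 1.0 fits each tree on a random fraction of the rows,
        # which is what makes gradient boosting "stochastic"
        "Stochastic Gradient Boosting": GradientBoostingClassifier(
            subsample=0.8, random_state=seed),
        # assumed members: the README does not say what the ensemble combines
        "Voting Ensemble": VotingClassifier(
            estimators=[("nb", MultinomialNB()),
                        ("et", ExtraTreesClassifier(n_estimators=200, random_state=seed)),
                        ("svm", SVC(kernel="linear"))],
            voting="hard"),
        "Naive Bayes": MultinomialNB(),
        "SVM": SVC(kernel="linear"),
        "Neural Network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                        random_state=seed),
    }


def train_and_score(texts, labels, seed: int = 0) -> dict:
    """Fit TF-IDF + each model on a stratified split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed)
    scores = {}
    for name, model in build_models(seed).items():
        # a fresh vectorizer per pipeline, fitted on the training split only
        pipe = make_pipeline(TfidfVectorizer(stop_words="english"), model)
        pipe.fit(X_train, y_train)
        scores[name] = pipe.score(X_test, y_test)  # mean accuracy on held-out emails
    return scores
```

Given lists of email texts and their labels, `train_and_score(texts, labels)` returns a dict such as `{"Extra Trees": ..., "SVM": ...}`. Wrapping the vectorizer and model in one pipeline keeps the vocabulary learned from the training split only, so test emails do not leak into the features.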

Evaluation Metrics

Plot of evaluation metrics vs. score for the different ML models on the SpamAssassin dataset.
Plot of evaluation metrics vs. score for the different ML models on the HSD dataset.
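The README does not list which metrics were plotted. Assuming the usual accuracy, precision, recall and F1 with phishing as the positive class, each model's scores can be computed from its predictions with `sklearn.metrics`; `evaluation_metrics` and the `"phishing"` label are illustrative choices, not names from the project.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluation_metrics(y_true, y_pred, positive_label="phishing") -> dict:
    """Score one model's predictions, treating `positive_label` as the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # precision: of the emails flagged as phishing, how many really were
        "precision": precision_score(y_true, y_pred, pos_label=positive_label),
        # recall: of the real phishing emails, how many were flagged
        "recall": recall_score(y_true, y_pred, pos_label=positive_label),
        "f1": f1_score(y_true, y_pred, pos_label=positive_label),
    }
```

Precision and recall matter here because a filter that misses phishing (low recall) and one that buries legitimate mail (low precision) can have similar accuracy.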

Result Analysis

Plot of machine learning models vs. accuracy for the SpamAssassin dataset.
Plot of machine learning models vs. accuracy for the HSD dataset.
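A model-vs-accuracy chart of this kind can be drawn with matplotlib. The sketch below is one possible way to do it, not the project's plotting code: `plot_accuracy` is a hypothetical helper that takes a dict mapping model name to test accuracy, and the off-screen `Agg` backend is chosen so it also runs on machines without a display.

```python
from typing import Optional

import matplotlib
matplotlib.use("Agg")  # render off-screen; switch backends to show windows interactively
import matplotlib.pyplot as plt


def plot_accuracy(scores: dict, title: str, path: Optional[str] = None):
    """Bar chart of model name vs. accuracy, optionally saved to `path`."""
    names = list(scores)
    values = [scores[n] for n in names]
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(names, values)
    ax.bar_label(bars, fmt="%.2f")  # write each accuracy above its bar
    ax.set_ylim(0, 1.05)            # accuracy is a fraction in [0, 1]
    ax.set_xlabel("Machine learning model")
    ax.set_ylabel("Accuracy")
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=30)  # long model names stay readable
    fig.tight_layout()
    if path:
        fig.savefig(path)
    return fig
```

Calling it once per dataset with that dataset's scores, e.g. `plot_accuracy(scores, "SpamAssassin", "spamassassin_accuracy.png")`, yields one chart per dataset; the file name is a placeholder.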
