Phishing-Detection

Precise phishing detection with a recurrent convolutional neural network

This model is based on the research paper "PDRCNN: Precise Phishing Detection with Recurrent Convolutional Neural Networks" by Weiping Wang, Feng Zhang, Xi Luo, and Shigeng Zhang. The paper can be found at https://www.hindawi.com/journals/scn/2019/2595794/.

How it works

1. Set the length of the URL to 255.
2. Encode each character as a 60-bit one-hot (0/1) string: 26 English letters, 10 Arabic numerals, 23 special characters, and one slot for any character not in the list.
3. Use the word2vec method from natural language processing to map each 60-bit string to a 64-dimensional word vector.
4. A recurrent structure, a bidirectional LSTM network, extracts features from the URL.
5. A multilayer convolutional network extracts the local features in the resulting tensors, using three types of convolutional kernels: 5*120, 6*120, and 7*120.
6. Max pooling is applied to the features generated by the convolutional layers to keep the most representative features of the URL. The convolution and pooling results for the three kernel types are then concatenated to form the final feature vector.
7. A fully connected layer with a sigmoid function classifies the URL as either benign or phishing.
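
The first two steps above (fixing the URL length and one-hot encoding each character) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual code: the names `SPECIAL_CHARS`, `one_hot_char`, and `encode_url` are hypothetical, the particular set of 23 special characters is a guess (the list above does not enumerate them), letters are assumed to be case-folded, and URLs shorter than 255 characters are assumed to be padded with all-zero rows.

```python
import string

MAX_URL_LEN = 255  # fixed URL length from the first step

# Assumed set of 23 special characters; the README does not list them.
SPECIAL_CHARS = "-._~:/?#[]@!$&'()*+,;=%"
ALPHABET = string.ascii_lowercase + string.digits + SPECIAL_CHARS
assert len(SPECIAL_CHARS) == 23 and len(ALPHABET) == 59

CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
OTHER_INDEX = len(ALPHABET)      # slot 59: any character not in the list
VECTOR_SIZE = len(ALPHABET) + 1  # 26 + 10 + 23 + 1 = 60


def one_hot_char(ch):
    """Return a 60-element 0/1 list with a single 1 marking the character."""
    vec = [0] * VECTOR_SIZE
    # Assumption: upper-case letters share a slot with lower-case ones.
    vec[CHAR_INDEX.get(ch.lower(), OTHER_INDEX)] = 1
    return vec


def encode_url(url):
    """Return a 255 x 60 matrix: truncate long URLs, zero-pad short ones."""
    rows = [one_hot_char(ch) for ch in url[:MAX_URL_LEN]]
    # Assumption: padding positions are represented by all-zero rows.
    rows += [[0] * VECTOR_SIZE for _ in range(MAX_URL_LEN - len(rows))]
    return rows


matrix = encode_url("http://example.com/login?id=1")
print(len(matrix), len(matrix[0]))  # 255 60
```

The resulting 255 x 60 matrix is the kind of input the word2vec step would then map to denser vectors; that mapping and the later layers are not covered by this sketch.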

Dataset

Phishing URLs from the PhishTank website and benign URLs from the Alexa website.

TODO

Implement Step 3 to Step 7.
