Jivansha / Malicious-URL-classification---ML

ML applied to detect malicious web URLs

Problem Statement

Devise a system that uses machine learning to predict whether a URL (web address) poses a potential risk to the system.

Aim and Objective

  1. Gather an appropriate dataset for building the classification model
  2. Process and label the data appropriately, and remove redundant data
  3. Extract the features to be used in the machine learning model: lexical, domain-based, DNS, host-based and rank-based features
  4. Build ML models on the resulting feature vectors using different classification algorithms: k-Nearest Neighbour and Random Forest
  5. Visualise the features and model characteristics
  6. Study the results obtained and draw conclusions
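
As an illustration of step 4, the k-Nearest Neighbour idea can be sketched in a few lines of plain Python. This is only a minimal sketch, not the project's actual implementation: the function names (`euclidean`, `knn_predict`), the three-element feature vectors (URL length, dot count, host-is-IP flag) and the toy training rows are all assumptions made up for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Sort training rows by distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [url_length, num_dots, host_is_ip]. Not real measurements.
train_X = [[20, 1, 0], [25, 2, 0], [22, 1, 0], [75, 5, 1], [90, 6, 1], [68, 4, 0]]
train_y = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

print(knn_predict(train_X, train_y, [80, 5, 1]))
print(knn_predict(train_X, train_y, [23, 1, 0]))
```

Because k-NN relies on raw distances, features on very different scales (such as URL length versus a 0/1 flag) would normally be rescaled first. In practice a library implementation, for example scikit-learn's `KNeighborsClassifier` and `RandomForestClassifier`, would likely be used for both algorithms; the README does not say which library the notebooks rely on.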

Background

The World Wide Web is a frequent target of attacks such as spamming, phishing and malware. A user who unknowingly visits a malicious URL becomes a victim of these attacks, which can have a serious impact on business, banking and social networks.

Discriminative features that give a good representation of a URL include lexical features, host-based features, content-based features, DNS features and popularity features.
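
Of these families, lexical features are the simplest to show, since they are computed from the URL string alone with no network lookups. The following is a hedged sketch using only the Python standard library; the specific features chosen and the helper names (`shannon_entropy`, `is_ip_address`, `lexical_features`) are assumptions for illustration and may differ from what the project's notebooks extract.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def is_ip_address(host):
    """True if the host is a literal IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Return a few illustrative lexical features of a URL as a dict."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": int(is_ip_address(host)),
        "url_entropy": shannon_entropy(url),
    }


# Hypothetical example URL, used only to show the output shape.
example = "http://192.168.10.5/secure-login/update.php?id=42"
for name, value in lexical_features(example).items():
    print(f"{name}: {value}")
```

Each dict can be turned into one row of a feature matrix. Host-based, DNS and popularity features would need external lookups (for example WHOIS records or ranking lists) and are not covered by this sketch.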

This project aims to develop a model that uses these features to classify whether a URL is a potential hazard to system security.

Languages

Language:Jupyter Notebook 100.0%