Sahi Hai - A Chrome Extension to detect Malicious Websites

Introduction

Sahi Hai was built with the regular internet user in mind: someone who visits many websites and may fall into the trap of a malicious site that tries to steal their information or install malware on their system. The Chrome extension lets the user check whether a particular website is safe to browse, using a pretrained ML model.

Introduction
How it Works?
- What Problem it Solves?
Tech Stack
Usage
Acknowledgments

How it Works?

The following features are extracted from a URL and passed to the ML model:

Features Used
Having IP address	URL Length	URL Shortening service	Having @ symbol
Having double slash	Having dash symbol (Prefix/Suffix)	Having multiple subdomains	SSL Final State
URL of Anchor	Links in tags	SFH (Server Form Handler)	Submitting to email
Abnormal URL	IFrame	Age of Domain	DNS Record
Web Traffic (using data.alexa.com)	Google Index	Statistical Reports
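
Several of the features above depend only on the URL string itself. A minimal sketch of how a few of them could be computed with the Python standard library is shown here. The function name `lexical_features`, the returned keys, and the short list of shortener hosts are illustrative assumptions and may not match what the project's features_extraction.py does.

```python
import ipaddress
from urllib.parse import urlparse

# Illustrative, non-exhaustive list of URL shortener hosts (an assumption).
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly", "is.gd"}


def lexical_features(url: str) -> dict:
    """Compute a few URL-only features; keys are hypothetical names."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Having IP address: the host is a literal IPv4/IPv6 address.
    try:
        ipaddress.ip_address(host)
        has_ip = True
    except ValueError:
        has_ip = False

    return {
        "having_ip": has_ip,
        "url_length": len(url),
        "shortener": host in SHORTENERS,
        "having_at": "@" in url,
        # A "//" appearing after the scheme separator.
        "double_slash": "//" in url.split("://", 1)[-1],
        "dash_in_host": "-" in host,
        # Number of dots in the host, a rough proxy for subdomain depth.
        "dot_count": host.count("."),
    }
```

Calling `lexical_features("http://user@evil-site.example.com//redirect")` would flag the @ symbol, the extra double slash, and the dash in the host. The remaining features in the table (SSL state, WHOIS data, page content, traffic rank) need network lookups and are outside this sketch.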

We iterated on the model multiple times during the training phase.

What Problem it Solves?

Many websites today try to collect their users' data by tricking them into giving away credentials, which are then used for fraud or other malicious acts. An inexperienced user has no view of what a page does behind the scenes and can be tricked into handing over credentials or downloading malicious files.

We have created a Chrome extension that acts as middleware between users and malicious websites and protects users from giving away their information to such sites. The extension lets the user check whether a particular website is safe to browse.

Tech Stack

HTML - The markup language used to build the extension's front end.
CSS - The stylesheet language used to style the extension's front end.
Python - The programming language used to parse features from a website and to train and test the ML model.
JavaScript - The scripting language used to build the extension and send requests to the served ML model.
PHP - The scripting language used to serve the ML model.
Beautiful Soup - The library used to scrape a website's content from its URL.
Googlesearch - The library used to perform Google searches during feature extraction.
whois - The package used to retrieve WHOIS information for domains during feature extraction.
scikit-learn - The library used to train the ML models.
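
The Age of Domain feature relies on the creation date found in a domain's WHOIS record. As a rough sketch, assuming the creation date has already been retrieved as a `datetime`, the check might look like the following. The 180-day threshold and the function names are hypothetical and are not taken from the project's code.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed threshold (about six months); not taken from the project.
MIN_AGE_DAYS = 180


def domain_age_days(created: datetime, now: Optional[datetime] = None) -> int:
    """Whole days between a domain's creation date and `now`."""
    now = now or datetime.now(timezone.utc)
    # WHOIS dates are often naive; treat them as UTC so subtraction works.
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    return (now - created).days


def is_young_domain(created: Optional[datetime],
                    now: Optional[datetime] = None) -> bool:
    """True when there is no creation date or the domain is too new."""
    if created is None:  # no WHOIS record: treated as suspicious here
        return True
    return domain_age_days(created, now) < MIN_AGE_DAYS
```

Treating a missing WHOIS record as suspicious is a design choice made for this sketch. A real extractor might prefer to report it as a separate signal, such as the DNS Record feature.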

Usage

Directory Structure

.
|-- LICENSE
|-- README.md
|-- extension
|   |-- icon.png
|   |-- manifest.json
|   |-- popup.html
|   |-- popup.js
|   `-- style.css
|-- images
|   `-- working.gif
|-- models
|   |-- mlp_model.pkl
|   `-- random_forest.pkl
|-- requirements.txt
|-- run.sh
|-- test
|   |-- __pycache__
|   |   |-- features_extraction.cpython-39.pyc
|   |   `-- patterns.cpython-39.pyc
|   |-- features_extraction.py
|   |-- features_extraction.pyc
|   |-- index.php
|   |-- markup.txt
|   |-- patterns.py
|   |-- patterns.pyc
|   `-- test.py
`-- train
    |-- data
    |   `-- web_data.arff
    |-- train_mlp.py
    `-- train_rf.py

Backend - ML Model

Clone the repo.

Open a terminal and run:

pip install -r requirements.txt
./run.sh

Extension

Open Chrome Settings using the three dots in the top-right corner.
Select Extensions.
Enable Developer mode.
Click Load unpacked and select the extension folder.
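
If Chrome refuses to load the folder, the manifest is a common culprit. The snippet below is a small optional check, not part of the project: it reads manifest.json from a given folder and reports which of the keys Chrome requires in every manifest (`manifest_version`, `name`, `version`) are missing. The helper name `missing_manifest_keys` is hypothetical.

```python
import json
from pathlib import Path

# Keys Chrome requires in every extension manifest.
REQUIRED_KEYS = ("manifest_version", "name", "version")


def missing_manifest_keys(extension_dir: str) -> list:
    """Return the required keys absent from <extension_dir>/manifest.json."""
    manifest_path = Path(extension_dir) / "manifest.json"
    with manifest_path.open(encoding="utf-8") as fh:
        manifest = json.load(fh)
    return [key for key in REQUIRED_KEYS if key not in manifest]
```

Run from the repository root, `missing_manifest_keys("extension")` should return an empty list when the manifest has all three keys.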

Acknowledgments

Heartfelt thanks to the authors and owners of the articles that inspired us to build Sahi Hai.

We are also grateful to the whole "HackNITR 2021" team for giving us the perfect platform to showcase our idea.


MIT License
