cristiandiiorio / Sahi_Hai

Chrome Extension to detect Malicious Websites

Sahi Hai - A Chrome Extension to detect Malicious Websites

Introduction

Sahi Hai was built with the regular internet user in mind: someone who visits many websites and may fall into the trap of a malicious website that wants to steal their information or install malware on their system. Our Chrome extension lets the user check whether a particular website is safe to browse, using a pretrained ML model.

How It Works

The ML model extracts the following features from a URL:

Features Used

  • Having IP address
  • URL Length
  • URL Shortening service
  • Having @ symbol
  • Having double slash
  • Having dash symbol (Prefix Suffix)
  • Having multiple subdomains
  • SSL Final State
  • URL of Anchor
  • Links in tags
  • SFH - Server Form Handler
  • Submitting to email
  • Abnormal URL
  • IFrame
  • Age of Domain
  • DNS Record
  • Web Traffic - using data.alexa.com
  • Google Index
  • Statistical Reports
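
Several of the features listed above depend only on the URL string, so they can be illustrated without any network access. The block below is a minimal sketch of that idea and is not the code in test/features_extraction.py. The -1 / 0 / 1 encoding (suspicious / uncertain / benign), the length thresholds, the subdomain rule, and the `SHORTENERS` set are all assumptions chosen for illustration.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical shortener list; the project's own list is not shown in this README.
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}


def _host(url: str) -> str:
    """Lower-cased hostname of the URL, or an empty string if there is none."""
    return (urlparse(url).hostname or "").lower()


def having_ip_address(url: str) -> int:
    """-1 if the host is a literal IP address, otherwise 1."""
    try:
        ipaddress.ip_address(_host(url))
        return -1
    except ValueError:
        return 1


def url_length(url: str) -> int:
    """Assumed thresholds: under 54 chars is benign, over 75 is suspicious."""
    if len(url) < 54:
        return 1
    return 0 if len(url) <= 75 else -1


def shortening_service(url: str) -> int:
    """-1 if the host is one of the assumed shortener domains."""
    return -1 if _host(url) in SHORTENERS else 1


def having_at_symbol(url: str) -> int:
    """-1 if the URL contains an '@' anywhere."""
    return -1 if "@" in url else 1


def double_slash_redirect(url: str) -> int:
    """-1 if '//' appears after the scheme separator (index greater than 7)."""
    return -1 if url.rfind("//") > 7 else 1


def prefix_suffix(url: str) -> int:
    """-1 if the hostname contains a dash."""
    return -1 if "-" in _host(url) else 1


def sub_domains(url: str) -> int:
    """Simplified rule that ignores country-code TLDs: count dots in the host."""
    host = _host(url)
    if host.startswith("www."):
        host = host[4:]
    dots = host.count(".")
    if dots <= 1:
        return 1
    return 0 if dots == 2 else -1


def extract_features(url: str) -> list:
    """Feature vector in the same order as the first seven items of the list."""
    return [
        having_ip_address(url),
        url_length(url),
        shortening_service(url),
        having_at_symbol(url),
        double_slash_redirect(url),
        prefix_suffix(url),
        sub_domains(url),
    ]


if __name__ == "__main__":
    print(extract_features("https://www.example.com/"))
```

The remaining features (SSL state, domain age, DNS record, web traffic, and so on) need page content or external lookups, which is where Beautiful Soup, googlesearch, and whois come in.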

We iterated multiple times during the training phase.


What Problem It Solves

Many websites today try to collect their users' data by tricking them into giving away credentials, which are then used for fraud or other malicious acts. Inexperienced users have no visibility into what happens behind a page, so they can be tricked into handing over their credentials or downloading malicious files.

We have created a Chrome extension that acts as middleware between users and malicious websites and keeps users from giving their information away to such sites. The project was built with the regular internet user in mind: someone who visits many websites and may fall into the trap of a malicious website that wants their information or wants to install malware on their system. The extension lets the user check whether a particular website is safe to browse.

Tech Stack

  • HTML - The markup language used to build the extension's popup.

  • CSS - The stylesheet language used to style the extension.

  • Python - The programming language used to extract features from a website and to train and test the ML model.

  • JavaScript - The scripting language used for creating the extension and sending requests to the served ML model.

  • PHP - The scripting language used for serving the ML model.

  • Beautiful Soup - The library used to scrape and parse the page behind a URL.

  • googlesearch - The library used for performing Google searches during feature extraction.

  • whois - The package for retrieving WHOIS information of domains during feature extraction.

  • scikit-learn - The library used for training ML models.
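
To show how these pieces might fit together, here is a hedged sketch of the last step: turning a feature vector into a verdict. It relies only on the fact that scikit-learn estimators expose `predict`, which takes a list of rows. `StubModel` is a hypothetical stand-in so the example runs without the pickled models, and the label mapping (1 = safe, -1 = malicious) is an assumption, not something this README states.

```python
from typing import List, Sequence

# Assumed label encoding; check the training data before relying on it.
LABELS = {1: "safe", -1: "malicious"}


class StubModel:
    """Hypothetical stand-in for a trained classifier with a predict() method."""

    def predict(self, rows: Sequence[Sequence[int]]) -> List[int]:
        # Flag a row as malicious if any single feature looks suspicious.
        return [-1 if -1 in row else 1 for row in rows]


def classify(model, features: Sequence[int]) -> str:
    """Run one feature vector through the model and map the label to text."""
    prediction = model.predict([list(features)])[0]
    return LABELS.get(int(prediction), "unknown")


if __name__ == "__main__":
    print(classify(StubModel(), [1, 1, 1]))
    print(classify(StubModel(), [1, -1, 1]))
```

In the real project, the model object would presumably be one of the files under models/, loaded with pickle or joblib, and the verdict would be returned to the extension by the PHP layer.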


Usage

Directory Structure

.
|-- LICENSE
|-- README.md
|-- extension
|   |-- icon.png
|   |-- manifest.json
|   |-- popup.html
|   |-- popup.js
|   `-- style.css
|-- images
|   `-- working.gif
|-- models
|   |-- mlp_model.pkl
|   `-- random_forest.pkl
|-- requirements.txt
|-- run.sh
|-- test
|   |-- __pycache__
|   |   |-- features_extraction.cpython-39.pyc
|   |   `-- patterns.cpython-39.pyc
|   |-- features_extraction.py
|   |-- features_extraction.pyc
|   |-- index.php
|   |-- markup.txt
|   |-- patterns.py
|   |-- patterns.pyc
|   `-- test.py
`-- train
    |-- data
    |   `-- web_data.arff
    |-- train_mlp.py
    `-- train_rf.py
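
The training data in train/data/web_data.arff is in ARFF format: a header of `@attribute` lines followed by comma-separated rows after `@data`. The sketch below shows one way such a file could be read using only the standard library. It assumes a dense file whose values are all integers and whose last column is the class label, and the attribute names in `SAMPLE` are illustrative. The training scripts may well use a library loader such as `scipy.io.arff.loadarff` instead.

```python
from typing import List, Tuple

# Tiny illustrative document; the real web_data.arff has many more attributes.
SAMPLE = """\
@relation phishing
@attribute having_IP_Address {-1,1}
@attribute URL_Length {1,0,-1}
@attribute Result {-1,1}
@data
-1,1,-1
1,0,1
"""


def load_arff(text: str) -> Tuple[List[str], List[List[int]]]:
    """Parse a dense, integer-valued ARFF document into names and rows."""
    names: List[str] = []
    rows: List[List[int]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if in_data:
            rows.append([int(value) for value in line.split(",")])
        elif line.lower().startswith("@attribute"):
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def split_xy(rows: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Assume the last column is the label and the rest are features."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


if __name__ == "__main__":
    names, rows = load_arff(SAMPLE)
    X, y = split_xy(rows)
    print(names, X, y)
```

The resulting `X` and `y` lists have the shape that scikit-learn's `fit(X, y)` expects.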

Backend - ML Model

  1. Clone the repo.

  2. Open a terminal in the repository root and run:

    pip install -r requirements.txt 
    ./run.sh
    

Extension

  1. Go to Chrome Settings using the three dots in the top-right corner.

  2. Select Extensions.

  3. Enable Developer mode.

  4. Click Load unpacked and select the extension folder.


Acknowledgments

Heartfelt thanks to the authors and owners of the following articles, which propelled us to make Sahi Hai.

We are also grateful to the whole team of "HackNITR 2021" for providing us with the perfect platform to showcase our idea.

About

License: MIT License


Languages

Python 81.0%, CSS 6.6%, JavaScript 4.8%, HTML 4.5%, Dockerfile 2.0%, PHP 0.9%, Shell 0.1%