Sahi Hai was built with the regular internet user in mind: someone who visits many websites and may fall into the trap of a malicious site that wants to steal their information or install malware on their system. Our Chrome extension lets the user check whether a particular website is safe to browse, using a pretrained ML model.
- Sahi Hai - A Chrome Extension to detect Malicious Websites
- Introduction
- How it Works?
- Tech Stack
- Usage
- Acknowledgments
The ML model extracts the following features from a URL:
Features Used | | | |
---|---|---|---|
Having IP address | URL Length | URL Shortening service | Having @ symbol |
Having double slash | Having dash symbol (Prefix Suffix) | Having multiple subdomains | SSL Final State |
URL of Anchor | Links in tags | SFH (Server Form Handler) | Submitting to email |
Abnormal URL | IFrame | Age of Domain | DNS Record |
Web Traffic (using data.alexa.com) | Google Index | Statistical Reports | |
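Several of the features in the table can be computed from the URL string alone. The following is a minimal sketch of that idea, not the project's `features_extraction.py`: the function names, the -1/0/1 encoding (-1 suspicious, 0 borderline, 1 benign), the length thresholds, and the `SHORTENERS` set are all assumptions made for illustration.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical list of shortener hosts; the project's real list is not shown here.
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly"}


def _host(url):
    """Return the lowercase host part of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def having_ip_address(url):
    """-1 if the host is a literal IP address, else 1."""
    try:
        ipaddress.ip_address(_host(url))
        return -1
    except ValueError:
        return 1


def url_length(url):
    """1 for short URLs, 0 for medium, -1 for long (thresholds are assumed)."""
    if len(url) < 54:
        return 1
    return 0 if len(url) <= 75 else -1


def shortening_service(url):
    """-1 if the host is a known URL shortener, else 1."""
    return -1 if _host(url) in SHORTENERS else 1


def having_at_symbol(url):
    """-1 if the URL contains '@', else 1."""
    return -1 if "@" in url else 1


def double_slash_redirect(url):
    """-1 if '//' appears again after the scheme separator, else 1."""
    return -1 if url.rfind("//") > 7 else 1


def prefix_suffix(url):
    """-1 if the host contains a dash, else 1."""
    return -1 if "-" in _host(url) else 1


def multiple_subdomains(url):
    """Score the number of dots in the host after dropping a leading 'www.'."""
    host = _host(url)
    if host.startswith("www."):
        host = host[4:]
    dots = host.count(".")
    if dots <= 1:
        return 1
    return 0 if dots == 2 else -1


def lexical_features(url):
    """Collect the URL-only features into a dict keyed by feature name."""
    checks = (having_ip_address, url_length, shortening_service,
              having_at_symbol, double_slash_redirect, prefix_suffix,
              multiple_subdomains)
    return {check.__name__: check(url) for check in checks}
```

Calling `lexical_features("http://192.168.0.1/login")` returns one score per feature. The remaining features in the table (SSL state, anchors, WHOIS age, DNS, traffic, search index) need page content or network lookups and are not covered by this sketch.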
We iterated multiple times during the training phase.
Many websites today try to collect their users' data by tricking them into giving away credentials, which are then used for fraud or other malicious acts. Inexperienced users have no view of what happens behind a page, so they can be tricked into handing over their credentials or downloading malware.
We have created a Chrome extension that acts as middleware between users and malicious websites and keeps users from giving their information away to such sites. The extension lets the user check whether a particular website is safe to browse.
- HTML - The markup language used to build the extension's front end.
- CSS - The stylesheet language used to style the extension's front end.
- Python - The programming language used to parse features from a website and to train and test the ML model.
- JavaScript - The scripting language used to build the extension and send requests to the served ML model.
- PHP - The scripting language used to serve the ML model.
- Beautiful Soup - The library used to scrape the page content behind a URL.
- Googlesearch - The library used to perform Google searches during feature extraction.
- whois - The package used to retrieve WHOIS information for domains during feature extraction.
- scikit-learn - The library used to train the ML models.
.
|-- LICENSE
|-- README.md
|-- extension
|   |-- icon.png
|   |-- manifest.json
|   |-- popup.html
|   |-- popup.js
|   `-- style.css
|-- images
|   `-- working.gif
|-- models
|   |-- mlp_model.pkl
|   `-- random_forest.pkl
|-- requirements.txt
|-- run.sh
|-- test
|   |-- __pycache__
|   |   |-- features_extraction.cpython-39.pyc
|   |   `-- patterns.cpython-39.pyc
|   |-- features_extraction.py
|   |-- features_extraction.pyc
|   |-- index.php
|   |-- markup.txt
|   |-- patterns.py
|   |-- patterns.pyc
|   `-- test.py
`-- train
    |-- data
    |   `-- web_data.arff
    |-- train_mlp.py
    `-- train_rf.py
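The `models` folder holds two pickled models. As a rough sketch of how such a file might be loaded and applied to one feature vector, assuming the files were written with Python's `pickle` module and that the model exposes a scikit-learn style `predict` method, something like the following could work. The helper names `load_model` and `classify` and the label convention (-1 for malicious, 1 for safe) are assumptions for illustration, not taken from `test.py`.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Unpickle a trained model. Only load pickle files from a trusted source."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def classify(model, features):
    """Return a readable verdict for a single feature vector.

    `model` can be any object with a scikit-learn style `predict` method
    that accepts a list of rows and returns one label per row.
    """
    label = model.predict([features])[0]
    # Assumption: -1 marks a malicious site and any other label a safe one.
    return "malicious" if label == -1 else "safe"
```

A call such as `classify(load_model("models/random_forest.pkl"), features)` would then return a verdict for one feature vector, where `features` lists the scores in the same order that was used during training. If the models were saved with `joblib` instead, `joblib.load` would be needed in place of `pickle.load`.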
- Clone the repo.
- Open a terminal and run:
  pip install -r requirements.txt
  ./run.sh
- Open Chrome settings using the three dots in the top right corner.
- Select Extensions.
- Enable Developer mode.
- Click Load unpacked and select the extension folder.
Heartfelt thanks to the authors and owners of the following articles, which inspired us to build Sahi Hai.
- Malicious URL Detection based on Machine Learning
- Detecting malicious URLs using machine learning techniques
- Malicious URL Detection using Machine Learning: A Survey
We are also very grateful to the whole HackNITR 2021 team for giving us the perfect platform to showcase our idea.