tuan1101 / Domain-Parking-Sensors

Extracts features from web pages to determine whether the domain is parked

Domain Parking Sensors

Introduction

These scripts extract features from web pages so that a classifier can be trained to detect parked domains. The code is based on the research paper "Parking Sensors: Analyzing and Detecting Parked Domains" by Thomas Vissers, Wouter Joosen and Nick Nikiforakis. If you use, extend or build upon this project, we kindly ask you to cite the original NDSS paper. The relevant BibTeX is provided below.

@inproceedings{vissers2015parking,
  title={Parking Sensors: Analyzing and Detecting Parked Domains},
  author={Vissers, Thomas and Joosen, Wouter and Nikiforakis, Nick},
  booktitle={Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS'15)},
  year={2015}
}

Usage

  1. Retrieve the necessary data for each domain in your sample (HAR, HTML, redirections, frames, ...)

$ casperjs --folder=[output folder] --domain=[somedomain.com] retrieve_page_data.js

  2. Extract 20+ features from this data (e.g. link location lengths, amount of text, third-party request ratio, ...)

$ python feature_extractor.py [folder] [class label]
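
To illustrate the kind of feature computed in step 2, here is a minimal sketch of a third-party request ratio derived from a HAR capture. It is not taken from feature_extractor.py: the function names are hypothetical, the code targets Python 3, and "same party" is approximated by comparing the last two labels of each host name, which misjudges suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def registered_domain(host):
    """Naive approximation (an assumption): keep the last two host labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_request_ratio(har, page_domain):
    """Fraction of HAR entries requested from outside page_domain."""
    entries = har.get("log", {}).get("entries", [])
    hosts = [urlsplit(e["request"]["url"]).hostname or "" for e in entries]
    hosts = [h for h in hosts if h]  # skip entries without a host (e.g. data: URLs)
    if not hosts:
        return 0.0
    own = registered_domain(page_domain)
    third_party = sum(1 for h in hosts if registered_domain(h) != own)
    return third_party / len(hosts)


# Hypothetical HAR fragment, reduced to the fields used above.
sample_har = {"log": {"entries": [
    {"request": {"url": "http://example.com/"}},
    {"request": {"url": "http://www.example.com/style.css"}},
    {"request": {"url": "http://ads.example.net/banner.js"}},
    {"request": {"url": "http://tracker.example.org/pixel.gif"}},
]}}

ratio = third_party_request_ratio(sample_har, "example.com")
print(ratio)
```

In this sample, two of the four requests go to other domains. A production version would more likely rely on the Public Suffix List than on the two-label shortcut.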

Example scenario

$ casperjs --folder=benign_samples --domain=github.com retrieve_page_data.js
$ casperjs --folder=benign_samples --domain=stackoverflow.com retrieve_page_data.js
...
$ casperjs --folder=parked_samples --domain=giyhub.com retrieve_page_data.js
$ casperjs --folder=parked_samples --domain=stackovreflow.com retrieve_page_data.js
...
$ python feature_extractor.py benign_samples benign
$ python feature_extractor.py parked_samples parked
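
For larger samples, the casperjs commands above can be generated from a list of domains. The following is a minimal Python 3 sketch, not part of this repository: `build_capture_command` and `capture_all` are hypothetical helpers, and placing the optional `--ssl-protocol=any` flag (see Troubleshooting) before the other options is an assumption. By default it only prints the commands. Pass `dry_run=False` to execute them.

```python
import subprocess


def build_capture_command(domain, folder, ssl_any=False):
    """Build the casperjs argument list for one domain."""
    cmd = ["casperjs"]
    if ssl_any:
        # Optional workaround from the Troubleshooting section.
        cmd.append("--ssl-protocol=any")
    cmd += ["--folder=" + folder, "--domain=" + domain, "retrieve_page_data.js"]
    return cmd


def capture_all(domains, folder, ssl_any=False, dry_run=True):
    """Print (dry run) or execute one capture command per domain."""
    commands = [build_capture_command(d, folder, ssl_any) for d in domains]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.call(cmd)  # requires casperjs on the PATH
    return commands


benign = capture_all(["github.com", "stackoverflow.com"], "benign_samples")
parked = capture_all(["giyhub.com", "stackovreflow.com"], "parked_samples")
```

Once both folders are populated, feature_extractor.py is run once per folder with the matching class label, as shown above.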

Requirements

Troubleshooting

Some versions of PhantomJS use SSLv3 by default. Since the POODLE vulnerability was disclosed, many servers have disabled SSLv3, so HTTPS pages may fail to load. To work around this, add the following parameter when executing CasperJS:

--ssl-protocol=any

More information: http://stackoverflow.com/questions/26415188/casperjs-phantomjs-doesnt-load-https-page

Languages

Python 70.6%, JavaScript 29.4%