lindsey98 / PhishIntention

PhishIntention: Phishing detection through webpage intention

PhishIntention

  • This is the official implementation of "Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach" (USENIX Security '22).

  • Existing reference-based phishing detectors:

    • ❌ Prone to false positives because they only capture brand intention
  • The contributions of our paper:

    • ✅ We propose a reference-based phishing detection system that captures both brand intention and credential-taking intention. To the best of our knowledge, this is the first work that systematically analyzes both brand intention and credential-taking intention for phishing detection.
    • ✅ We set up a phishing monitoring system. It reports phishing webpages every day, with the highest precision compared with state-of-the-art phishing detection solutions.

Framework

Input: a screenshot. Output: Phish/Benign and the phishing target.

  • Step 1: Run the Abstract Layout detector to get the predicted elements

  • Step 2: Run the Siamese Logo Comparison

    • If the Siamese model reports no target, Return Benign, None
    • Else the Siamese model reports a target, go to step 3: CRP classifier
  • Step 3: CRP classifier

    • If the CRP classifier reports that it is a CRP page, go to step 5: Return
    • Else if it is not a CRP page and the CRP Locator has not been executed before, go to step 4: CRP Locator
    • Else it is not a CRP page but the CRP Locator has already been executed, Return Benign, None
  • Step 4: CRP Locator

    • Find login/signup links and click them. If a CRP page is reached at the end, go back to step 1: Abstract Layout detector with the updated URL and screenshot
    • Else no CRP page can be reached, Return Benign, None
  • Step 5: Return

    • If a CRP page is reached and the Siamese model reports a target: Return Phish, Phishing target
    • Else Return Benign, None
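The control flow of the five steps above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the code in phishintention.py: the function name classify and the four callables (detect_layout, match_logo, is_crp, locate_crp) are hypothetical placeholders standing in for the real models, and their signatures are assumptions made for this sketch.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[str, bytes]  # (url, screenshot bytes); an assumed representation


def classify(
    page: Page,
    detect_layout: Callable[[bytes], List],               # hypothetical: Step 1
    match_logo: Callable[[List, bytes, str], Optional[str]],  # hypothetical: Step 2
    is_crp: Callable[[List, bytes], bool],                 # hypothetical: Step 3
    locate_crp: Callable[[Page], Optional[Page]],          # hypothetical: Step 4
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None) following steps 1 to 5."""
    locator_used = False
    while True:
        url, shot = page
        elements = detect_layout(shot)            # Step 1: predicted elements
        target = match_logo(elements, shot, url)  # Step 2: brand intention
        if target is None:
            return "Benign", None
        if is_crp(elements, shot):                # Step 3: credential intention
            return "Phish", target                # Step 5: CRP page + target
        if locator_used:
            # Not a CRP page, and the locator has already been tried once.
            return "Benign", None
        locator_used = True
        next_page = locate_crp(page)              # Step 4: follow login links
        if next_page is None:
            return "Benign", None
        page = next_page                          # back to Step 1
```

Passing the models in as callables keeps the sketch independent of any particular model-loading code; the locator_used flag is what guarantees the loop runs at most twice.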

Project structure

|_ configs: Configuration files for the object detection models and the global configurations
|_ modules: Inference code for layout detector, CRP classifier, CRP locator, and OCR-aided Siamese model
|_ models: the model weights and reference list
|_ ocr_lib: external code for the OCR encoder
|_ utils
|_ configs.py: load configuration files
|_ phishintention.py: main script

Instructions

Requirements:

  1. Create a local clone of PhishIntention
git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention
  2. Setup. This step installs the core dependencies of PhishIntention, such as PyTorch and Detectron2, and downloads the model checkpoints and the brand reference list. It may take some time.
chmod +x setup.sh
export ENV_NAME="phishintention"
./setup.sh
conda activate phishintention
  3. Run
python phishintention.py --folder <folder you want to test e.g. datasets/test_sites> --output_txt <where you want to save the results e.g. test.txt>

The testing folder should be in the structure of:

test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
......
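Before running the main script, it can help to check that every site folder follows the layout above. The helper below is a minimal sketch and is not part of the repository: the name check_test_folder is hypothetical, and it assumes only what the layout states, namely that info.txt and shot.png are required and html.txt is optional.

```python
from pathlib import Path
from typing import Dict, List

# Files every site folder must contain; html.txt is optional and not checked.
REQUIRED_FILES = ("info.txt", "shot.png")


def check_test_folder(root: str) -> Dict[str, List[str]]:
    """Map each site folder name to the required files it is missing.

    Folders that contain all required files are left out of the result,
    so an empty dict means the whole testing folder is well-formed.
    """
    problems: Dict[str, List[str]] = {}
    for site in sorted(Path(root).iterdir()):
        if not site.is_dir():
            continue  # ignore stray files next to the site folders
        missing = [name for name in REQUIRED_FILES if not (site / name).is_file()]
        if missing:
            problems[site.name] = missing
    return problems
```

For example, check_test_folder("datasets/test_sites") returns an empty dictionary when every site folder has both a URL file and a screenshot.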

Miscellaneous

  • In our paper, we also implement several phishing detection and identification baselines, see here

Citation

Please consider citing our work :)

@inproceedings{liu2022inferring,
  title={Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach},
  author={Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
  booktitle={31st {USENIX} Security Symposium ({USENIX} Security 22)},
  year={2022}
}

If you have any issues running our code, you can raise an issue or send an email to liu.ruofan16@u.nus.edu, lin_yun@sjtu.edu.cn, dcsdjs@nus.edu.sg

License: MIT License