PhishIntention

This is the official implementation of "Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision-Based Approach" (USENIX'22).
Existing reference-based phishing detectors:
- ❌ Subject to false positives because they capture only brand intention
The contributions of our paper:
- ✅ We propose a reference-based phishing detection system that captures both brand intention and credential-taking intention. To the best of our knowledge, this is the first work that systematically analyzes both brand intention and credential-taking intention for phishing detection.
- ✅ We set up a phishing monitoring system. It reports phishing webpages every day with the highest precision among state-of-the-art phishing detection solutions.

Framework

Input: a screenshot, Output: Phish/Benign, Phishing target

Step 1: Run the Abstract Layout detector to get the predicted elements
Step 2: Run the Siamese Logo Comparison
- If Siamese reports no target, Return Benign, None
- Else (Siamese reports a target), go to step 3: CRP classifier
Step 3: CRP classifier
- If the CRP classifier reports a CRP (credential-requiring page), go to step 5
- Else if it is not a CRP and the CRP Locator has not been executed before, go to step 4: CRP Locator
- Else (not a CRP, but the CRP Locator has already been executed), Return Benign, None
Step 4: CRP Locator
- Find login/signup links and click them. If a CRP is reached at the end, go back to step 1 (Abstract Layout detector) with the updated URL and screenshot
- Else (no CRP can be reached), Return Benign, None
Step 5:
- If a CRP is reached and Siamese reports a target: Return Phish, Phishing target
- Else Return Benign, None
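
The control flow of the steps above can be sketched in Python as follows. This is only an illustrative outline, not the code in phishintention.py: the `Page` class, the `infer_intention` function, and the four callables passed to it are hypothetical stand-ins for the layout detector, the Siamese logo comparison, the CRP classifier, and the CRP locator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Page:
    """Hypothetical container for one webpage snapshot."""
    url: str
    screenshot: str  # path to the screenshot image


def infer_intention(
    page: Page,
    detect_layout: Callable[[Page], list],
    match_logo: Callable[[Page, list], Optional[str]],
    is_crp: Callable[[Page, list], bool],
    locate_crp: Callable[[Page], Optional[Page]],
) -> Tuple[str, Optional[str]]:
    """Return ("Phish", target) or ("Benign", None), following steps 1 to 5."""
    locator_used = False
    while True:
        elements = detect_layout(page)        # Step 1: predicted elements
        target = match_logo(page, elements)   # Step 2: brand intention
        if target is None:
            return "Benign", None
        if is_crp(page, elements):            # Step 3: credential-taking intention
            return "Phish", target            # Step 5: CRP reached and target found
        if locator_used:                      # not a CRP, locator already tried
            return "Benign", None
        locator_used = True
        next_page = locate_crp(page)          # Step 4: follow login/signup links
        if next_page is None:
            return "Benign", None
        page = next_page                      # back to step 1 with the new page


# Toy stand-ins: a landing page with a known logo whose login link leads to a CRP.
login = Page("http://example.test/login", "login.png")
result = infer_intention(
    Page("http://example.test/", "home.png"),
    detect_layout=lambda p: ["logo"],
    match_logo=lambda p, els: "ExampleBrand",
    is_crp=lambda p, els: p.url.endswith("/login"),
    locate_crp=lambda p: login,
)
print(result)  # ('Phish', 'ExampleBrand')
```

The `locator_used` flag mirrors the rule that the CRP Locator runs at most once, so the loop always terminates after at most two passes.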

Project structure

|_ configs: Configuration files for the object detection models and the global configurations
|_ modules: Inference code for layout detector, CRP classifier, CRP locator, and OCR-aided Siamese model
|_ models: the model weights and reference list
|_ ocr_lib: external code for the OCR encoder
|_ utils
|_ configs.py: load configuration files
|_ phishintention.py: main script

Instructions

Requirements:

Anaconda installed (see the official installation guide: https://docs.anaconda.com/free/anaconda/install/index.html)
CUDA >= 11

Create a local clone of PhishIntention

git clone https://github.com/lindsey98/PhishIntention.git
cd PhishIntention

Setup. This step installs the core dependencies of PhishIntention, such as PyTorch and Detectron2, and downloads the model checkpoints and the brand reference list. It may take some time.

chmod +x setup.sh
export ENV_NAME="phishintention"
./setup.sh

conda activate phishintention

python phishintention.py --folder <folder you want to test e.g. datasets/test_sites> --output_txt <where you want to save the results e.g. test.txt>

The test folder should have the following structure:

test_site_1
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
test_site_2
|__ info.txt (Write the URL)
|__ shot.png (Save the screenshot)
|__ html.txt (HTML source code, optional)
......
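
Before running the main script, it can help to check that every site folder contains the required files. The helper below is a minimal sketch written for this README, not part of the repository; the function name `check_test_folder` is hypothetical, and it assumes only the layout described above (info.txt and shot.png required, html.txt optional).

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def check_test_folder(root: str) -> Dict[str, List[str]]:
    """Map each site sub-folder name to the required files it is missing."""
    required = ("info.txt", "shot.png")  # html.txt is optional
    problems: Dict[str, List[str]] = {}
    for site in sorted(Path(root).iterdir()):
        if not site.is_dir():
            continue  # ignore stray files next to the site folders
        missing = [name for name in required if not (site / name).is_file()]
        if missing:
            problems[site.name] = missing
    return problems


# Demo on a throwaway folder: one complete site, one without a screenshot.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "test_site_1"
    good.mkdir()
    (good / "info.txt").write_text("https://example.test/login")
    (good / "shot.png").write_bytes(b"")  # empty placeholder, not a real image
    bad = Path(tmp) / "test_site_2"
    bad.mkdir()
    (bad / "info.txt").write_text("https://example.test/")
    report = check_test_folder(tmp)

print(report)  # {'test_site_2': ['shot.png']}
```

An empty dictionary means every site folder has both required files.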

Miscellaneous

In our paper, we also implement several phishing detection and identification baselines.

Citation

Please consider citing our work :)

@inproceedings{liu2022inferring,
  title={Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach},
  author={Liu, Ruofan and Lin, Yun and Yang, Xianglin and Ng, Siang Hwee and Divakaran, Dinil Mon and Dong, Jin Song},
  booktitle={31st {USENIX} Security Symposium ({USENIX} Security 22)},
  year={2022}
}

If you have any issues running our code, you can raise an issue or send an email to liu.ruofan16@u.nus.edu, lin_yun@sjtu.edu.cn, dcsdjs@nus.edu.sg
