zzulb / RFGuess

A machine-learning-based tool for generating social-engineering password dictionaries. A Machine Learning Approach for Password Guessing. A reproduction of the paper at https://www.usenix.org/conference/usenixsecurity23/presentation/wang-ding-password-guessing

Home Page:https://www.usenix.org/conference/usenixsecurity23/presentation/wang-ding-password-guessing


RFGuess (Random Forest Password Guessing model)

Overview

This repository reproduces the paper Password Guessing Using Random Forest. The authors propose a set of new methods for translating PII (personally identifiable information) into structures that classical machine learning models handle well. I have implemented the main concepts of the paper and built an easy-to-use tool for training models, generating patterns, generating guesses and evaluating accuracy. This repo contributes:

  • A GUI program exclusively for the PII-based targeted password guessing scenario
  • A pre-trained model

Features

  • PII-based targeted password guessing
  • A ready-to-use pre-trained model (rfguess.clf), trained on a dataset of about 110,000 records
  • Generate password patterns from a PII dataset
  • Generate password guesses for given personal information
  • Train a custom model on your own dataset
  • Evaluate the accuracy of generated guesses

Prerequisites

Usage

Main window

Run the executable file to open the main panel.

There are three main modules in the user interface: Guess-Generator, Pattern-Generator and Model-Trainner.

Generate pattern(Pattern-Generator)

First, obtain a trained model: either train one yourself in Model-Trainner or use the pre-trained model rfguess.clf. Then set a limit on the number of patterns to generate and start generating.

  1. Load the model (.clf)

  2. Set the output path and the pattern limit

Generate password dictionary(Guess-Generator)

This module requires a pattern file (see the Appendix for details) and the PII data of the target user. You can load the pattern file generated by Pattern-Generator or use the default pattern file.

  1. Load the pattern file

  2. Fill in the PII data: enter the personal data of the target user, or load it from a JSON file

  3. Generate the password dictionary

Train your own model

Training this machine learning model is somewhat more laborious than training a deep learning model: the algorithms in this program use a MySQL database to store intermediate data structures while processing the original dataset. Fortunately, all you need is a running MySQL server and a database URL to connect to. All the data structures are set up automatically.

  1. Connect to your database and import the database structure

Connect to the database URL.

Import the SQL file.

  2. Load your PII dataset (.txt)

The PII dataset should be in CSV format and comply with the rules below:

  • the first line lists the field names
    • each field name must be one of ['account', 'name', 'phone', 'idcard', 'email', 'password'], case-insensitive
    • any combination of the allowed fields is accepted, but name and password are mandatory
  • each subsequent line contains one PII record
  • the fields within a line are separated by commas
  • blank characters are ignored

A valid dataset looks like:

name, email, password, phone
张三, 350777@aa.com , zhangsan, 111122222
John, 3333@bb.com, 3333, 44444
Jason Harris, aaaa@aa.com, 5555, 5555
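
The rules above can be checked before loading a file into the tool. The following is a minimal sketch, not the project's own loader: the function name `parse_pii_dataset` is hypothetical, and it assumes that "blank characters are ignored" means whitespace around each field is trimmed.

```python
import csv
import io

ALLOWED_FIELDS = {"account", "name", "phone", "idcard", "email", "password"}
REQUIRED_FIELDS = {"name", "password"}


def parse_pii_dataset(text):
    """Validate a PII dataset string and return one dict per record.

    Hypothetical helper: keys are the lower-cased field names of the header.
    """
    reader = csv.reader(io.StringIO(text))
    # Trim whitespace around every field and skip lines that are entirely blank.
    rows = [[cell.strip() for cell in row]
            for row in reader
            if any(cell.strip() for cell in row)]
    if not rows:
        raise ValueError("dataset is empty")

    header = [name.lower() for name in rows[0]]  # field names are case-insensitive
    unknown = set(header) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS - set(header)
    if missing:
        raise ValueError(f"missing mandatory fields: {sorted(missing)}")

    records = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"record {line_no}: expected {len(header)} fields, got {len(row)}")
        records.append(dict(zip(header, row)))
    return records


sample = (
    "name, email, password, phone\n"
    "张三, 350777@aa.com , zhangsan, 111122222\n"
    "John, 3333@bb.com, 3333, 44444\n"
)
records = parse_pii_dataset(sample)
```

A header without `name` or `password`, or a record with the wrong number of fields, raises `ValueError` in this sketch.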

You can specify the character set of the dataset in the Charset edit box.

Press the Load PII Data button and wait. Your dataset will be processed and then stored in the database.

  3. Analyze and process the dataset

This step analyzes the PII dataset and derives intermediate data from it.

  4. Train the model

This step trains a classifier model and dumps it into a .clf file.

  5. Evaluate accuracy

To evaluate the accuracy of a model, this step uses 50% of your dataset as the training set and the other 50% as the test set, generates a password dictionary for each PII record, and checks whether the correct password falls into the dictionary.
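
The metric described here is a hit rate. A minimal sketch of the idea, assuming each record is a dict with a `password` key; the names `split_half`, `hit_rate` and `toy_guesser` are hypothetical, and whether the tool shuffles before splitting is an assumption:

```python
import random


def split_half(records, seed=0):
    """Shuffle the records and split them 50/50 into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def hit_rate(test_records, generate_guesses):
    """Fraction of test records whose real password is among the guesses.

    generate_guesses stands in for the guess generator: it takes one
    record and returns an iterable of candidate passwords.
    """
    if not test_records:
        return 0.0
    hits = sum(
        1 for rec in test_records
        if rec["password"] in set(generate_guesses(rec))
    )
    return hits / len(test_records)


def toy_guesser(rec):
    """Toy stand-in for a trained model: guesses name + '123' only."""
    return [rec["name"] + "123"]


toy_records = [
    {"name": "alice", "password": "alice123"},
    {"name": "bob", "password": "hunter2"},
]
rate = hit_rate(toy_records, toy_guesser)
```

In a real run the guesser would be trained on the first half only and scored on the second half.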

  6. Restore the status of the last run: use the "Update Status" button to load the progress of the last run and check the status of each phase.

Advanced Configuration

See Config.py for more detailed configuration.

Algorithm configuration

The pii_order parameter denotes the order of the Markov model.

pii_order = 6 
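
In a Markov-style model, the order is the number of preceding symbols that form the context for predicting the next one. The sketch below only illustrates that windowing and is not the project's feature encoding: the function name `context_pairs` and the `<BOS>`/`<EOS>` padding symbols are assumptions, and the input is assumed to be a password already split into a sequence of symbols.

```python
def context_pairs(sequence, order, start="<BOS>", end="<EOS>"):
    """Return (context, next_symbol) pairs for a sequence.

    Each context is a tuple of the previous `order` symbols. The sequence is
    padded with `order` start symbols so that the first symbol also has a
    full-length context, and one end symbol marks termination.
    """
    padded = [start] * order + list(sequence) + [end]
    pairs = []
    for i in range(order, len(padded)):
        pairs.append((tuple(padded[i - order:i]), padded[i]))
    return pairs


# With order 2, every prediction sees exactly two preceding symbols.
pairs = context_pairs(["N3", "1", "2"], order=2)
```

A larger order gives the model more context per prediction at the cost of sparser training data for each distinct context.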

You can limit the number of guesses with two thresholds, which are applied to the probability of the growing pattern. A pattern is adopted only if its probability is greater than the threshold, so the larger the threshold, the fewer the guesses, and vice versa. Do not set the threshold excessively small (less than 1e-11), or the output will be overwhelmed by useless patterns.

general_generator_threshold = 1.2e-8
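
To see why a larger threshold yields fewer guesses, consider a generator that extends patterns symbol by symbol and drops any prefix whose probability is not above the threshold. This is a minimal sketch under assumptions: `enumerate_patterns` and `toy_model` are hypothetical names, and the toy probabilities are made up for illustration rather than produced by a trained classifier.

```python
def enumerate_patterns(next_probs, threshold, end="<EOS>", max_len=8):
    """Expand patterns depth-first, pruning by probability.

    next_probs(prefix) returns {symbol: probability} for the next symbol.
    A prefix is kept only while its probability (the product of the step
    probabilities so far) stays above `threshold`.
    """
    finished = []
    stack = [((), 1.0)]
    while stack:
        prefix, prob = stack.pop()
        for symbol, p in next_probs(prefix).items():
            new_prob = prob * p
            if new_prob <= threshold:
                continue  # pruned: not above the threshold
            if symbol == end:
                finished.append((prefix, new_prob))
            elif len(prefix) + 1 < max_len:
                stack.append((prefix + (symbol,), new_prob))
    # Most probable patterns first.
    return sorted(finished, key=lambda item: item[1], reverse=True)


def toy_model(prefix):
    """Made-up next-symbol distribution, for illustration only."""
    if not prefix:
        return {"N3": 0.6, "B5": 0.4}
    return {"<EOS>": 0.5, "B5": 0.3, "N3": 0.2}


strict = enumerate_patterns(toy_model, threshold=0.1, max_len=3)
loose = enumerate_patterns(toy_model, threshold=0.01, max_len=3)
```

With these toy numbers the stricter threshold keeps 2 patterns and the looser one keeps 6, which matches the trade-off described above.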

Database configuration

You can configure the database table names as you like:

class TableNames:
    PII = "PII"
    pwrepresentation = "pwrepresentation"
    representation_frequency = "representation_frequency"
    pwrepresentation_frequency = "pwrepresentation_frequency"
    pwrepresentation_unique = "pwrepresentation_unique"
    pwrepresentation_general = f"{pwrepresentation}_general"
    representation_frequency_base_general = "representation_frequency_base_general"
    representation_frequency_general = f"{representation_frequency}_general"
    pwrepresentation_frequency_general = f"{pwrepresentation_frequency}_general"
    pwrepresentation_unique_general = f"{pwrepresentation_unique}_general"

Classifier configuration

Tune the random forest parameters with the following config:

class RFParams:
    n_estimators = 30
    criterion = 'gini'
    min_samples_leaf = 10
    max_features = 0.8
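
The four attribute names match keyword arguments of scikit-learn's `RandomForestClassifier`. Assuming scikit-learn is the backend (this section does not say so), a config class like this could be turned into keyword arguments as sketched below; the helper `rf_kwargs` is hypothetical and not part of the project.

```python
class RFParams:
    n_estimators = 30       # number of trees in the forest
    criterion = 'gini'      # split quality measure
    min_samples_leaf = 10   # minimum number of samples required at a leaf
    max_features = 0.8      # as a float: fraction of features tried per split


def rf_kwargs(params=RFParams):
    """Collect the public, non-callable class attributes into a dict."""
    return {
        name: value
        for name, value in vars(params).items()
        if not name.startswith("_") and not callable(value)
    }


kwargs = rf_kwargs()
# If scikit-learn is installed, the dict could be passed on like this
# (an assumption about how the project builds its classifier):
#   from sklearn.ensemble import RandomForestClassifier
#   clf = RandomForestClassifier(**kwargs)
```

Raising `n_estimators` generally trades training time for stability, while a larger `min_samples_leaf` produces smaller, more regularized trees.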

Build from source

This project is written in Python 3.11. You can install the dependencies with pip:

pip install -r requirements.txt

And run the following command to launch the main window:

py main.py

License

This code is released under an MIT License. You are free to use, modify, distribute, or sell it under those terms.

Contact

Project Link: https://github.com/PadishahIII/RFGuess

Acknowledgements

Appendix

Pattern format

Tag Description
N1 FullName
N2 Abbreviation of name
N3 Family name
N4 Given name
N5 First character of given name followed by family name
N6 First character of family name followed by given name
N7 Family name capitalized
N8 First character of family name
N9 Abbreviation of given name
B1 Birthday in YYYYMMDD
B2 MMDDYYYY
B3 DDMMYYYY
B4 MMDD
B5 YYYY
B6 YYYYMM
B7 MMYYYY
B8 YYMMDD
B9 MMDDYY
B10 DDMMYY
A1 Account
A2 Letter segment of account
A3 Digit segment of account
E1 Email prefix
E2 Letter segment of email
E3 Digit segment of email
E4 Email site like qq, 163
P1 Phone number
P2 First three digits of phone number
P3 Last four digits of phone number
I1 Id card number
I2 First three digits of idCard
I3 First six digits of idCard
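
To illustrate how tags turn into a concrete guess, the sketch below expands a pattern for one target. It rests on assumptions: the pattern is represented as a list of tokens in which anything that is not a known tag is a literal string, the PII keys (`family_name`, `given_name`, `birthday`, `phone`, `email`) are hypothetical, and only a subset of the tags above is covered. The actual pattern file format is not shown in this README.

```python
def tag_values(pii):
    """Derive values for a few of the tags above from a PII dict."""
    family = pii["family_name"]
    given = pii["given_name"]
    birthday = pii["birthday"]  # assumed to be a YYYYMMDD string
    phone = pii["phone"]
    return {
        "N3": family,                      # family name
        "N4": given,                       # given name
        "N8": family[:1],                  # first character of family name
        "B1": birthday,                    # YYYYMMDD
        "B4": birthday[4:],                # MMDD
        "B5": birthday[:4],                # YYYY
        "B8": birthday[2:],                # YYMMDD
        "P1": phone,                       # phone number
        "P3": phone[-4:],                  # last four digits of phone number
        "E1": pii["email"].split("@")[0],  # email prefix
    }


def expand(pattern, pii):
    """Replace each known tag by its value; keep other tokens as literals."""
    values = tag_values(pii)
    return "".join(values.get(token, token) for token in pattern)


target = {
    "family_name": "zhang",
    "given_name": "san",
    "birthday": "19900512",
    "phone": "13812345678",
    "email": "zs90@example.com",
}
guess = expand(["N3", "B5"], target)
```

For this target, the pattern `["N3", "B5"]` expands to `zhang1990`.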
