PT4Exploits

This repository contains the code and data for replicating our paper "Prompt Learning for Developing Software Exploits".

Dataset

The Python and Assembly datasets, after processing by the parser, are in \py-IP and \data_shell_gen_IP, respectively.

Training and Generation

Run the command below, replacing <GPU> with the GPU to use:

python shell_prompt_t5.py --save_init --do_train --do_eval --do_test --train_filename data_shell_gen_IP\\decoder-train.json.seq2seq --dev_filename data_shell_gen_IP\\decoder-dev.json.seq2seq --test_filename data_shell_gen_IP\\decoder-test.json.seq2seq --model_name Salesforce/codet5-base --loss_filename loss/demo.csv --num_train_epochs 20 --visible_gpu <GPU> --max_source_length 256 --max_target_length 128 --train_batch_size 4 --eval_batch_size 4 --log_name=./log/demo.log --output_dir=demo_output
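
The command above uses Windows-style path separators. As a minimal sketch, the same invocation can be assembled in Python so that the data paths are joined with the separator of the current platform. The helper name build_train_command, its parameters, and the default GPU value "0" are hypothetical and only for illustration; every flag and value is copied from the command above.

```python
import os


def build_train_command(data_dir="data_shell_gen_IP", gpu="0", run_name="demo"):
    """Return the README training command as an argument list.

    build_train_command and its parameters are illustrative names,
    not part of the repository. gpu="0" is an assumed default.
    """
    def data(name):
        # Join with the platform's own separator instead of a hard-coded "\\".
        return os.path.join(data_dir, name)

    return [
        "python", "shell_prompt_t5.py",
        "--save_init", "--do_train", "--do_eval", "--do_test",
        "--train_filename", data("decoder-train.json.seq2seq"),
        "--dev_filename", data("decoder-dev.json.seq2seq"),
        "--test_filename", data("decoder-test.json.seq2seq"),
        "--model_name", "Salesforce/codet5-base",
        "--loss_filename", "loss/{}.csv".format(run_name),
        "--num_train_epochs", "20",
        "--visible_gpu", gpu,
        "--max_source_length", "256",
        "--max_target_length", "128",
        "--train_batch_size", "4",
        "--eval_batch_size", "4",
        "--log_name=./log/{}.log".format(run_name),
        "--output_dir={}_output".format(run_name),
    ]


if __name__ == "__main__":
    # Print the command for inspection; nothing is executed here.
    print(" ".join(build_train_command()))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root to start training.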

Evaluation

1. Automatic Evaluation

bash evaluate.sh demo_output data_shell_gen_IP

In this case, demo_output is the [eval_data] argument and data_shell_gen_IP is the [data_dir] argument.

2. Human Evaluation

The human evaluation can be found in \human_eval.

Baseline Models

All the generated results are in \generated_samples.

Requirements

python 3.7
pytorch 1.10.0
openprompt 0.1.1
rouge 0.3.0
nlg-eval 2.3
nltk 3.7
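
To check an environment against the versions listed above, a small script along these lines may help. It is a sketch under a few assumptions: pytorch is installed under the distribution name torch, the other names match their pip distribution names, and the helper names installed_version and find_mismatches are hypothetical. On Python 3.7 it relies on the importlib_metadata backport, since importlib.metadata only ships with Python 3.8 and later.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport, needed on Python 3.7

# Versions from the Requirements list. "torch" is the pip name of pytorch;
# the remaining distribution names are assumed to match the list as written.
PINNED = {
    "torch": "1.10.0",
    "openprompt": "0.1.1",
    "rouge": "0.3.0",
    "nlg-eval": "2.3",
    "nltk": "3.7",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that is missing or differs to (wanted, found)."""
    problems = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as the "+cu113" in "1.10.0+cu113".
        if found is None or found.split("+")[0] != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print("{}: wanted {}, found {}".format(name, wanted, found))
```

The lookup parameter exists only so the comparison can be exercised without the packages installed; in normal use the default is enough.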

Alex-Ruan / PT4Exploits