ProFuzzBench - A Benchmark for Stateful Protocol Fuzzing

This version of ProFuzzBench has been enhanced to support running Nyx-Net. The original README file is README.orig.md.

Folder structure

protocol-fuzzing-benchmark
├── subjects: this folder contains all protocols included in this benchmark and
│   │         each protocol may have more than one target server
│   └── RTSP
│   └── FTP
│   │   └── LightFTP
│   │       └── Dockerfile: subject-specific Dockerfile
│   │       └── run.sh: main script to run experiment inside a container
│   │       └── cov_script.sh: script to do code coverage analysis for AFL-based fuzzers
│   │       └── cov_script_nyx.sh: script to do code coverage analysis for nyx seeds
│   │       └── crashes_stats.sh: script to execute the ASAN-compiled target over a set of inputs
│   │       └── other files (e.g., patches, other subject-specific scripts)
│   └── ...
└── scripts: this folder contains all scripts to run experiments and collect results
│   └── execution: only for AFL-based
│   │   └── profuzzbench_exec_common.sh: main script to spawn containers and run AFL-based experiments on them
│   └── analysis: old ProFuzzBench scripts to gather and convert coverage
│   │   └── profuzzbench_generate_csv.sh: this script collects code coverage results from different runs
│   │   └── profuzzbench_plot.py: sample script for plotting
│   └── nyx-eval: scripts to spawn experiments of Nyx-Net, gather coverage, etc.
│   │   └── common.bash: common functions and utilities used by other scripts
│   │   └── start.sh: main script to start new Nyx-Net experiments
│   │   └── reproducible.sh: utility to convert test cases generated by Nyx-Net
│   │   └── coverage.sh: script to gather coverage measurements after a fuzzer run
│   │   └── convert_coverage.sh: aggregates coverage data from all runs into a single CSV file
│   │   └── crashes.sh: starts a container that executes the crashes_stats.sh script
│   │   └── gather_execs.sh: script to extract the number of fuzz cases per second
│   └── buildall.sh: utility to build docker images for all targets (requires GNU Parallel)
│   └── runqueue.py: runs a queue of experiments in parallel
│   └── README.md: additional information about these scripts
└── PFB.jl: analysis and plotting functions in Julia
└── freecores: (hackish) utility to get the list of available cores for pinning
└── README.orig.md: original README from ProFuzzBench
└── README.md

Tutorial - Fuzzing LightFTP server

Set up environment

git clone https://github.com/RUB-SysSec/nyx-net-profuzzbench.git
cd nyx-net-profuzzbench
export PFBENCH=$(pwd)
export PATH=$PATH:$PFBENCH/scripts/execution:$PFBENCH/scripts/analysis

Most scripts pin individual processes to specific cores. The default list of available cores is specified in scripts/nyx-eval/common.bash. You should update it to reflect your system.
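As a starting point for updating that list, the following Python sketch prints the cores the current process is allowed to run on. It is only an illustration: the function name available_cores is hypothetical, and the format in which common.bash expects the list is not described here, so check that file before copying the output.

```python
import os


def available_cores():
    """Return the sorted list of CPU core IDs usable by this process."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours CPU affinity restrictions (e.g. set via taskset).
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity: assume all cores.
    return list(range(os.cpu_count() or 1))


if __name__ == "__main__":
    print(" ".join(str(core) for core in available_cores()))
```

Cores that are already busy with other experiments still appear in this list; it only reports what the operating system permits, not what is idle.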

Build a docker image

The following commands create a docker image tagged pfb-lightftp. The image should contain everything needed for fuzzing and code coverage collection.

cd $PFBENCH
cd subjects/FTP/LightFTP
docker build . -t pfb-lightftp

N.B.: this framework assumes that docker images are named pfb-$target, where $target is the lowercase name of the subject.
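The naming convention can be sketched in Python as follows. The helper names image_name and build_command are hypothetical and exist only for illustration; the docker invocation mirrors the command shown above.

```python
def image_name(subject: str) -> str:
    """Map a subject folder name (e.g. "LightFTP") to its expected image tag."""
    return f"pfb-{subject.lower()}"


def build_command(subject: str) -> list:
    """Build the docker command line, to be run inside the subject's folder."""
    return ["docker", "build", ".", "-t", image_name(subject)]


if __name__ == "__main__":
    print(" ".join(build_command("LightFTP")))
```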

The script scripts/buildall.sh can be used to build all Docker images in parallel; how many parallel builds are allowed is defined in the script.

Running AFL-based fuzzers

Follow the steps below to run and collect experimental results for LightFTP, which is a lightweight File Transfer Protocol (FTP) server. Similar steps should be followed to run experiments on other subjects.

Start fuzzing runs

Run the profuzzbench_exec_common.sh script to start an experiment with an AFL-based fuzzer and compute the raw coverage-over-time data. The script accepts the following flags:

-h: prints usage information and exits
-c core: do a single run on the given core
-i idx: index of the single run to do (i.e. determines the name of the output archive)
-r runs: number of parallel runs
-t target: name of the target to run
-d outdir: directory where to place the output archive (it'll be $outdir/out-$target-$fuzzer-$n.tar.gz where $n is the run number starting from zero)
-f fuzzer: one of aflnet, aflnwe or aflpp, with optional -no-seeds suffix
-O opts: additional options to pass to the fuzzer (quote as a single string)
-T secs: timeout for each run
-S skips: skip count to "sample" while computing coverage

The script has two modes of operation, selected by the flags -c, -i and -r. If -c and -i are given, the script executes a single run pinned to the core specified by -c and stores the results in $outdir/out-$target-$fuzzer-$i.tar.gz, where $i is the value of -i. If -r is given, the script starts multiple runs and stores the resulting archives with indices starting from zero; it also uses the freecores utility to find a free core to pin each docker container to.

The following commands run 4 instances of AFLNet to fuzz LightFTP for 60 minutes.

cd $PFBENCH
mkdir results-lightftp
profuzzbench_exec_common.sh -t lightftp -r 4 -d results-lightftp -f aflnet \
                            -O "-P FTP -D 10000 -q 3 -s 3 -E -K -c ./ftpclean.sh" \
                            -T 3600 -S 5
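When the same invocation is driven from a script, the fuzzer options given to -O must stay a single argument. A minimal Python sketch of that idea follows; the helper name exec_common_argv is hypothetical, and only the flags listed above are used.

```python
import shlex


def exec_common_argv(target, runs, outdir, fuzzer, fuzzer_opts, timeout_secs, skip):
    """Build the argument vector for a multi-run (-r) experiment."""
    return [
        "profuzzbench_exec_common.sh",
        "-t", target,
        "-r", str(runs),
        "-d", outdir,
        "-f", fuzzer,
        "-O", fuzzer_opts,  # kept as ONE element so the shell never splits it
        "-T", str(timeout_secs),
        "-S", str(skip),
    ]


argv = exec_common_argv(
    "lightftp", 4, "results-lightftp", "aflnet",
    "-P FTP -D 10000 -q 3 -s 3 -E -K -c ./ftpclean.sh", 3600, 5,
)

if __name__ == "__main__":
    # shlex.join adds the quoting needed to paste the command into a shell.
    print(shlex.join(argv))
```

Passing argv to subprocess.run as a list, instead of as a single string, avoids a second round of quoting.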

Collect the results to CSV

All results (as tar files) should be stored in the folder created in the previous step (results-lightftp). These tar files are the compressed output folders produced by the fuzzing instances. If the fuzzer is AFL-based (e.g. AFLNet, AFLnwe), each folder should contain sub-folders such as crashes, hangs and queue.

Use scripts/nyx-eval/convert_coverage.sh to collect results in terms of code coverage over time. Measurements from all runs are aggregated into a single CSV file. The script takes the following flags:

-h: prints usage information and exits
-r runs: number of runs to extract coverage from
-d outdir: output directory with runs archives
-t target: name of the target
-f fuzzer: name of the fuzzer
-p snaps: snapshot placement policy
-o outcsv: output filename where CSV data is placed
-a: append to an existing CSV output file
-e: extract runs data archives first; overwrites previous separate nyx-eval/coverage.sh run

For this script most of the flags are just used to identify the archive names:

Nyx-Net folders for runs without snapshotting (i.e. -p none) should look like $outdir/out-$target-$run
Nyx-Net folders for runs that used snapshotting should look like $outdir/out-$target-$snap-$run
For all other fuzzers: $outdir/out-$target-$fuzzer-$run.tar.gz
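The three naming rules can be written down as a small Python sketch. The function name run_path is hypothetical, as is the three-digit zero padding of the run index (the names used elsewhere in this README, e.g. out-lightftp-aflnet-000, suggest it); the fuzzer name nyx is the one accepted by coverage.sh.

```python
from pathlib import PurePosixPath


def run_path(outdir, target, fuzzer, run, snap="none", width=3):
    """Return the folder or archive expected for one run."""
    idx = f"{run:0{width}d}"  # assumption: zero-padded run index
    base = PurePosixPath(outdir)
    if fuzzer == "nyx":
        if snap == "none":
            return base / f"out-{target}-{idx}"
        return base / f"out-{target}-{snap}-{idx}"
    # All other fuzzers store their output as a compressed archive.
    return base / f"out-{target}-{fuzzer}-{idx}.tar.gz"


if __name__ == "__main__":
    for run in range(4):
        print(run_path("results-lightftp", "lightftp", "aflnet", run))
```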

The following command collects the code coverage results produced by AFLNet and saves them to results.csv.

$PFBENCH/scripts/nyx-eval/convert_coverage.sh \
    -t lightftp -r 4 -f aflnet \
    -d $PFBENCH/results-lightftp -o results.csv -e

Note: The above command will delete folders results-lightftp/out-lightftp-aflnet-00{0..3} and extract archives results-lightftp/out-lightftp-aflnet-00{0..3}.tar.gz in their place (-e flag).

The results.csv file should look similar to the text below. The file has six columns: timestamp, subject program, fuzzer name, run index, coverage type and coverage value. It contains both line coverage (l_) and branch coverage (b_) over time. Each coverage type comes with two values, a percentage (_per) and an absolute number (_abs).

time,subject,fuzzer,run,cov_type,cov
1600905795,lightftp,aflnet,1,l_per,25.9
1600905795,lightftp,aflnet,1,l_abs,292
1600905795,lightftp,aflnet,1,b_per,13.3
1600905795,lightftp,aflnet,1,b_abs,108
1600905795,lightftp,aflnet,1,l_per,25.9
1600905795,lightftp,aflnet,1,l_abs,292
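A file in this format can be read with Python's standard csv module. The sketch below keeps, for every fuzzer, run and coverage type, the value with the latest timestamp; the function name final_coverage is hypothetical and the embedded sample reuses rows from the excerpt above.

```python
import csv
import io

SAMPLE = """\
time,subject,fuzzer,run,cov_type,cov
1600905795,lightftp,aflnet,1,l_per,25.9
1600905795,lightftp,aflnet,1,l_abs,292
1600905795,lightftp,aflnet,1,b_per,13.3
1600905795,lightftp,aflnet,1,b_abs,108
"""


def final_coverage(csv_file):
    """Return {(fuzzer, run, cov_type): cov} taken at the latest timestamp."""
    latest = {}
    for row in csv.DictReader(csv_file):
        key = (row["fuzzer"], int(row["run"]), row["cov_type"])
        stamp = int(row["time"])
        # Later rows win on ties, so repeated timestamps are harmless.
        if key not in latest or stamp >= latest[key][0]:
            latest[key] = (stamp, float(row["cov"]))
    return {key: cov for key, (stamp, cov) in latest.items()}


result = final_coverage(io.StringIO(SAMPLE))

if __name__ == "__main__":
    for key, cov in sorted(result.items()):
        print(key, cov)
```

To process a real file, pass open("results.csv", newline="") instead of the in-memory sample.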

Running Nyx-Net

Set up environment and fuzzer

Follow the steps in the Nyx-Net repository to set it up, then issue:

export NYX_NET_FUZZER_DIR=/path/to/nyx-net/fuzzer/rust_fuzzer
export NYX_NET_FUZZER_DEBUG_DIR=/path/to/nyx-net/fuzzer/rust_fuzzer_debug
export NYX_NET_TARGETS_DIR=/path/to/nyx-net/targets/packed_targets
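Since the scripts rely on these variables, it can help to verify them before launching long experiments. The following Python sketch is one way to do so; the function name missing_nyx_dirs is hypothetical, and the check assumes that each variable should point at an existing directory.

```python
import os

REQUIRED_VARS = (
    "NYX_NET_FUZZER_DIR",
    "NYX_NET_FUZZER_DEBUG_DIR",
    "NYX_NET_TARGETS_DIR",
)


def missing_nyx_dirs(env=None):
    """Return a list of human-readable problems (empty if all looks fine)."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value} is not a directory")
    return problems


if __name__ == "__main__":
    for problem in missing_nyx_dirs():
        print("warning:", problem)
```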

Run fuzzer

The script scripts/nyx-eval/start.sh is analogous to profuzzbench_exec_common.sh but specific to running Nyx-Net; moreover it only runs the fuzzer but does not collect raw coverage data. It accepts the following flags:

-h: prints usage information and exits
-c core: do a single run on the given core
-i idx: index of the single run to do (i.e. determines the name of the output folder)
-r runs: number of parallel runs
-t target: name of the target to run
-d outdir: directory where to place the output (it'll be $outdir/out-$target-$snap-$n where $n is the run number starting from zero)
-p snap: snapshot placement policy, one of none, balanced or aggressive
-T secs: timeout for each run

The flags have the same meaning as for profuzzbench_exec_common.sh.

Note: the path to the Nyx-Net fuzzer is given by the environment variable NYX_NET_FUZZER_DIR while the path to the target's spec is given by $NYX_NET_TARGETS_DIR/nyx_$target.

To run 4 instances of Nyx-Net using an aggressive snapshot placement policy use the following command:

$PFBENCH/scripts/nyx-eval/start.sh -r 4 -t lightftp -p aggressive -d /tmp -T 3600

This will produce fuzzing outputs in /tmp/out-lightftp-aggressive-00{0..3} and use the spec in folder $NYX_NET_TARGETS_DIR/nyx_lightftp.

Getting replayable test cases

Before computing coverage and collecting it into a CSV file, the Nyx-Net seeds must be converted into a replayable format. For this you can use scripts/nyx-eval/reproducible.sh. Most of the flags are only used to determine the folders to work on:

-h: prints usage information and exits
-c core: do a single run on the given core
-i idx: index of the single run to do (i.e. determines the name of the output folder)
-r runs: number of parallel runs
-t target: name of the target to run
-d outdir: directory with fuzzing output (it'll be $outdir/out-$target-$snap-$n where $n is the run number starting from zero)
-p snap: one of none, balanced or aggressive

Also, it will use environment variable NYX_NET_FUZZER_DEBUG_DIR.

To make the reproducible test cases for our example use:

$PFBENCH/scripts/nyx-eval/reproducible.sh -r 4 -t lightftp -p aggressive -d /tmp

This will convert corpus test cases from fuzzing runs (i.e. /tmp/out-lightftp-aggressive-00{0..3}/corpus) and store replayable test cases into /tmp/out-lightftp-aggressive-00{0..3}/reproducible.
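A quick sanity check after the conversion is to confirm that both folders exist and are not empty for every run. The Python sketch below does this; the function name conversion_summary is hypothetical, the three-digit run index follows the folder names above, and no assumption is made about the internal layout of either folder, so the two counts need not match.

```python
from pathlib import Path


def conversion_summary(outdir, target, snap, runs, width=3):
    """Count files under corpus/ and reproducible/ for each run folder.

    A count of None means the folder does not exist.
    """
    summary = {}
    for run in range(runs):
        run_dir = Path(outdir) / f"out-{target}-{snap}-{run:0{width}d}"
        counts = {}
        for sub in ("corpus", "reproducible"):
            folder = run_dir / sub
            if folder.is_dir():
                counts[sub] = sum(1 for p in folder.rglob("*") if p.is_file())
            else:
                counts[sub] = None
        summary[run_dir.name] = counts
    return summary


if __name__ == "__main__":
    for name, counts in conversion_summary("/tmp", "lightftp", "aggressive", 4).items():
        print(name, counts)
```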

Computing coverage

The scripts/nyx-eval/coverage.sh script can be used to re-execute all replayable test cases (also for AFL-based fuzzers) and compute coverage information; everything is done inside a docker container. The flags are similar to other scripts:

-h: prints usage information and exits
-c core: do a single run on the given core
-i idx: index of the single run to do (i.e. determines the name of the output archive/folder)
-r runs: number of parallel runs
-t target: name of the target to run
-f fuzzer: can be nyx, aflnet, aflnwe, etc.
-d outdir: directory with fuzzing output
-p snap: one of none, balanced or aggressive
-s skip: skip count for "sampling" inputs to compute coverage

Additionally, the script reads the environment variable NYX_NET_REPLAY, which gives the path to the replayer script (usually located in the Nyx-Net repository at packer/packer/nyx_net_payload_executor.py).

To compute coverage for our example use:

export NYX_NET_REPLAY=/path/to/nyx-net/packer/packer/nyx_net_payload_executor.py
$PFBENCH/scripts/nyx-eval/coverage.sh -r 4 -t lightftp -f nyx -p aggressive -d /tmp -s 5

This will place the output of each container in the corresponding run folder (i.e. /tmp/out-lightftp-aggressive-00{0..3}/coverage.tar.gz).

Collecting the results to CSV

For this you can use the same script as for AFLNet. The major difference is that the -e flag is dropped (the output from Nyx-Net is not an archive) and the -a flag is added to append to the existing CSV file:

cp -r /tmp/out-lightftp-aggressive-*/ $PFBENCH/results-lightftp/
$PFBENCH/scripts/nyx-eval/convert_coverage.sh \
    -t lightftp -r 4 -f nyx -p aggressive \
    -d $PFBENCH/results-lightftp -o results.csv -a

Analyze the results

The collected results (i.e., results.csv) can be used for plotting. The PFB.jl folder contains a Julia package to analyse the results.

Alternatively, there's a sample Python script to plot code coverage over time. Use the following command to plot the results and save it to a file.

cd $PFBENCH/results-lightftp
profuzzbench_plot.py -i results.csv -p lightftp -r 4 -c 60 -s 1 -o cov_over_time.png

This is a sample code coverage report generated by the script.

Automated Pipeline

The script scripts/runqueue.py can be used to run the entire pipeline described in the tutorial for different fuzzers, configurations and targets in parallel. Under the hood it re-uses the same bash scripts described in the tutorial. It accepts a flag -j to specify how many experiments to run in parallel and a JSON configuration file defining the set of fuzzers, targets and options to run.

Description of fields:

trials: number of runs for each fuzzer and target combination [required]
timeout: maximum time allowed to run the fuzzer in seconds (does not include time to compute coverage) [required]
only_cov: do not run fuzzers but only compute replayable and coverage [default: false]
nyx_outdir: output directory for Nyx-Net [required]
afl_outdir: output directory for AFL-based fuzzers (e.g. AFLNet, AFLnwe, etc.) [required]
targets: array of subject names, all lowercase [required]
fuzzers: array of fuzzer names or configuration objects [required]
fuzzers.type: name of the fuzzer (one of nyx, aflnet, aflnwe, etc.) [required]
fuzzers.path: overrides the default path given by nyx_outdir or afl_outdir
fuzzers.no_state: applies to AFLNet only and runs the fuzzer without the -E flag [default: false]
fuzzers.snaps: sets the snapshot placement policy; applies to Nyx-Net only [default: none]

The following example runs three fuzzer configurations on two targets, with 10 runs per combination and a timeout of 1 hour per run; nyx_outdir and afl_outdir are the output directories for Nyx-Net and AFL-based fuzzers respectively:

{
    "trials": 10,
    "timeout": 3600,
    "nyx_outdir": "/path/to/nyx-based/output",
    "afl_outdir": "/path/to/afl-based/output",
    "targets": ["lightftp", "live555"],
    "fuzzers": [
        "nyx",
        "aflnet",
        { "type": "aflnet", "no_state": true }
    ]
}
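Before starting a long queue, a configuration file can be checked against the field descriptions above. The Python sketch below is not the validation performed by runqueue.py; the names check_config and planned_runs are hypothetical, and the embedded configuration is a valid-JSON rendering of the example with fuzzers written as an array.

```python
import json

CONFIG_TEXT = """
{
    "trials": 10,
    "timeout": 3600,
    "nyx_outdir": "/path/to/nyx-based/output",
    "afl_outdir": "/path/to/afl-based/output",
    "targets": ["lightftp", "live555"],
    "fuzzers": ["nyx", "aflnet", {"type": "aflnet", "no_state": true}]
}
"""

REQUIRED = ("trials", "timeout", "nyx_outdir", "afl_outdir", "targets", "fuzzers")


def check_config(config):
    """Return a list of problems found in a parsed configuration."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in config]
    fuzzers = config.get("fuzzers", [])
    if not isinstance(fuzzers, list):
        errors.append("fuzzers must be an array")
        fuzzers = []
    for entry in fuzzers:
        if isinstance(entry, str):
            continue  # plain fuzzer name
        if not (isinstance(entry, dict) and "type" in entry):
            errors.append(f"fuzzer entry needs a 'type': {entry!r}")
    return errors


def planned_runs(config):
    """trials applies to every fuzzer and target combination."""
    return config["trials"] * len(config["targets"]) * len(config["fuzzers"])


config = json.loads(CONFIG_TEXT)

if __name__ == "__main__":
    print(check_config(config), planned_runs(config))
```

For this example the queue holds 10 x 2 x 3 = 60 individual runs, which is worth knowing when choosing the -j value.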

N.B.: the runqueue.py script will not run convert_coverage.sh to aggregate results into a single CSV file.

Summary of scripts for running evaluation

buildall.sh: builds docker images for all targets in parallel
execution/profuzzbench_exec_common.sh: starts containers fuzzing one target and computes coverage with AFL-based fuzzers
nyx-eval/start.sh: starts fuzzing one target with Nyx-Net
nyx-eval/reproducible.sh: converts output test cases from Nyx-Net's runs to replayable ones
nyx-eval/coverage.sh: starts containers to compute coverage for a set of test cases
nyx-eval/convert_coverage.sh: converts coverage to CSV format
runqueue.py: runs pipeline from fuzzing to coverage CSV for different combinations of fuzzers and targets
