TylerOderkirk/afl

==================
american fuzzy lop
==================

Written and maintained by Michal Zalewski <lcamtuf@google.com>

Copyright 2013, 2014 Google Inc. All rights reserved.
Released under terms and conditions of Apache License, Version 2.0.

For new versions and additional information, check out:
http://code.google.com/p/american-fuzzy-lop/

1) Background
-------------

Fuzzing is one of the most powerful strategies for identifying security issues
in real-world software. Unfortunately, it also offers fairly shallow coverage:
it is impractical to exhaustively cycle through all possible inputs, so even
something as simple as setting three separate bytes to a specific value to
reach a chunk of unsafe code can be an insurmountable obstacle to a typical
fuzzer.

There have been numerous attempts to solve this problem by augmenting the
process with additional information about the behavior of the tested code.
These techniques can be divided into three broad groups:

  - Simple coverage maximization. This approach boils down to trying to isolate
    initial test cases that offer diverse code coverage in the targeted
    application - and then fuzzing them using conventional techniques.

  - Control flow analysis. A more sophisticated technique that leverages
    instrumented binaries to focus the fuzzing efforts on mutations that
    generate distinctive sequences of conditional branches within the
    instrumented binary.

  - Static analysis. An approach that attempts to reason about potentially
    interesting states within the tested program and then make educated guesses
    about the input values that could possibly trigger them.

The first technique is surprisingly powerful when used to pre-select initial test
cases from a massive corpus of valid data - say, the result of a large-scale web
crawl. Unfortunately, coverage measurements provide only a very simplistic view
of the internal state of the program, making them less suited for creatively
guiding the fuzzing process later on.

The latter two techniques are extremely promising in experimental settings. That
said, in real-world applications, they frequently lead to irreducible complexity:
most of the high-value targets will have a vast number of internal states and
possible execution paths, and deciding which ones are interesting and
substantially different from the rest is an extremely difficult challenge that,
if not solved, usually causes the "smart" fuzzer to perform no better than a
traditional one.

2) About AFL
------------

American Fuzzy Lop uses edge coverage to detect subtle, local-scale changes to
program control flow without having to perform complex comparisons between
series of distinctive traces (a common failure point for other tools).

In almost-plain English, the fuzzer does this by instrumenting the code to record
tuples in the following format:

  <ID of current code location>, <ID of previously-executed code location>

The ordering of the tuples is discarded and hit counts are tracked only coarsely;
the main signal used by the fuzzer is the appearance of a previously-unseen
tuple in the output dataset. This method combines the self-limiting nature of
simple coverage measurements with the sensitivity of control flow analysis.
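
As a rough illustration of the tuple format above, here is a minimal Python
sketch. It is not the actual instrumentation (which is injected at compile
time); the function names, the use of a plain list of location IDs as a
"trace", and the bucket boundaries are all assumptions made for the example.

```python
from collections import Counter

# Assumed bucket boundaries; the real ranges are an implementation detail.
BUCKET_LIMITS = (1, 2, 4, 8, 16, 32, 64, 128)


def bucket(count):
    """Coarsen an exact hit count into the upper bound of its range."""
    for limit in BUCKET_LIMITS:
        if count <= limit:
            return limit
    return 256  # label shared by every count above 128


def summarize_trace(locations):
    """Map each (current, previous) tuple in a trace to a coarse hit count."""
    hits = Counter()
    prev = None  # the first location has no predecessor
    for cur in locations:
        hits[(cur, prev)] += 1
        prev = cur
    return {edge: bucket(n) for edge, n in hits.items()}


def new_tuples(seen, locations):
    """Return the tuples not observed before and remember them in `seen`."""
    fresh = set(summarize_trace(locations)) - seen
    seen |= fresh
    return fresh
```

With this sketch, the traces A, B, C and A, C, B visit the same three
locations, so a plain coverage measurement would see nothing new in the second
one; the tuple-based view still reports (C, A) and (B, C) as previously unseen.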

The instrumentation is used as a part of a simple queue-based algorithm:

  1) Load user-supplied initial test cases into the queue,

  2) Take input file from the queue,

  3) Repeatedly mutate the file using a balanced variety of traditional fuzzing
     strategies,

  4) If any of the generated mutations resulted in a new tuple being recorded
     by the instrumentation, add mutated output as a new entry in the queue.

  5) If queue not empty, go to 2.
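
The five steps above can be sketched in Python as follows. This is a
simplified model, not the code in afl-fuzz.c: run_target is a hypothetical
callable that returns the set of tuples recorded for an input, the only
mutation is a single random bitflip, entries are removed from the queue once
processed, and max_entries is an artificial cap added so that the example
always terminates.

```python
import random
from collections import deque


def flip_random_bit(data, rng):
    """Stand-in mutation: invert one randomly chosen bit of the input."""
    if not data:
        return data
    buf = bytearray(data)
    pos = rng.randrange(len(buf) * 8)
    buf[pos >> 3] ^= 0x80 >> (pos & 7)
    return bytes(buf)


def fuzz_queue(initial_cases, run_target, rounds_per_entry=64,
               max_entries=1000, seed=0):
    """Return the inputs that were taken from the queue, in order."""
    rng = random.Random(seed)
    queue = deque(initial_cases)              # step 1: load initial cases
    seen = set()
    for case in initial_cases:
        seen |= run_target(case)              # tuples already covered
    corpus = []
    while queue and len(corpus) < max_entries:    # step 5: loop while non-empty
        case = queue.popleft()                # step 2: take an input
        corpus.append(case)
        for _ in range(rounds_per_entry):     # step 3: mutate repeatedly
            mutated = flip_random_bit(case, rng)
            fresh = run_target(mutated) - seen
            if fresh:                         # step 4: new tuple, so keep it
                seen |= fresh
                queue.append(mutated)
    return corpus
```

If run_target reports the same tuples for every input, nothing is ever added
and the function returns just the initial test cases.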

The discovered test cases are also periodically culled to eliminate ones that
have been made obsolete by more inclusive finds discovered later in the
fuzzing process. Because of this, the fuzzer is not only useful for
identifying crashes, but also exceptionally effective at turning a single valid
input file into a reasonably-sized corpus of interesting test cases that can
be manually investigated for non-crashing problems, or used to stress-test
applications that are harder to instrument or too slow to fuzz efficiently.
In particular, it can be extremely useful for generating small test sets that
may be programmatically or manually examined for anomalies in a browser
environment.

In real-world testing with libraries such as libjpeg, libpng, libtiff, or
giflib, the fuzzer requires no fine-tuning, and significantly outperforms
blind fuzzing or coverage-only tools. Using a common set of strategies against
libjpeg, the tool toggles twice as many branches as non-instrumented fuzzing
(IGNORE_FINDS in config.h) and identifies several times as many distinctive
test cases as coverage-based algorithms (COVERAGE_ONLY in config.h).

3) Instrumenting programs for use with AFL
------------------------------------------

Instrumentation is injected by a companion tool called afl-gcc. It is meant to
be used as a drop-in replacement for GCC, directly pluggable into the standard
build process for any third-party code.

The instrumentation has a relatively modest performance impact; in conjunction
with other optimizations implemented by the tool, many instrumented programs
can be fuzzed as fast as, or even faster than, is possible with traditional
tools.

The injected probes are designed to work with C and C++ code compiled on 32-bit
x86 platforms. You can edit config.h and uncomment USE_64BIT to switch to
64-bit instrumentation. Porting to other platforms should not be difficult;
in fact, there is an early-stage ARM port in experimental/arm_support/.

The correct way to recompile the target program will vary depending on the
specifics of the build process, but a common approach may be:

$ export AFL_PATH=/path/to/afl/
$ CC=$AFL_PATH/afl-gcc ./configure
$ make clean all

For C++ programs, it may be necessary to specify this instead:

$ CXX=$AFL_PATH/afl-g++ ./configure

When testing libraries, it is essential to either link the tested executable
against a static version of the instrumented library (./configure
--disable-shared may be useful), or to use a proper binary (beware of shell
script wrappers generated by the build system) with the right LD_LIBRARY_PATH.

Setting AFL_HARDEN in the environment will cause afl-gcc to automatically enable
several GCC hardening features that may make it easier to detect memory bugs;
with GCC 4.8 and above, this includes ASAN / MSAN.

4) Choosing initial test cases
------------------------------

To operate correctly, the fuzzer requires one or more input files, each
containing typical data of the kind normally processed by the targeted
application.

Whenever possible, the file should be reasonably small; under 1 kB is ideal,
although not strictly necessary.

There is limited value in providing multiple files that are not fundamentally
different from each other, and exercise the same set of features. When in
doubt, use fewer samples, not more. One test case is perfectly fine in most
scenarios.

If a large corpus of data is available for screening, the afl-showmap utility
can be employed to compare the instrumentation data recorded for various
inputs. Files that do not produce any previously-unseen tuples can usually be
rejected. The fuzzer also performs basic internal de-duplication on its own.
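
The screening idea can be sketched as a short Python helper. The tuple sets
are assumed to have been collected beforehand (for example with afl-showmap;
parsing its output is deliberately left out), and the function name, the
(name, size, tuples) layout, and the smallest-first ordering are choices made
for this example rather than anything prescribed by AFL.

```python
def screen_corpus(candidates):
    """Pick files that contribute at least one previously-unseen tuple.

    `candidates` is an iterable of (name, size_in_bytes, tuples) entries.
    Smaller files are considered first, so that a small file is preferred
    over a large one that exercises the same tuples.
    """
    seen = set()
    kept = []
    for name, _size, tuples in sorted(candidates, key=lambda c: c[1]):
        fresh = set(tuples) - seen
        if fresh:
            kept.append(name)
            seen |= fresh
    return kept


# Made-up data: integers stand in for recorded tuples.
candidates = [
    ("big.png", 9000, {1, 2, 3}),
    ("small.png", 400, {1, 2}),
    ("dup.png", 500, {2}),
    ("odd.png", 700, {4}),
]
print(screen_corpus(candidates))
```

Here dup.png is rejected because small.png already covers everything it
exercises, while big.png is kept only because of the one tuple it alone
triggers.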

5) Fuzzing instrumented binaries
--------------------------------

The fuzzing process itself is carried out by the afl-fuzz utility. The program
requires an input directory containing one or more initial test cases, plus a
path to the binary to test.

For tested programs that accept data on stdin, the usual syntax may be:

$ ./afl-fuzz -i input_dir -o output_dir /path/to/program [...params...]

For programs that need to read data from a specific file, the appropriate
path can be specified via the -f flag, e.g.:

$ ./afl-fuzz [...] -f testme.txt /path/to/program testme.txt

It is possible to fuzz non-instrumented code using the -n flag. This gives you
a fairly traditional fuzzer with a couple of nice testing strategies.

You can use -t and -m to override the default timeout and memory limit for the
executed process, although this is seldom necessary.

The fuzzing process itself is relatively simple. It consists of several types
of sequential, deterministic operations (bitflips, injection of interesting
integers) followed by a "havoc" stage with multiple stacked, random
modifications - block deletion, cloning, random bitflips, etc.

The deterministic stage takes time proportional to the size of the input file;
once this is done, the havoc stages continue for every discovered input until
Ctrl-C is hit.
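
To make the cost of the deterministic stage more concrete, here is a minimal
Python sketch of two such passes. It is only an approximation of what
fuzz_one() does: the helper names are made up, and the list of "interesting"
byte values is an assumption chosen for illustration.

```python
# Assumed boundary values; the list used by the fuzzer may differ.
INTERESTING_8 = (0, 1, 127, 128, 255)


def walking_bitflips(data):
    """Yield every variant of `data` with exactly one bit inverted."""
    for pos in range(len(data) * 8):
        buf = bytearray(data)
        buf[pos >> 3] ^= 0x80 >> (pos & 7)  # most significant bit first
        yield bytes(buf)


def interesting_bytes(data):
    """Yield variants with one byte overwritten by an interesting value."""
    for i in range(len(data)):
        for value in INTERESTING_8:
            if data[i] != value:  # skip no-op replacements
                yield data[:i] + bytes([value]) + data[i + 1:]
```

The bitflip pass alone produces eight variants per input byte, each of which
has to be executed once, which is why the stage scales linearly with the size
of the input file.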

For large inputs, you can use -d to skip the deterministic stages and proceed
straight to random tweaks. Some other tips for optimizing the performance of
the process are discussed in the perf_tips.txt file included with the source
code of AFL.

6) Interpreting output
----------------------

The fuzzer keeps going until aborted with Ctrl-C or killed with SIGINT or
SIGTERM. The progress screen provides various useful stats, including the
number of distinctive execution paths discovered, the current queue cycle,
the number of crashes recorded, and the number of execve() calls per second.

For more info about the displayed data, visit this URL:

  https://code.google.com/p/american-fuzzy-lop/wiki/StatusScreen

There are three subdirectories created within the output directory:

  - queue/ - test cases for every distinctive execution path, along with hard
             links to any initial input files. The data is essentially a
             corpus of interesting test cases valuable for seeding other,
             more resource-intensive testing steps.

             The directory can also be used to resume aborted jobs; simply do:

             ./afl-fuzz -i old_output_dir/queue -o new_output_dir [...]

  - hangs/ - test cases that cause the tested program to time out. The entries
             are grouped by a 32-bit hash of the execution path.

  - crashes/ - test cases that cause the tested program to receive a fatal
               signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are
               grouped by the received signal, followed by the hash of the
               execution path.

In all three directories, the first segment of every file name is a sequential
ID of the generated test case, and the second one corresponds to the "parent"
queue entry that the entry is derived from; there's some other fairly
self-explanatory metadata appended, too.

Although the fuzzer does not perform any additional analysis of the discovered
crashes, the path-based grouping makes it easy to triage new finds manually - or
to examine them with a simple GDB script. One such script is provided in
experimental/crash_triage/.

7) Parallelized fuzzing
-----------------------

For tips on how to fuzz a common target on multiple cores or multiple networked
machines, please refer to the parallel_fuzzing.txt file included with the source
code of American Fuzzy Lop.

8) Known limitations & areas for improvement
--------------------------------------------

Here are some of the most important caveats for AFL:

  - The fuzzer is optimized for compact binary data formats, such as images
    and other multimedia. It is not well-suited for verbose, human-readable
    formats such as XHTML or JavaScript. In such cases, template- or ABNF-based
    generators tend to fare better.

    (The fuzzer can still give a good workout to the first line of XML or JS
    parsing; it just won't be able to guess higher-order syntax particularly
    well.)

    If you want to modify the code to generate syntax-aware mutations, you'd
    need to start with fuzz_one() in afl-fuzz.c.

  - The fuzzer offers limited coverage if encryption, checksums, cryptographic
    signatures, or compression are used to wholly wrap the actual data format
    to be tested.

    Good solutions include manually commenting out the relevant checks in the
    instrumented application, or using a wrapper that postprocesses the
    fuzzer-generated data before feeding it to the target program.

    As an example, a patch for libpng to bypass CRC checksums is provided in 
    experimental/libpng_no_checksum/libpng-nocrc.patch.

  - The included instrumentation (afl-as.h) currently supports 32-bit x86 code
    by default, with the option to go 64-bit by uncommenting USE_64BIT in
    config.h. If you are feeling adventurous, an experimental ARM port can be
    found in experimental/arm_support/, too.

    For other CPUs, rewriting the assembly code in afl-as.h should be a very
    simple task. Note that minor tweaks to fuzz_one() in afl-fuzz.c may be
    necessary to accommodate architectures that require aligned memory
    access.

  - The approach is theoretically compatible with any GCC-supported language,
    but only C and C++ were confirmed to work. Objective-C is very likely
    to work; reports for other languages (e.g. GCJ Java) are welcome.

  - Instrumentation of binary-only code is theoretically possible, but not
    supported at this point. Leveraging pin or DynamoRIO may be a good
    approach.
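
For the checksum caveat above, the postprocessing step of a wrapper might look
like the Python sketch below. The data layout (a payload followed by a 4-byte
big-endian CRC-32 trailer) is a made-up example, not the layout of any
particular format, and the function name is hypothetical; how the result is
handed to the target program is not covered here.

```python
import struct
import zlib


def fix_checksum(data):
    """Overwrite the trailing 4 bytes with the CRC-32 of the payload.

    Assumes a hypothetical format: payload + 4-byte big-endian CRC-32.
    Inputs too short to hold a trailer are passed through unchanged.
    """
    if len(data) < 4:
        return data
    payload = data[:-4]
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)
```

Applying this to every fuzzer-generated file means that mutations to the
payload are no longer rejected by the checksum check before they reach the
parsing code.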

9) Contact
----------

Questions? Concerns? Bug reports? The author can usually be reached at
<lcamtuf@google.com>.