squeek502 / fuzzing-lua

Fuzz testing for various parts of the Lua interpreter, mostly for use as a test-case generator for alternate Lua implementations

Fuzzing Lua

Fuzz testing for various parts of the Lua interpreter, mostly for use as a test-case generator for alternate Lua implementations (e.g. Zua). Uses libFuzzer via the Clang compiler. A pre-generated corpus and corresponding outputs can be found on the releases page for quick use as a data set for testing other Lua implementations.

For a writeup of how this was used to test Zua's lexer implementation, see:

Currently supports:

  • Lua 5.1.5
    • Lexer (llex.c)

Setup

  • Make sure you have CMake, Clang, and Clang++ installed.
  • Run ./setup.sh to generate build.fuzz, build.cov, and build.tools directories via CMake.

Fuzzing

First, cd into build.fuzz. Then, from within that directory:

  • To run the fuzzer: make fuzz_llex_run
  • To minimize the fuzzer's corpus: make fuzz_llex_minimize

Tools

fuzz_llex_output

The lexer fuzzer has an accompanying tool to output the tokens generated when lexing each file in the corpus. This set of result files can then be used to test alternate Lua lexer implementations to ensure that they generate the exact same set of tokens.

The format of the output file is a space-separated list of tokens, printed via luaX_token2str. For example, if the input file was:

local hello = 'world'

then the output file would be:

local <name> = <string> <eof>

If an error occurs while lexing, the error message appears on the second line of the output file, after any tokens that were lexed before the error. For example, if the closing ' of the 'world' string in the previous example were omitted, the output would be:

local <name> =
[string "fuzz"]:1: unfinished string near '<eof>'
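
As a rough illustration of the format described above, a consumer of these files could split each one into its token list and optional error message. This is a minimal Python sketch and not part of this repository: the function name parse_llex_output is hypothetical, and it assumes that everything after the first line belongs to the error message and that a trailing newline, if present, is not significant.

```python
def parse_llex_output(data: bytes):
    """Split the contents of one output file into (tokens, error).

    Assumption: the first line holds the space-separated tokens and
    anything after the first newline is the error message (None if absent).
    """
    first_line, _, rest = data.partition(b"\n")
    tokens = first_line.split()  # split on runs of ASCII whitespace
    error = rest.rstrip(b"\n") or None
    return tokens, error


# The two examples from the text above.
ok = parse_llex_output(b"local <name> = <string> <eof>")
bad = parse_llex_output(
    b"local <name> =\n[string \"fuzz\"]:1: unfinished string near '<eof>'\n"
)
print(ok)
print(bad)
```

Working on bytes rather than text is a deliberate choice here, since a fuzzer-generated corpus can produce error messages that are not valid UTF-8.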

Usage

First, run the fuzzer (see Fuzzing above) to create a comprehensive corpus. Then, to generate the output files for each file in the corpus:

cd build.tools
make fuzz_llex_output_run

The files will be in build.tools/tools/output/fuzz_llex/ and the file names will correspond with the file names of the corpus input files.
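
To check an alternate lexer against these files, one approach is to run it over every corpus input and compare its output byte-for-byte with the expected file. The following Python sketch shows the idea under some assumptions: find_mismatches and lex_to_output are hypothetical names, lex_to_output stands in for whatever wraps the implementation under test, and each output file is assumed to have exactly the same name as its input file.

```python
from pathlib import Path


def find_mismatches(corpus_dir, expected_dir, lex_to_output):
    """Return the names of corpus files whose output differs from the expected one.

    lex_to_output is a hypothetical callable wrapping the implementation
    under test: it takes the input file's bytes and returns output bytes
    in the same format as the files produced by fuzz_llex_output.
    """
    mismatches = []
    for expected_path in sorted(Path(expected_dir).iterdir()):
        # Assumption: each output file has the same name as its input file.
        input_path = Path(corpus_dir) / expected_path.name
        actual = lex_to_output(input_path.read_bytes())
        if actual != expected_path.read_bytes():
            mismatches.append(expected_path.name)
    return mismatches
```

Here expected_dir would be build.tools/tools/output/fuzz_llex/ and corpus_dir the corpus directory the outputs were generated from. An empty result means every token stream and error message matched.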

fuzz_strings_output

Given a corpus of input files containing valid unparsed string literals, this tool can generate a corresponding set of output files containing the parsed version of the string. These pairs of input/output files can then be used to test the string parsing of alternate Lua implementations.

For example, given the following input file:

"\\\thello world"

the output file would be:

\	hello world
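
To make the input/output relationship concrete, here is a minimal Python sketch of how a quoted Lua 5.1 string literal is turned into its parsed bytes. It is an illustration only, not this tool's implementation: parse_short_string is a hypothetical name, the input is assumed to be a valid literal, and long bracket strings ([[...]]) are not handled.

```python
# Single-letter escapes recognised by Lua 5.1, mapped to their byte values.
SIMPLE_ESCAPES = {
    ord("a"): 0x07, ord("b"): 0x08, ord("f"): 0x0C, ord("n"): 0x0A,
    ord("r"): 0x0D, ord("t"): 0x09, ord("v"): 0x0B,
}
NEWLINES = (0x0A, 0x0D)


def parse_short_string(literal: bytes) -> bytes:
    """Parse a quoted Lua 5.1 string literal, quotes included, into its bytes."""
    if len(literal) < 2 or literal[0] not in b"\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    body = literal[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        i += 1
        if c != ord("\\"):
            out.append(c)
            continue
        if i == len(body):
            raise ValueError("unfinished string")
        c = body[i]
        i += 1
        if c in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[c])
        elif c in NEWLINES:
            # Backslash-newline yields "\n"; "\r\n" or "\n\r" counts as one newline.
            out.append(0x0A)
            if i < len(body) and body[i] in NEWLINES and body[i] != c:
                i += 1
        elif ord("0") <= c <= ord("9"):
            # Decimal escape: up to three digits, value at most 255.
            value = c - ord("0")
            digits = 1
            while digits < 3 and i < len(body) and ord("0") <= body[i] <= ord("9"):
                value = 10 * value + (body[i] - ord("0"))
                i += 1
                digits += 1
            if value > 255:
                raise ValueError("escape sequence too large")
            out.append(value)
        else:
            # Any other escaped character (\\, \", \', ...) stands for itself.
            out.append(c)
    return bytes(out)


# The example from the text: three backslashes followed by "t".
assert parse_short_string(rb'"\\\thello world"') == b"\\\thello world"
```

In the example, the first two backslashes collapse into a single backslash and the remaining \t becomes a tab character, which is why the output file starts with a backslash followed by a tab.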

Usage

A comprehensive set of inputs is included in this repository, so generating the outputs is as simple as:

cd build.tools
make fuzz_strings_output_run

Generating an input set

Because the PUC Lua lexer both lexes and parses strings at the same time, it's a bit hard to extract unparsed strings while lexing. Instead, Zua was used along with a corpus generated by fuzz_llex to extract all valid strings from the entire corpus and put each into its own file. Then, that set of strings was minimized by running them back through fuzz_llex with -merge=1. Specifically, the following was done:

To generate an un-minimized set of inputs with Zua:

  1. Clone zua
  2. zig build fuzz_strings_gen_run

Once you have an un-minimized set of inputs, you can minimize it using fuzz_llex by doing the following:

  1. Copy the generated files to fuzzing-lua/build.fuzz/corpus/fuzz_strings_generated
  2. cd fuzzing-lua/build.fuzz
  3. make fuzz_llex
  4. mkdir corpus/fuzz_strings
  5. ./fuzz_llex corpus/fuzz_strings corpus/fuzz_strings_generated -merge=1

Your minimized input set will then exist in fuzzing-lua/build.fuzz/corpus/fuzz_strings.

Coverage

This is currently untested. TODO: Adapt this guide.

Packaging/releasing data sets

cd build.tools
make package_data

License: The Unlicense


Languages

C 66.1%, CMake 29.2%, Shell 3.2%, C++ 1.4%