liyansong2018 / moonlight

MoonLight is a research project looking at efficient methods for designing a corpus typically for use in fuzzing campaigns. 一个用于精简afl语料库的工具,相比于afl-cmin,可以大幅提高裁剪速度。

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

MoonLight: Fuzzing Corpus Design and Construction

moonlight logo

MoonLight is a research project looking at efficient methods for designing a corpus typically for use in fuzzing campaigns. We call this design process distillation.

Corpus distillation is the process by which we take a very large corpus of file types - e.g. 100,000 PDF files - called seeds - and choose a much smaller subset to fuzz against - say 1,000 PDF files. However, we want this smaller subset corpus to have the same "expressive power" as the original large corpus. We don't want to lose any information in the distillation process.

Currently "expressive power" means maximising code coverage. However, we are planning other types of measures of interest, such as execution complexity and usage of specific libraries or system calls.

The project name "MoonLight" is a pun on the word distillation. We use the term distillation in preference to the commonly used term reduction. This is because in fuzzing parlance, "reduction" is a term that is also used in the crash triage process.

We have included five benchmark corpora in this repo. The following table gives you an understanding of what level of distillations you should get when you run the tool.

File Type Collection Size Distilled Size Improvement Gain
PDF 42,056 664 63
DOC 7,836 745 10
PNG 4,831 94 51
TTF 5,666 27 210
HTML 69,991 410 171

Now if you were fuzzing Adobe Acrobat, instead of using 42,000 files you would only need the much smaller subset of 664 files.

Sometimes what is important is not just the number of seeds in your corpus but the total weight (or size) of the seeds in bytes. This is particularly important if your fuzzer is IO bound. In this case the corpus design and construction is choosing seeds which not only maximise code coverage but also have the smallest weight.

MoonLight can perform both weighted and unweighted distillations.

For more information on the performance of MoonLight against the current state of the art see Results and our paper on Arxiv. If you use MoonLight, please cite our work:

@article{2020:Herrera:MoonLight,
  author    = {Adrian Herrera and
               Hendra Gunadi and
               Liam Hayes and
               Shane Magrath and
               Felix Friedlander and
               Maggi Sebastian and
               Michael Norrish and
               Antony L. Hosking},
  title     = {Corpus Distillation for Effective Fuzzing: A Comparative Evaluation},
  journal   = {CoRR},
  volume    = {abs/1905.13055},
  year      = {2020},
  url       = {http://arxiv.org/abs/1905.13055},
  archivePrefix = {arXiv},
  eprint    = {1905.13055},
}

Installation

See here

Usage

See here

Simple Usage [简单用法]

简体中文

Trace Data

For more information on the expected input format of seed trace data see here.

About

MoonLight is a research project looking at efficient methods for designing a corpus typically for use in fuzzing campaigns. 一个用于精简afl语料库的工具,相比于afl-cmin,可以大幅提高裁剪速度。

License:Apache License 2.0


Languages

Language:C++ 85.2%Language:Python 10.8%Language:CMake 4.0%