ILF: AI-based Fuzzer for Ethereum Smart Contracts

ILF is an Imitation Learning based Fuzzer for smart contracts. The fuzzing policy, which is used to generate transactions, is represented by an ensemble of neural networks and is learned from thousands of high-quality sequences of transactions generated using symbolic execution. ILF can be used to fuzz any Ethereum smart contract and outputs the coverage and a vulnerability report.

ILF is developed at SRI Lab, Department of Computer Science, ETH Zurich as part of the Machine Learning for Programming and Blockchain Security projects. For more details, please refer to the ILF CCS'19 paper and slides.

Setup

We provide a Dockerfile, which we recommend as a starting point. To build and run:

$ docker build -t ilf .
$ docker run -it ilf

You can also follow the instructions in the Dockerfile to install ILF locally. If you experience build errors on Apple M chips, please refer to #21.

Usage

Fuzzing

To fuzz the example provided in the repo with ILF (the imitation fuzzing policy) using our pre-trained model in the model directory:

$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --fuzzer imitation --model ./model/ --limit 2000

The value of the --fuzzer argument (imitation in the command above) can be replaced by one of:

random: a uniformly random fuzzing policy.
symbolic: a symbolic execution fuzzing policy based on depth-first search of block states. This is used for generating training sequences.
sym_plus: an augmentation of symbolic which can revisit encountered block states.
mix: a fuzzing policy that randomly chooses imitation or symbolic for generating each transaction.
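As an illustration of the mix policy described above, here is a minimal sketch of a per-transaction random choice between two sub-policies. It is not ILF's implementation: the names mix_policy, imitation_step, and symbolic_step are hypothetical, and the 50/50 default is an assumption, since the description only says the choice is random.

```python
import random


def mix_policy(imitation_step, symbolic_step, p_imitation=0.5, rng=None):
    """Return a policy that picks one of two sub-policies per transaction.

    imitation_step and symbolic_step are hypothetical callables that take
    the current state and return the next transaction.
    """
    rng = rng or random.Random()

    def next_transaction(state):
        # One random draw per generated transaction.
        if rng.random() < p_imitation:
            return imitation_step(state)
        return symbolic_step(state)

    return next_transaction


if __name__ == "__main__":
    # Stand-in sub-policies that only report which one was chosen.
    policy = mix_policy(lambda s: ("imitation", s), lambda s: ("symbolic", s))
    print([policy(i)[0] for i in range(5)])
```

Passing a seeded random.Random instance as rng makes a run of the sketch reproducible.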

To fuzz new contracts, provide a Truffle project (formatted like the example in example/crowdsale), then call script/extract.py to extract the deployment transactions of the contracts. For the example contract, the script runs as follows:

$ rm example/crowdsale/transactions.json
$ python3 script/extract.py --proj example/crowdsale/ --port 8545

Note that you need to kill any existing ganache-cli processes listening on the same port before calling this script.
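One way to check whether the port is already taken before calling the extraction script is a small TCP probe. The sketch below is not part of ILF; port_in_use is a hypothetical helper, and it assumes the process to look for listens on the local loopback interface.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8545):
        print("Port 8545 is busy; stop the process using it before running script/extract.py.")
    else:
        print("Port 8545 is free.")
```

The probe only reports that some process is listening; it does not identify that process as ganache-cli.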

Training

For training, one needs to run symbolic on a set of training contracts to produce a dataset in a training directory. Usually tens of thousands of contracts are used for training. For demonstration purposes, we show how to produce a small training dataset from our example contract to the train_data directory:

$ mkdir train_data
$ python3 -m ilf --proj ./example/crowdsale/ --contract Crowdsale --limit 2000 --fuzzer symbolic --dataset_dump_path ./train_data/crowdsale.data
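For a larger training set, the single-contract command above can be repeated over many Truffle projects. The following is a rough sketch of such a driver, not a script shipped with ILF: symbolic_command and build_dataset are hypothetical helpers, and naming each dump file after the lower-cased contract name is an assumption modelled on crowdsale.data (two contracts with the same name would overwrite each other).

```python
import subprocess
from pathlib import Path


def symbolic_command(proj, contract, dump_path, limit=2000):
    """Build the `python3 -m ilf` invocation shown above for one contract."""
    return [
        "python3", "-m", "ilf",
        "--proj", str(proj),
        "--contract", contract,
        "--limit", str(limit),
        "--fuzzer", "symbolic",
        "--dataset_dump_path", str(dump_path),
    ]


def build_dataset(projects, train_dir="train_data", run=subprocess.run):
    """Run the symbolic fuzzer once per (project dir, contract name) pair."""
    out = Path(train_dir)
    out.mkdir(parents=True, exist_ok=True)
    for proj, contract in projects:
        # Assumed naming scheme: <contract name in lower case>.data
        dump = out / f"{contract.lower()}.data"
        # check=True stops the loop if one fuzzing run fails.
        run(symbolic_command(proj, contract, dump), check=True)


if __name__ == "__main__":
    build_dataset([("./example/crowdsale/", "Crowdsale")])
```

The run parameter exists so the loop can be dry-run with a stand-in function that only records the commands.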

Run the scripts to select seed integer values and amount values from the training dataset, and put them into ilf/fuzzers/imitation/int_values.py and ilf/fuzzers/imitation/amounts.py, respectively:

$ python3 script/get_int_values.py --train_dir ./train_data
$ python3 script/get_amounts.py --train_dir ./train_data

Then the following command performs neural network training and outputs the trained networks in the new_model directory:

$ mkdir new_model
$ python3 -m ilf --fuzzer imitation --train_dir ./train_data --model ./new_model

Automatically Constructing Truffle Projects

For evaluation and training purposes, one might want to automatically construct Truffle projects from a large set of contracts. To achieve this, one can write a script that automatically produces the files required by Truffle projects, following the format in example/crowdsale. The compressed file truffle_scripts.tar.gz contains the scripts we used. Those scripts might not run directly but can give you a high-level idea of how things work.
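To give a feel for what such a script might look like, here is a minimal sketch that wraps one Solidity file in a project directory. It is not taken from truffle_scripts.tar.gz. It assumes the conventional Truffle layout (a contracts directory and a migrations directory), a constructor without arguments, and a hypothetical helper name make_truffle_project; it does not write a Truffle configuration file, so compare the result against example/crowdsale before relying on it.

```python
import shutil
from pathlib import Path

# Conventional Truffle deployment migration for one contract.
MIGRATION_TEMPLATE = """\
var {name} = artifacts.require("{name}");

module.exports = function(deployer) {{
  deployer.deploy({name});
}};
"""


def make_truffle_project(sol_file, contract_name, out_dir):
    """Create contracts/ and migrations/ for a single contract source file."""
    out = Path(out_dir)
    (out / "contracts").mkdir(parents=True, exist_ok=True)
    (out / "migrations").mkdir(parents=True, exist_ok=True)
    # Copy the Solidity source unchanged into contracts/.
    shutil.copy(sol_file, out / "contracts" / Path(sol_file).name)
    # Assumes a constructor without arguments; contracts that need
    # arguments require a different deploy call.
    migration = out / "migrations" / "2_deploy_contracts.js"
    migration.write_text(MIGRATION_TEMPLATE.format(name=contract_name))
    return out
```

After generating a project this way, script/extract.py would still need to be run on it to produce the deployment transactions, as described in the Fuzzing section.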

Citing ILF

@inproceedings{He:2019:LFS:3319535.3363230,
 author = {He, Jingxuan and Balunovi\'{c}, Mislav and Ambroladze, Nodar and Tsankov, Petar and Vechev, Martin},
 title = {Learning to Fuzz from Symbolic Execution with Application to Smart Contracts},
 booktitle = {Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security},
 series = {CCS '19},
 year = {2019},
 isbn = {978-1-4503-6747-9},
 location = {London, United Kingdom},
 pages = {531--548},
 numpages = {18},
 url = {http://doi.acm.org/10.1145/3319535.3363230},
 doi = {10.1145/3319535.3363230},
 acmid = {3363230},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {fuzzing, imitation learning, smart contracts, symbolic execution},
}

Contributors

Jingxuan He
Mislav Balunović
Nodar Ambroladze
Petar Tsankov
Martin Vechev
Anton Permenev

License and Copyright

Copyright (c) 2019 Secure, Reliable, and Intelligent Systems Lab (SRI), ETH Zurich
Licensed under the Apache 2.0 License
