thelumberjhack / corpusgen

Corpus is an asynchronous web crawler for you to grab a set of sample files. Then use afl-cmin to create a minset of them for later use with AFL. Code is provided as is and likely won't be maintained by me. Feel free to use it (at your own risk).

Corpus

Description

Corpus is an asynchronous web crawler that collects a set of sample files of a given type. You can then run afl-cmin over the downloaded files to reduce them to a minimal set for later use with AFL.
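The minimization step is not part of Corpus itself, so here is a minimal sketch of how it might be scripted. It assumes AFL's `afl-cmin` is on your PATH; the helper name `afl_cmin_command`, the directory names `samples` and `minset`, and the target binary `./pdf_target` are all hypothetical placeholders.

```python
def afl_cmin_command(corpus_dir, minset_dir, target_argv):
    """Build the argv list for an afl-cmin run.

    -i is the directory of collected samples, -o is where the minimized
    set is written, and everything after "--" is the target command line.
    "@@" in target_argv is replaced by afl-cmin with each input file path.
    """
    return ["afl-cmin", "-i", corpus_dir, "-o", minset_dir, "--"] + list(target_argv)


# Hypothetical directories and target binary, for illustration only.
cmd = afl_cmin_command("samples", "minset", ["./pdf_target", "@@"])
print(" ".join(cmd))
```

Once AFL is installed and the target is built with AFL instrumentation, the list can be passed to `subprocess.run(cmd, check=True)`.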

Setup

Corpus is implemented with the asyncio module as shipped in Python 3.5, so you need Python >= 3.5.0.
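To illustrate the style of code that asyncio enables, here is a rough sketch of a bounded-concurrency crawl loop. It is not taken from Corpus: `fetch`, `crawl`, and `run` are hypothetical names, the download is simulated with `asyncio.sleep` so the example runs without network access, and the `max_tasks` cap is only assumed to play a role similar to the `-c MAX_TASKS` option.

```python
import asyncio


async def fetch(url, limiter):
    # Stand-in for a real HTTP request; a crawler would download `url` here.
    async with limiter:
        await asyncio.sleep(0.01)
        # Pretend the "size" of the download is the length of the URL.
        return url, len(url)


async def crawl(urls, max_tasks=2):
    # The semaphore caps how many fetches are in flight at once.
    limiter = asyncio.Semaphore(max_tasks)
    results = await asyncio.gather(*(fetch(u, limiter) for u in urls))
    return dict(results)


def run(urls, max_tasks=2):
    # Create and close a dedicated event loop around the crawl.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(crawl(urls, max_tasks))
    finally:
        loop.close()


print(run(["a", "bb", "ccc"]))
```

The semaphore is the key design choice here: all fetch coroutines are scheduled up front, but only `max_tasks` of them proceed past the `async with` at any moment.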

Pre-requisites

virtualenvwrapper>=4.7
$ pip install virtualenvwrapper

Virtualenv configuration is left to the discretion of the user. Once you're set up, continue with the next steps.

Installation

Clone the source, then create a virtualenv for Corpus with its requirements installed:

$ cd corpus
$ mkvirtualenv -p python3 -r requirements.txt corpus

Now you are ready to use it.

Usage

$ workon corpus
(corpus) $ ./corpus.py
usage: corpus.py --roots [ROOT_DOMAINS [ROOT_DOMAINS ...]] --file_type
                 FILE_TYPE -o OUT_DIR [-i] [--select] [-r MAX_REDIRECT]
                 [-t MAX_TRIES] [-c MAX_TASKS] [-e REGEX] [-s] [-v] [-q]
                 [-m MAX_SIZE]
corpus.py: error: the following arguments are required: --roots, --file_type, -o/--output
(corpus) $
(corpus) $ ./corpus.py --roots www.adobe.com --file_type pdf -o test
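Going by the usage message above, the three required options are `--roots`, `--file_type`, and `-o/--output`. The sketch below is a hypothetical reconstruction of just those three arguments with argparse, to show how such a command line would be parsed; it is an assumption based on the usage text, not the actual parser in corpus.py, and the optional flags are left out.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the required options only.
    parser = argparse.ArgumentParser(prog="corpus.py")
    # nargs="*" lets --roots take several domains in one go.
    parser.add_argument("--roots", nargs="*", required=True,
                        metavar="ROOT_DOMAINS")
    parser.add_argument("--file_type", required=True)
    parser.add_argument("-o", "--output", required=True, metavar="OUT_DIR")
    return parser


args = build_parser().parse_args(
    ["--roots", "www.adobe.com", "--file_type", "pdf", "-o", "test"]
)
print(args.roots, args.file_type, args.output)
```

With a parser shaped like this, a bare domain without `--roots`, or `--file-type` spelled with a hyphen, would be rejected as an unrecognized argument.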

License: MIT License


Languages

Language:Python 100.0%