A utility to identify and map the semantic structure of files, including polyglots, chimeras, and schizophrenic files. It can be used in conjunction with its sister tool PolyTracker for Automated Lexical Annotation and Navigation of Parsers, a backronym devised solely for the purpose of collectively referring to the tools as The ALAN Parsers Project.
In the same directory as this README, run:
pip3 install -e .
This will automatically install the `polyfile` executable in your path.
usage: polyfile [-h] [--filetype FILETYPE] [--list] [--html HTML]
                [--try-all-offsets] [--debug] [--quiet] [--version]
                [-dumpversion]
                [FILE]

A utility to recursively map the structure of a file.

positional arguments:
  FILE                  The file to analyze; pass '-' or omit to read from
                        STDIN

optional arguments:
  -h, --help            show this help message and exit
  --filetype FILETYPE, -f FILETYPE
                        Explicitly match against the given filetype (default
                        is to match against all filetypes)
  --list, -l            list the supported filetypes (for the `--filetype`
                        argument) and exit
  --html HTML, -t HTML  Path to write an interactive HTML file for exploring
                        the PDF
  --try-all-offsets, -a
                        Search for a file match at every possible offset; this
                        can be very slow for larger files
  --debug, -d           Print debug information
  --quiet, -q           Suppress all log output (overrides --debug)
  --version, -v         Print PolyFile's version information to STDERR
  -dumpversion          Print PolyFile's raw version information to STDOUT and
                        exit
To generate a JSON mapping of a file, run:
polyfile INPUT_FILE > output.json
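Once `output.json` exists, it can be inspected with any JSON library. The following is a minimal sketch that deliberately assumes nothing about PolyFile's key names: it only reports how deeply the JSON is nested and how many objects it contains. The helper names `summarize` and `summarize_file` are hypothetical and introduced here purely for illustration.

```python
import json
import sys
from typing import Any, Tuple


def summarize(node: Any) -> Tuple[int, int]:
    """Return (nesting_depth, object_count) for any parsed JSON value.

    Scalars have depth 0; each enclosing list or object adds 1.
    Only JSON objects (dicts) are counted, not lists.
    """
    if isinstance(node, dict):
        children = list(node.values())
        count = 1
    elif isinstance(node, list):
        children = list(node)
        count = 0
    else:
        return 0, 0

    max_child_depth = 0
    for child in children:
        child_depth, child_count = summarize(child)
        max_child_depth = max(max_child_depth, child_depth)
        count += child_count
    return 1 + max_child_depth, count


def summarize_file(path: str) -> Tuple[int, int]:
    """Load a JSON file (e.g. output.json) and summarize its shape."""
    with open(path, "r", encoding="utf-8") as handle:
        return summarize(json.load(handle))


if __name__ == "__main__" and len(sys.argv) > 1:
    depth, objects = summarize_file(sys.argv[1])
    print(f"nesting depth: {depth}, JSON objects: {objects}")
```

Saved under a name of your choosing, the script takes the path to `output.json` as its only argument. For very deeply nested mappings, an iterative traversal may be preferable to recursion.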
You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:
polyfile INPUT_FILE --html output.html > output.json
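The same invocation can be scripted from Python. The sketch below assumes the `polyfile` executable is on your path and uses only the `--html` and `--filetype` flags listed in the usage text; `build_command` and `run_polyfile` are hypothetical helper names, not part of PolyFile itself.

```python
import subprocess
from typing import List, Optional


def build_command(input_file: str,
                  html: Optional[str] = None,
                  filetype: Optional[str] = None) -> List[str]:
    """Build the argument list for a polyfile invocation."""
    cmd = ["polyfile", input_file]
    if html is not None:
        cmd += ["--html", html]          # interactive HTML hexdump
    if filetype is not None:
        cmd += ["--filetype", filetype]  # restrict matching to one filetype
    return cmd


def run_polyfile(input_file: str,
                 json_out: str,
                 html: Optional[str] = None,
                 filetype: Optional[str] = None) -> None:
    """Run polyfile, redirecting its STDOUT (the JSON mapping) to json_out.

    Raises subprocess.CalledProcessError if polyfile exits with a
    non-zero status, and FileNotFoundError if it is not installed.
    """
    with open(json_out, "wb") as out:
        subprocess.run(build_command(input_file, html, filetype),
                       stdout=out, check=True)


# Example (not executed here, since it requires polyfile to be installed):
# run_polyfile("INPUT_FILE", "output.json", html="output.html")
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with file names that contain spaces.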
PolyFile can identify all 10,000+ file formats in the TrID database. It currently has support for parsing and semantically mapping the following formats:
- PDF, using an instrumented version of Didier Stevens' public domain, permissive, forensic parser
- ZIP, including recursive identification of all ZIP contents
- JPEG/JFIF, using its Kaitai Struct grammar
- iNES
- Any other format specified in a KSY grammar
For an example that exercises all of these file formats, run:
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json
PolyFile outputs its mapping in an extension of the SBuD JSON format described in the documentation.
- Current limitation: the instrumented Kaitai Struct parser generator implementation has only been tested on the JPEG/JFIF grammar; other KSY definitions may exercise portions of the KSY specification that have not yet been implemented.
This research was developed by Trail of Bits with funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program as a subcontractor to Galois. It is licensed under the Apache 2.0 license. The PDF parser is modified from the parser developed by Didier Stevens and released into the public domain. © 2019, Trail of Bits.