Abstract syntax trees (ASTs) are hierarchical, recursive structures for representing source code. Parsing them typically requires a full traversal of the code, which costs O(n) operations.
But can we think differently?
Instead of parsing code structures every time, why can't we load them into memory as an efficient binary structure before any further analysis? This would cost only O(1) operations.
This project adopts FlatBuffers, which stores data in a flat, one-dimensional byte array, to represent ASTs as binary files, and demonstrates the improved efficiency and applicability to software development.
Once installed, our tool manipulates source code 10x faster.
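To illustrate why a flat binary layout avoids re-parsing, here is a minimal sketch using only Python's standard `struct` module. It is not the FlatBuffers library or FAST's actual schema: the fixed-size record layout (a kind id plus first-child and next-sibling indices) and the helper names `flatten`, `to_bytes`, `node_at` and `children` are hypothetical. Flattening is a one-off O(n) pass; after that, any node can be read directly from its byte offset without decoding the rest of the buffer.

```python
import struct

# Hypothetical fixed-size record (NOT the FAST schema):
# kind id (uint16), first-child index (int32), next-sibling index (int32).
# An index of -1 means "none". "<" = little-endian, no padding -> 10 bytes.
RECORD = struct.Struct("<Hii")


def flatten(tree):
    """Flatten a (kind_id, [children]) tree into pre-order records.

    Each record is (kind, first_child, next_sibling), so the tree shape
    is kept as integer indices into one flat array.
    """
    records = []

    def visit(node):
        kind, kids = node
        idx = len(records)
        records.append([kind, -1, -1])
        prev = None
        for kid in kids:
            k = visit(kid)
            if prev is None:
                records[idx][1] = k      # link parent -> first child
            else:
                records[prev][2] = k     # link previous sibling -> this one
            prev = k
        return idx

    visit(tree)
    return [tuple(r) for r in records]


def to_bytes(records):
    """Serialise the flat records into one contiguous byte buffer."""
    return b"".join(RECORD.pack(*r) for r in records)


def node_at(buf, i):
    """Read node i in O(1) from its offset, without decoding other nodes."""
    return RECORD.unpack_from(buf, i * RECORD.size)


def children(buf, i):
    """Yield the indices of node i's children by following sibling links."""
    _, child, _ = node_at(buf, i)
    while child != -1:
        yield child
        child = node_at(buf, child)[2]
```

For example, flattening `(1, [(2, []), (3, [(4, [])])])` yields four 10-byte records, and `list(children(buf, 0))` returns `[1, 2]`. Real FlatBuffers uses variable-size tables with offset indirection rather than fixed-size records, but the principle of following stored offsets instead of re-parsing text is the same.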
Your software development projects may benefit directly from FAST if you find the following activities slow:
Parsing => 10-100x faster
- Java, C, C++, C#, Objective-C -- supported by srcML
- Smali/Android -- supported by smali in apktool
- or one of 175 programming languages supported by antlr4
Program slicing => 2.5x faster and 1.5x smaller
Diff-Patching => 35x faster
Synchronisation => 1.5x smaller for slicing
TODOs
- Create a Java wrapper for the tool (requested by @Chris2011)
- Merge the Python grammar with the srcML grammar to allow slicing of Python code
- Generate Pickle AST from FAST representation using flatbuffers
0.0.8 TBD
0.0.7 (November 3, 2017)
- Integrated with bi-tbcnn
- Supported Solidity grammar
0.0.6 (October 5, 2017)
- Created a Python 3 parser in C++ based on the official ANTLR4 grammar in Java and extended the FAST schema accordingly, merging the `python3` branch. Currently, the error handling feature is turned off.
- Implemented a Docker image based on the alpine:edge image, which is much smaller than the Ubuntu image
- Generated Pickle AST from FAST representation (requested by @bdqnghi)
0.0.5 (August 25, 2017)
- Integrated with biyacc
- Created a Dockerfile to simplify the deployment
- Implemented the normalisation concept from the meaningful changes tool (ASE'11) by migrating its TXL-based implementation; see the `-n` option
- Rewrote the interface to speed up GumTreeDiff (ASE'14) and treedifferencing (ASE'16)
- Added colors to the diff output so that it can be integrated with git on the command line
- Added the `-u` option for yUML extraction (see srcYUML)
- Generated patches from the diff records of the GumTreeDiff integration with BiYacc
- Reduced the size of FAST for slicing
0.0.4 (August 1, 2017)
- Updated schema's Kinds as a union type, accommodating more ANTLR4 languages when needed (currently, Kind => srcml; SmaliKind => smali)
- Removed the ANTLR3 branch to take full advantage of latest ANTLR4
- Fixed some lexer errors in `smaliLexer.g4` (now all code of the Instagram APK can be processed 10x faster)
- Added the `apk2pb` script to process an APK into a tarball of protobuf representations
- Modified the `Pairs` schema to include hashes
- Formed the `f-ast` team to maintain the project
- Completed the slice-diff feature
- Added JSON output for decoding FAST files, which can be piped to jq for further querying
- Added the `-w` option to report the maximum width of the AST (i.e. the largest number of children of any tree node), and the `-W limit` option to cap the width at `limit`
- Added the `-i` option to report the identifiers that appear as function/variable names or comment tokens, and to tokenize them using intt
- Added -b option to convert bug reports into protobuf format
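The AST width reported by `-w`, and the cap applied by `-W limit`, can be pictured with the following sketch. It assumes a simple in-memory `Node` class (hypothetical, not FAST's schema) and assumes that limiting the width keeps each node's first `limit` children; how FAST actually chooses which children to drop is not stated here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind label and an ordered list of children."""
    kind: str
    children: list[Node] = field(default_factory=list)


def max_width(node: Node) -> int:
    """Largest number of children of any node in the tree (0 for a leaf)."""
    return max([len(node.children)] + [max_width(c) for c in node.children])


def limit_width(node: Node, limit: int) -> Node:
    """Return a new tree where every node keeps at most `limit` children.

    Assumption: the first `limit` children are kept and the rest dropped.
    The input tree is left unchanged.
    """
    return Node(node.kind, [limit_width(c, limit) for c in node.children[:limit]])
```

For a tree whose widest node is a function with three child statements, `max_width` returns 3, and `limit_width(tree, 2)` returns a copy whose maximum width is 2.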
0.0.3 (July 6, 2017)
- Generalised the code schema to support automated software engineering activities, e.g. slicing, diffing, cloning
- Placed "tail" information after "child" in schema to remove shift-reduce errors in the application of BiYacc
- Added support for ANTLR4 in C++, which unfortunately conflicts with the older antlr@2 dependency required by srcML; a workaround is described in the updated installation guide.
- Converted srcSlicing CSV output into the supported protobuf schema
0.0.2 (June 21, 2017)
- Added support for smali code through its ANTLR3 grammar in Java
- Added srcSlice support to improve the speed of forward slicing by 2x
- Added ANTLR3 libraries to improve GumTreeDiff speed
0.0.1 (April 11, 2017)
- Initial public release: supports round-trip translation between srcML and protobuf/FlatBuffers binary ASTs, improving the parsing speed by 10x
© 2017 F-AST team. FAST is released under BSD license, see license.txt for details.