JordanSamhi / Scalpel

Scalpel: A Python Program Analysis Framework

Scalpel: The Python Static Analysis Framework

Scalpel is a Python static analysis framework. It provides the essential program analysis functions needed to implement client applications that solve dedicated problems statically.

Setting up Scalpel

You can download the source code of Scalpel and install it manually, or use pip to install it automatically.

python setup.py install

Brief Introduction

Detailed user guides can be found here.

We aim to provide Scalpel as a generic Python static analysis framework that includes as many functions as possible (e.g., building inter-function control-flow graphs or interpreting the import relationships between Python modules), so that developers can more easily implement their own problem-focused static analyzers. The following figure depicts the current architecture of its design.

Scalpel Design

  • Function 1: Code Rewriter. The code rewriter module is a fundamental function that supports systematic changes to existing Python programs. It has two preliminary uses: (1) simplifying programs for better static analysis and (2) optimizing or repairing problematic programs. For the first use, the framework integrates a database of rules that specify how matched code snippets should be transformed. This database should be extended continuously to meet the complex simplification requirements of effective static analysis of Python programs. For the second use, inspired by the optimization mechanism provided by Soot (one of the most famous static analysis frameworks for Java), we set up a transformation process with dedicated callback methods that users can override to optimize Python code according to their own needs.

  • Function 2: Control-Flow Graph Construction. The control-flow graph (CFG) construction module generates intra-procedural CFGs, an essential component of static flow analysis with applications such as program optimization and taint analysis. A CFG represents all paths that might be traversed through a program during its execution. The CFGs of a Python project can be combined with its call graph to generate an inter-procedural CFG of the project.

  • Function 3: Static Single Assignment (SSA) Representation. The static single assignment module provides a compiler-level intermediate representation (IR) for code analysis. It can be used for both symbolic execution and constant propagation. By giving each assignment to a variable a distinct name, we obtain explicit use-def chains and can therefore track precisely how data flows through the program.

  • Function 4: Alias Analysis. Because several variables can point to the same memory location or hold identical values, the alias analysis function is designed to model such cases. This function can be vital for sound constant propagation. Alias analysis also benefits type checking and API name qualification.

  • Function 5: Constant Propagation. The constant propagation module evaluates, before runtime, the actual values of variables at given program points along different execution paths. With these values known beforehand, we can optimize code and detect bugs. Constant propagation uses the representation from the SSA module to record the value produced by each assignment to a variable.

  • Function 6: Import Graph Construction. In Python, import flows and relations have been shown to be important for API mapping and dependency analysis. Our import graph construction provides a data structure that represents the import relationships across the Python module files of a project. The import graphs of multiple Python projects can be combined to perform inter-library dataflow analysis.

  • Function 7: Fully Qualified Name Inferrer. Python APIs and functions can be invoked under different names depending on how they are imported, which complicates API usage analysis. This module converts all function call names to their fully qualified names: dotted strings that represent the path from the top-level module down to the object itself. Various tasks can benefit from this functionality, such as understanding deprecated API usage, dependency parsing, and building sound call graphs.

  • Function 8: Call Graph Construction. A call graph depicts the calling relationships between methods in a software program. It is a fundamental component of static flow analysis and can be leveraged in tasks such as profiling, vulnerability propagation, and refactoring. This module addresses the challenges posed by complex Python features, such as higher-order functions and nested function definitions, to construct precise call graphs for Python projects.

  • Function 9: Type Inference. Because Python is dynamically typed, it is hard to apply the full power of traditional static analysis to it. This module infers the types of all variables in a Python program, including function return values and function parameters, which makes more sophisticated static analysis possible. We use backward data-flow analysis and a set of heuristic rules to achieve high precision.
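
The idea behind the Fully Qualified Name Inferrer (Function 7) can be sketched with Python's standard ast module alone. The snippet below is a minimal illustration, not Scalpel's implementation or API: the helper names qualified_call_names and _dotted_name are hypothetical, and the sketch only expands absolute module-level imports. It ignores relative imports, reassigned names, and scoping.

```python
import ast


def _dotted_name(node):
    """Rebuild 'a.b.c' from nested Attribute/Name nodes, or return None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def qualified_call_names(source):
    """Return the sorted call names in `source`, expanded through import aliases.

    Hypothetical helper for illustration only; not part of Scalpel.
    """
    tree = ast.parse(source)
    aliases = {}  # local name -> dotted name it refers to

    # Pass 1: record what each imported local name stands for.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                if item.asname:
                    aliases[item.asname] = item.name
                else:
                    # "import os.path" binds only the top-level name "os".
                    top = item.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for item in node.names:
                aliases[item.asname or item.name] = f"{node.module}.{item.name}"

    # Pass 2: expand the first component of every call target.
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            dotted = _dotted_name(node.func)
            if dotted is None:
                continue
            head, _, rest = dotted.partition(".")
            head = aliases.get(head, head)
            names.append(f"{head}.{rest}" if rest else head)
    return sorted(names)


example = """
import numpy as np
from os.path import join as pjoin
import json

np.linalg.norm([3, 4])
pjoin("a", "b")
json.dumps({})
print("done")
"""

print(qualified_call_names(example))
# ['json.dumps', 'numpy.linalg.norm', 'os.path.join', 'print']
```

The example source is only parsed, never executed, so numpy does not need to be installed. A production inferrer would also have to follow assignments and scopes, which is where alias analysis and the import graph come in.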

API Documentation

Scalpel's API documentation is available here.

Acknowledgement

This project has been inspired and supported by many existing works. If you think your work appears in this project but has not been mentioned yet, please let us know.

  1. The Fuzzing Book.
  2. The Debugging Book.
  3. StaticCFG.
  4. PyCG: Practical Call Graph Generation in Python. In 43rd International Conference on Software Engineering, 2021.
  5. A Simple, Fast Dominance Algorithm. Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy.
  6. COS598C Advanced Compilers, Princeton University.
  7. Restoring Execution Environments of Jupyter Notebooks.
  8. Static Single Assignment Book.

License: Apache License 2.0


Languages

Python 96.9%, Jupyter Notebook 1.5%, HTML 0.7%, JavaScript 0.5%, CSS 0.3%, TeX 0.1%