Thomsch / Flexeme

This project provides several implementations for commit untangling and proposes a new representation of git patches by projecting the patch onto a PDG.

Home Page: https://pppi.github.io/Flexeme/

FLEXEME

Java implementation of Flexeme, based on the original Flexeme implementation.

See ORIGINAL_INSTRUCTIONS.md for the documentation of the original Flexeme repository.

Requirements

  • Python 3.8.
  • Java 8 on the PATH.
  • Java 11, with its location set in the JAVA11_HOME environment variable.

Installation

  1. Install Graphviz (https://graphviz.org/).
  2. Create a virtual environment and install Flexeme in editable mode:

       rm -rf .venv && python3 -m venv .venv
       source .venv/bin/activate
       pip install -e .

     If the dependency pygraphviz fails to install, visit https://pygraphviz.github.io/documentation/stable/install.html and follow the instructions for your OS.

  3. Run cp .env-template .env, then fill in the environment variables in .env:
    • JAVA11_HOME: Location of the Java 11 executable to run the PDG extractor. (e.g., $HOME/.sdkman/candidates/java/11.0.18-amzn/bin/java)
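The requirements above can be checked before running anything else. The following is a minimal sketch, not part of Flexeme: the function name check_requirements is hypothetical, it reads JAVA11_HOME from the process environment (so it assumes the variable has been exported or loaded from .env), and it treats Python 3.8 as a minimum version, which is an assumption.

```python
import os
import shutil
import sys


def check_requirements(env=None, which=shutil.which, version=None):
    """Return a list of problems found; an empty list means every check passed.

    `env`, `which` and `version` can be overridden for testing.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []

    # Assumption: Python 3.8 is treated as the minimum supported version.
    if version < (3, 8):
        problems.append("Python 3.8 is required, found %d.%d" % version)

    # Java 8 is expected on the PATH; this only checks that some `java` exists.
    if which("java") is None:
        problems.append("no 'java' executable found on the PATH")

    # JAVA11_HOME should point to the Java 11 executable itself.
    java11 = env.get("JAVA11_HOME")
    if not java11:
        problems.append("JAVA11_HOME is not set")
    elif not os.path.isfile(java11):
        problems.append("JAVA11_HOME does not point to a file: " + java11)

    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```

The check does not verify the Java versions themselves; it only catches a missing executable or an unset variable early.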

Synthetic Benchmark

Run Flexeme on the synthetic benchmark.

input: path to repository

output: untangling accuracy for repository

Steps:

  1. Create lists of commit ids (e.g., [a, b, c, d]). Each list represents several synthetic commits of varying size, where the size is the number of 'concerns'. For example:
    • a to b represents a synthetic commit with 1 concern
    • a to c represents a synthetic commit with 2 concerns
    • a to d represents a synthetic commit with 3 concerns
  2. Generate ∂PDGs for each synthetic commit:
    • Each file changed in the synthetic commit gets a ∂PDG
  3. Merge the file-based ∂PDGs into a single ∂PDG that represents the synthetic commit.
  4. Normalize the labels in the ∂PDGs.
  5. Evaluation (runs the untangling on the ∂PDGs).
  6. Report untangling accuracy.
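Step 1 can be illustrated with a short sketch. It only shows how one list of commit ids expands into synthetic commits with a growing number of concerns; the function name synthetic_commits and the tuple layout are hypothetical and are not taken from the Flexeme scripts.

```python
def synthetic_commits(commit_ids):
    """Expand a list of commit ids into (start, end, concerns) tuples.

    The first id is the shared starting point. Each later id closes a
    synthetic commit that spans one more concern than the previous one.
    """
    if len(commit_ids) < 2:
        return []
    start = commit_ids[0]
    return [(start, end, concerns)
            for concerns, end in enumerate(commit_ids[1:], start=1)]


print(synthetic_commits(["a", "b", "c", "d"]))
# [('a', 'b', 1), ('a', 'c', 2), ('a', 'd', 3)]
```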

Running the benchmark

  1. Check out the Defects4J repository: git clone $D4J_HOME/project_repos/commons-lang.git /private/tmp/commons-lang.
  2. Create the synthetic commits: python3 flexeme/tangle_concerns/tangle_by_file.py /private/tmp/commons-lang /private/tmp/ ..
  3. Generate the ∂PDGs and evaluate: python3 flexeme/tangle_concerns/generate_corpus.py ./commons-lan_history_filtered_flat.json /private/tmp/commons-lang /private/tmp/commons-lang-work/.
  4. Results are saved in out/commons-lang/.

Layout changes

The file defects4j/layout_changes.json records the changes to the sourcepath layout of the Defects4J project repositories. It is required to run the synthetic benchmark. The changes are ordered from newest to oldest.

When untangling a commit, the scripts find the correct layout by checking whether the newest layout-change commit is an ancestor of that commit. If it is not, they check the next older layout-change commit, and so on until an ancestor is found. If no ancestor is found, a warning is logged and the returned layout is None.
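The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by the scripts: the function names and the "commit" and "layout" keys are hypothetical, since the structure of layout_changes.json is not described here. The ancestor test relies on git merge-base --is-ancestor, which exits with status 0 when the first commit is an ancestor of the second.

```python
import logging
import subprocess


def is_ancestor(repository, ancestor, commit):
    """Return True if `ancestor` is an ancestor of `commit` in `repository`."""
    result = subprocess.run(
        ["git", "-C", repository, "merge-base", "--is-ancestor", ancestor, commit],
        check=False,
    )
    return result.returncode == 0


def find_layout(layout_changes, commit, ancestor_check):
    """Return the layout that applies to `commit`, or None if none matches.

    `layout_changes` must be ordered from newest to oldest. Each entry is
    assumed to be a dict with hypothetical "commit" and "layout" keys.
    `ancestor_check(a, b)` returns True if commit `a` is an ancestor of `b`.
    """
    for change in layout_changes:
        if ancestor_check(change["commit"], commit):
            return change["layout"]
    logging.warning("No layout change is an ancestor of commit %s", commit)
    return None


# Possible usage against a real repository (paths and ids are placeholders):
# layout = find_layout(changes, "abc123",
#                      lambda a, c: is_ancestor("/path/to/repo", a, c))
```

Passing the ancestor test as a parameter keeps the search itself independent of git, which makes it easy to exercise with a fake history.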

The layout changes are added manually from the project-specific dir_layout.csv file stored in the Defects4J repository. Depending on the project, the entries in dir_layout.csv are ordered either from new to old or from old to new. Before adding a new project to defects4j/layout_changes.json, verify which order its dir_layout.csv uses.

Untangle Commits

Run Flexeme to untangle a commit in a local repository.

  1. Run: flexeme <repository> <commit> <sourcepath> <classpath> <output_file>
    • repository: Path to the repository.
    • commit: Commit to untangle.
    • sourcepath: Java sourcepath used to compile the files of the commit.
    • classpath: Java classpath used to compile the files of the commit.
    • output_file: Where the results are stored.
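When untangling several commits from a script, the same command line can be assembled programmatically. The sketch below assumes the flexeme executable is on the PATH after installation; the helper names flexeme_command and untangle, and all paths in the usage comment, are hypothetical.

```python
import subprocess


def flexeme_command(repository, commit, sourcepath, classpath, output_file):
    """Build the argument list in the order documented above."""
    return ["flexeme", repository, commit, sourcepath, classpath, output_file]


def untangle(repository, commit, sourcepath, classpath, output_file):
    """Run flexeme and raise CalledProcessError if it exits with an error."""
    command = flexeme_command(repository, commit, sourcepath, classpath, output_file)
    return subprocess.run(command, check=True)


# Possible usage (placeholder values):
# untangle("/path/to/repo", "abc123", "src/main/java", "lib/*", "out/abc123.csv")
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.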

License: MIT License

