Brandon82 / astnn

ASTNN - An Equivalent Mutant Identifier

An Abstract Syntax Tree Neural Network aiming to identify equivalent mutants within a dataset of mutants.

  • The /parser folder is not required to run the pipeline; it was only used for additional preprocessing of our dataset before integrating it into the pipeline.
  • Our initial dataset consisted solely of unified difference strings, so we created a unified-diff parser to insert the mutations into their original programs.
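
The insertion step described above can be sketched roughly as follows. This is a minimal illustration, not the code in /parser: the function name apply_unified_diff is hypothetical, only single-file diffs are handled, and hunks are assumed to apply cleanly (no fuzzy matching).

```python
import re

# Matches hunk headers such as "@@ -12,3 +12,4 @@ optional section text".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@")


def apply_unified_diff(original: str, diff: str) -> str:
    """Apply a single-file unified diff to `original` and return the mutant.

    Hypothetical helper for illustration; raises ValueError when a context
    or removed line does not match the original program.
    """
    src = original.splitlines()
    out = []
    pos = 0          # index of the next unread line in `src`
    in_hunk = False  # file headers ("---", "+++") come before the first hunk

    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            start = int(header.group(1))
            count = int(header.group(2)) if header.group(2) is not None else 1
            # A zero-length old range means "insert after line `start`";
            # otherwise `start` is the first affected line (1-based).
            first = start if count == 0 else start - 1
            out.extend(src[pos:first])  # copy untouched lines before the hunk
            pos = first
            in_hunk = True
            continue
        if not in_hunk or line.startswith("\\"):
            continue  # skip file headers and "\ No newline at end of file"

        # An empty line inside a hunk is treated as an empty context line.
        tag, text = (line[0], line[1:]) if line else (" ", "")
        if tag == "+":
            out.append(text)  # added line: emit it, consume nothing
            continue
        if tag not in " -":
            raise ValueError(f"unexpected diff line: {line!r}")
        if pos >= len(src) or src[pos] != text:
            raise ValueError(f"diff does not match original at line {pos + 1}")
        if tag == " ":
            out.append(text)  # context line: keep it
        pos += 1              # context and removed lines both consume input

    out.extend(src[pos:])  # copy everything after the last hunk
    result = "\n".join(out)
    return result + "\n" if original.endswith("\n") else result


ORIGINAL = "int add(int a, int b) {\n    return a + b;\n}\n"
DIFF = (
    "--- a/add.c\n"
    "+++ b/add.c\n"
    "@@ -1,3 +1,3 @@\n"
    " int add(int a, int b) {\n"
    "-    return a + b;\n"
    "+    return a - b;\n"
    " }\n"
)
MUTANT = apply_unified_diff(ORIGINAL, DIFF)
print(MUTANT)
```

Checking context and removed lines against the original makes a diff that belongs to a different program fail loudly instead of silently producing a wrong mutant.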

Requirements

  python 3.10
  pandas 2.0.0
  gensim 4.3.1
  scikit-learn 1.2.2
  torch 2.0.0
  pycparser 2.21
  javalang 0.13

Note that the original ASTNN implementation, from which this was forked, utilized Python 3.6 and older library versions. We have updated the project to Python 3.10 and more recent library versions.
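
Because the pipeline was updated against specific versions, it can help to compare the local environment with the list above before running anything. The sketch below is not part of the repository; PINNED and check_versions are hypothetical names, and it assumes the packages were installed under their usual PyPI distribution names.

```python
import sys
from importlib import metadata

# Versions copied from the Requirements list; the dict name is hypothetical.
PINNED = {
    "pandas": "2.0.0",
    "gensim": "4.3.1",
    "scikit-learn": "1.2.2",
    "torch": "2.0.0",
    "pycparser": "2.21",
    "javalang": "0.13",
}


def check_versions(pinned):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Builds such as "2.0.0+cu117" carry a local suffix; ignore it.
        base = installed.split("+")[0] if installed else None
        report[name] = (wanted, installed, base == wanted)
    return report


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"Python 3.10 expected, found {sys.version.split()[0]}")
    for name, (wanted, installed, ok) in check_versions(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:14} wanted {wanted:8} installed {installed}  {status}")
```

A mismatch is not necessarily fatal; the check only reports where the environment differs from the versions this project was updated for.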

Added Features

Here's a list of changes compared to the original repository:

  • Integrated a mutant dataset from MutantBench
    • Created a parser that inserts each mutation into its original program and returns the mutated method/file.
  • Updated to Python 3.10
  • Updated to more recent library versions
  • Added open_data.py to convert .pkl files to .csv for easier viewing
  • Updated the model to support equivalent mutant identification
  • Saved the trained model to allow for inference
  • Added test_from_trained.py to get predictions on additional data from the trained model
  • General refactoring of pipeline.py and train.py to improve code quality
  • Added an alternative approach to equivalent mutant identification: a feedforward neural network over Code2Vec code vectors (see model_c2v.py)
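
As an illustration of the .pkl to .csv conversion listed above, a minimal sketch could look like the following. It is not the contents of open_data.py: the function name pkl_to_csv, the file name programs.pkl, and the columns id, code, and label are assumptions made for the example, and it assumes the pickle holds a pandas DataFrame.

```python
import tempfile
from pathlib import Path

import pandas as pd


def pkl_to_csv(pkl_path, csv_path=None):
    """Convert a pickled DataFrame to CSV and return the CSV path.

    Hypothetical helper. Only unpickle files you trust: loading a pickle
    can execute arbitrary code.
    """
    pkl_path = Path(pkl_path)
    csv_path = Path(csv_path) if csv_path else pkl_path.with_suffix(".csv")
    frame = pd.read_pickle(pkl_path)
    if not isinstance(frame, pd.DataFrame):
        raise TypeError(f"expected a DataFrame, got {type(frame).__name__}")
    frame.to_csv(csv_path, index=False)
    return csv_path


# Round trip on a tiny made-up dataset (column names are assumptions).
frame = pd.DataFrame(
    {
        "id": [1, 2],
        "code": ["int f() { return 1; }", "int g() { return 2; }"],
        "label": [0, 1],
    }
)
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "programs.pkl"
    frame.to_pickle(source)
    target = pkl_to_csv(source)
    restored = pd.read_csv(target)

print(restored)
```

Columns that hold nested Python objects, such as parsed ASTs or index sequences, are written as their string representation, so the CSV is suited to inspection rather than to reloading the data for training.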

About

License: MIT License

Languages

C 77.1%, Java 19.2%, Python 3.7%