FlaPy

A Tool for Mining Flaky Tests at Scale
FlaPy is a small tool that allows software developers and researchers to identify flaky tests within a given set of projects by rerunning their test suites.

It is the result of research carried out at the Chair of Software Engineering II at the University of Passau, Germany.

Video tutorial

Conference presentation (ICSE'23 DEMO)

Using FlaPy

Installation

System requirements: docker (executable without root privileges)

Clone the repository to get the helper scripts:

git clone https://github.com/se2p/flapy

cd flapy/

FlaPy’s main entry point is the script flapy.sh, which offers two commands: run and parse. The FlaPy docker image will be pulled automatically on first usage.

Preparing the input-csv

Prepare a CSV file with the following columns (example: flapy_input_example.csv):

PROJECT_NAME,PROJECT_URL,PROJECT_HASH,PYPI_TAG,FUNCS_TO_TRACE,TESTS_TO_BE_RUN

Every line in the input file results in one execution of the container; we call this an iteration. Duplicate lines are allowed and let you analyze the same project multiple times. In fact, we actively use this to detect infrastructure flakiness, which might occur only between iterations, not within them.

  • PROJECT_NAME, PROJECT_URL, and PROJECT_HASH are used to uniquely identify a project when accumulating results across multiple iterations.
  • PROJECT_URL can also be a local directory, which is then copied into the container.
  • PYPI_TAG is used to install the project itself via pip before executing its test suite, in order to fetch its dependencies. If PYPI_TAG is empty, FlaPy falls back to searching for requirements in common files such as requirements.txt.
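For illustration, an input row might look like the following (hypothetical values; see flapy_input_example.csv in the repository for the real example). The last two columns may be left empty:

```csv
PROJECT_NAME,PROJECT_URL,PROJECT_HASH,PYPI_TAG,FUNCS_TO_TRACE,TESTS_TO_BE_RUN
example_project,https://github.com/example/example_project,1a2b3c4,0.1.0,,
```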

Run tests locally

Example (takes ~ 1h):

#              [OPTIONS...]                                 INPUT_CSV               NUM_RUNS
./flapy.sh run --out-dir example_results --plus-random-runs flapy_input_example.csv 5

Example (takes ~30s):

./flapy.sh run --out-dir example_results flapy_input_example_tiny.csv 1

Run tests on SLURM cluster

./flapy.sh run --out-dir example_results \
  --plus-random-runs \
  --run-on cluster --constraint CONSTRAINT \
  flapy_input_example.csv

where CONSTRAINT is forwarded to sbatch --constraint.

Analyze results

./flapy.sh parse ResultsDirCollection \
  --path example_results \
  get_tests_overview _df \
  to_csv --index=False example_results_to.csv

Note: the directory specified after --path needs to be accessible from the current working directory, since only the current working directory is mounted into the container that is started in the background.
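The resulting example_results_to.csv can be processed with any CSV tooling. Below is a minimal standard-library sketch that filters for flaky tests; the column names used here (Test_name, and the flakiness column Flaky_sameOrder_withinIteration mentioned later in this README) and the sample data are assumptions, and the real overview contains many more columns:

```python
import csv
import io

# Hypothetical excerpt of a FlaPy tests-overview CSV; column names
# and values are assumptions for illustration only.
sample = """\
Project_Name,Test_name,Flaky_sameOrder_withinIteration
example_project,test_network_remote_connection_failure,True
example_project,test_stable,False
"""

def flaky_tests(csv_text):
    """Return the names of tests flagged as flaky (same-order, within iteration)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Test_name"] for row in reader
            if row["Flaky_sameOrder_withinIteration"] == "True"]

print(flaky_tests(sample))
```

To analyze a real overview file, replace the in-memory sample with `open("example_results_to.csv")`.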

Tracing

FlaPy offers an option to trace the execution of a function, i.e., to log all function and method calls made in the course of its execution. The functions to be traced must be specified as a space-separated list in the fifth column of the input CSV, for example, test_flaky.py::test_network_remote_connection_failure test_flaky.py::test_concurrency in flapy_input_example_tiny_trace.csv.

Example (takes ~30s):

./flapy.sh run --out-dir example_results flapy_input_example_tiny_trace.csv

Within the resulting results.tar.xz archive, we can now find two extra files:

workdir/sameOrder/tmp/flapy_example_trace0test_flaky.py._('test_flaky.py', 'test_concurrency').txt
workdir/sameOrder/tmp/flapy_example_trace0test_flaky.py._('test_flaky.py', 'test_network_remote_connection_failure').txt

containing the traces:

--> ('test_flaky', '', 'test_network_remote_connection_failure')
----> ('requests.api', '', 'get')
------> ('requests.api', '', 'request')
--------> ('requests.sessions', 'Session', '__init__')
----------> ('requests.utils', '', 'default_headers')
------------> ('requests.utils', '', 'default_user_agent')
<------------ ('requests.utils', '', 'default_user_agent')
------------> ('requests.structures', 'CaseInsensitiveDict', '__init__')
...
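FlaPy ships its own tracer; conceptually, call traces like the one above can be produced with Python's sys.settrace hook. A minimal, self-contained sketch with simplified output formatting, not FlaPy's actual implementation:

```python
import sys

log = []              # collected trace lines
state = {"depth": 0}  # current call depth

def tracer(frame, event, arg):
    # sys.settrace invokes this for every Python function call; returning
    # the tracer as the local trace function also delivers 'return' events.
    if event == "call":
        state["depth"] += 1
        log.append("--" * state["depth"] + "> " + frame.f_code.co_name)
    elif event == "return":
        log.append("<" + "--" * state["depth"] + " " + frame.f_code.co_name)
        state["depth"] -= 1
    return tracer

def inner():
    return 42

def outer():
    return inner()

sys.settrace(tracer)  # start tracing all subsequent Python calls
outer()
sys.settrace(None)    # stop tracing
print("\n".join(log))
```

The indentation depth of the arrows mirrors the call nesting, just like in the FlaPy trace files shown above.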

SFFL

(Spectrum-based Flaky Fault Localization)

From our paper Debugging Flaky Tests using Spectrum-based Fault Localization

1. Run tests multiple times while collecting line coverage

Execute flapy.sh run with the core argument --collect-sqlite-coverage-database:

./flapy.sh run \
    --out-dir example_results_sffl \
    --core-args "--collect-sqlite-coverage-database" \
    flapy_input_example_sffl.csv 10

2. Perform fault localization

Execute flapy.sh parse to generate the CTA (coverage table accumulated)

(Note: this step only produces output if the test actually showed flaky behavior; if needed, rerun the previous step.)

./flapy.sh parse \
    ResultsDirCollection --path example_results_sffl \
    save_cta_tables \
        --cta_save_dir example_results_sffl_cta \
        --flaky_col "Flaky_sameOrder_withinIteration" \
        --method="accum"

Calculate Suspiciousness scores

./flapy.sh parse \
    CtaDir --path example_results_sffl_cta \
    calc_and_save_suspiciousness_tables \
        --save_dir example_results_sffl_cta_sus \
        --sfl_method sffl
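Spectrum-based fault localization scores each line by how strongly its coverage correlates with failures. The exact SFFL formula is defined in the paper; for intuition, here is the classic Ochiai formula from spectrum-based fault localization, as an illustration and not necessarily the formula FlaPy applies:

```python
from math import sqrt

def ochiai(ef, ep, total_failed):
    # ef: failing runs that covered the line; ep: passing runs that covered it.
    # Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))
    if ef == 0:
        return 0.0
    return ef / sqrt(total_failed * (ef + ep))

# Made-up coverage spectra for three lines over 10 runs (6 passed, 4 failed):
# line -> (covered-in-failing-runs, covered-in-passing-runs)
spectra = {"foo.py:3": (4, 0), "foo.py:7": (4, 6), "foo.py:9": (0, 6)}
scores = {line: ochiai(ef, ep, total_failed=4) for line, (ef, ep) in spectra.items()}

# The line covered only in failing runs is the most suspicious.
print(max(scores, key=scores.get))
```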

3. Evaluate results

Merge with locations (-> EXAM scores & ranks)

./flapy.sh parse \
    SuspiciousnessDir --path example_results_sffl_cta_sus \
    merge_location_info \
        minimal_sffl_example/locations.csv \
        minimal_sffl_example/loc.csv \
    to_csv --index=False | vd --filetype=csv

(assumes VisiData (vd) is installed)
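The EXAM score used above is the standard fault-localization metric: the fraction of lines a developer would inspect, in descending suspiciousness order, before reaching the faulty line. A small sketch of that metric; FlaPy's exact ranking and tie-breaking may differ, and the scores here are made up:

```python
def exam_score(scores, faulty_line):
    # Rank lines by descending suspiciousness; the EXAM score is the
    # 1-based rank of the faulty line divided by the total number of lines.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return (ranked.index(faulty_line) + 1) / len(ranked)

scores = {"foo.py:3": 1.0, "foo.py:7": 0.63, "foo.py:9": 0.0}
print(exam_score(scores, "foo.py:3"))  # the top-ranked line: 1 of 3 lines inspected
```

A lower EXAM score means less code has to be inspected before the fault is found.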

Contributing

Building FlaPy

Clone FlaPy:

git clone https://github.com/se2p/flapy
cd flapy

Building the container image:
We use containers to run the projects' test suites in an isolated environment.

docker build -t my_flapy -f Dockerfile .

This image can be used together with all existing scripts by changing the FLAPY_DOCKER_IMAGE variable in setup_docker_command.sh to localhost/my_flapy.

Building and running outside docker

Prerequisites

  • Python version 3.8 or later.
  • The latest version of poetry:
    • pip install poetry

Install FlaPy locally:

poetry install

Build FlaPy using the poetry tool:
This command builds two files in the dist folder: a .tar.gz source archive and a .whl Python wheel file.

poetry build

TODOs

  • Use ordered sets or lists in output csv files to always get the same (string-equivalent) output
    • Many columns in passed_failed.csv are sets and their ordering is different from run to run
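The instability comes from Python's unordered set iteration: serializing a set directly can yield string-unequal CSV cells across runs, while serializing a sorted list is deterministic. A minimal illustration of the proposed fix (the verdict values here are made up):

```python
# Set iteration order is not guaranteed across interpreter runs (string
# hashing is randomized), so str(verdicts) may differ run to run.
verdicts = {"Failed", "Passed", "Error"}

unstable = str(verdicts)        # order may vary between runs
stable = str(sorted(verdicts))  # always sorted, hence string-equivalent
print(stable)
```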

Contact

If you want to contact us, you can find our contact details on the page of the Chair of Software Engineering II at the University of Passau.

License

This project is licensed under the terms of the GNU Lesser General Public License v3.0.
