asgaardlab/21-markos-test_case_similarity_technique-code

nlp text-embedding text-similarity clustering software-testing game-testing machine-learning

Identifying Similar Test Cases That Are Specified in Natural Language

This repository contains the source code of our technique and related experiments to identify similar test cases written in natural language. The technique first clusters test steps which are semantically similar and then uses those clusters to identify similar test cases.

To cluster similar test steps, we performed several experiments with the following text embedding techniques, text similarity metrics, and clustering algorithms:

Text embedding techniques

Text similarity metrics

Word Mover’s Distance (WMD)
Cosine score
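
As a rough illustration of the cosine score, the sketch below compares test steps represented as simple word-count vectors. The bag-of-words representation, the helper names (`bag_of_words`, `cosine_score`), and the example steps are assumptions made for this illustration; the experiments in the notebooks use text embedding models rather than raw word counts.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Hypothetical stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_score(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty step has no direction to compare
    return dot / (norm_a * norm_b)


# Illustrative test steps (not taken from the repository's data).
step_a = bag_of_words("open the settings menu")
step_b = bag_of_words("open the options menu")
step_c = bag_of_words("defeat the first boss")

score_ab = cosine_score(step_a, step_b)  # three of four words shared
score_ac = cosine_score(step_a, step_c)  # only "the" shared
```

With real embeddings the same function applies unchanged, as long as each vector is passed as a mapping from dimension to value.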

Clustering algorithms

Hierarchical Agglomerative Clustering
K-means
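
To make the clustering stage more concrete, here is a minimal, dependency-free sketch of hierarchical agglomerative clustering with average linkage over cosine distances. It is not the code from the notebooks: the word-count vectors, the `threshold` stopping rule, and the function names are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity for sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - (dot / norm if norm else 0.0)


def agglomerative_clusters(vectors, threshold):
    """Merge the two closest clusters (average linkage) until the
    closest pair is farther apart than `threshold` (an assumed cutoff)."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Illustrative test steps (not taken from the repository's data).
steps = [
    "open the settings menu",
    "open the options menu",
    "defeat the first boss",
    "defeat the final boss",
]
vectors = [Counter(s.split()) for s in steps]
clusters = agglomerative_clusters(vectors, threshold=0.5)
```

Each returned cluster is a list of indices into `steps`. A distance threshold avoids fixing the number of clusters in advance, which is one common reason to prefer this approach over K-means when the number of distinct steps is unknown.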

To find similar test cases, we used the identified clusters of similar test steps to build and evaluate four different techniques.
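
One simple way to turn step clusters into a test case similarity score is to compare the sets of cluster IDs that two test cases touch. The sketch below uses Jaccard overlap for this. It is only an assumed illustration of the general idea and is not claimed to be one of the four techniques evaluated in the notebooks; the cluster assignments, test case names, and function names are hypothetical.

```python
def cluster_profile(test_case_steps, step_to_cluster):
    """Set of cluster IDs covered by the steps of one test case."""
    return {step_to_cluster[step] for step in test_case_steps}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical output of the step-clustering stage: step text -> cluster ID.
step_to_cluster = {
    "open the settings menu": 0,
    "open the options menu": 0,
    "defeat the first boss": 1,
    "defeat the final boss": 1,
    "close the game": 2,
}

# Hypothetical test cases, each a list of steps.
test_cases = {
    "TC1": ["open the settings menu", "defeat the first boss"],
    "TC2": ["open the options menu", "defeat the final boss"],
    "TC3": ["open the settings menu", "close the game"],
}

profiles = {
    name: cluster_profile(steps, step_to_cluster)
    for name, steps in test_cases.items()
}
sim_12 = jaccard(profiles["TC1"], profiles["TC2"])  # same clusters, different wording
sim_13 = jaccard(profiles["TC1"], profiles["TC3"])  # one shared cluster out of three
```

Because the comparison works on cluster IDs rather than raw text, TC1 and TC2 are treated as equivalent even though no step is worded identically.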

Structure of directories

The following directories contain the source code of all the approaches that were part of our experiments.

test-step-clustering: contains the notebooks with the source code for our test step clustering experiments.
test-case-similarity: contains the notebooks with the source code for our test case similarity experiments.
evaluations: contains the notebooks with the source code to evaluate all the approaches for test step clustering and techniques for test case similarity.

Dependencies

The following dependencies are required to run the notebooks on your local machine:

Python 3.7
NumPy 1.19: pip install numpy
Pandas 1.1.5: pip install pandas
matplotlib 3.0.3: pip install matplotlib
scikit-learn 0.21.1: pip install scikit-learn
Gensim 3.8.3: pip install gensim
NLTK 3.4.1: pip install nltk
Torch 1.7.1+cpu: pip install torch
Transformers 4.3.2: pip install transformers
Sentence Transformers 0.4.1: pip install sentence-transformers
TensorFlow 2.4.1: pip install tensorflow
TensorFlow Hub 0.11.0: pip install tensorflow-hub
