Awesome-LLM-SoftwareTesting

A collection of papers and resources about the utilization of large language models (LLMs) in software testing.

Software testing is a critical task for ensuring the quality and reliability of software products. As software systems grow increasingly complex, new and more effective testing techniques are needed. Recently, large language models (LLMs) have emerged as a breakthrough technology in natural language processing and artificial intelligence, capable of performing a variety of coding-related tasks such as code generation and code recommendation. LLMs are therefore a natural fit for software testing: on one hand, testing involves tasks such as unit test generation that require code understanding and generation; on the other hand, LLMs can generate diverse test inputs that help achieve comprehensive coverage of the software under test. In this repository, we present a comprehensive review of the utilization of LLMs in software testing. We have collected 102 relevant papers and analyzed them from both the software testing and the LLM perspective, as summarized in Figure 1.

Figure 1. Structure of the contents in this paper

We hope this repository helps researchers and practitioners gain a better understanding of this emerging field. If you find it helpful, please cite our paper:

@article{Wang2023SoftwareTW,
  title={Software Testing with Large Language Model: Survey, Landscape, and Vision},
  author={Junjie Wang and Yuchao Huang and Chunyang Chen and Zhe Liu and Song Wang and Qing Wang},
  journal={ArXiv},
  year={2023},
  volume={abs/2307.07221},
  url={https://api.semanticscholar.org/CorpusID:259924919}
}

Table of Contents📇

News🎉

This project is under active development. Star and watch the repository to follow updates.

Overview🔭

From the software testing perspective

Figure 2. Distribution of testing tasks with LLMs

We find that LLMs have proven effective in the mid and late stages of the software testing lifecycle. In the mid stages, LLMs have been successfully applied to test-case preparation tasks, including unit test case generation, test oracle generation, and system test input generation; a small sketch of the unit-test case follows below. In the later stages, such as bug fixing and the preparation of test reports and bug reports, LLMs have been used for tasks like bug analysis, debugging, and program repair.
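
To make the test-generation tasks concrete, the sketch below assembles a prompt asking a model to write a unit test for a single focal method. `query_llm` and `build_unit_test_prompt` are hypothetical placeholders, not APIs from any surveyed tool; swap in the LLM client of your choice:

```python
# Minimal sketch of LLM-based unit test generation (names hypothetical).

def query_llm(prompt: str) -> str:
    """Placeholder: call your LLM provider here and return its text output."""
    raise NotImplementedError("wire up an LLM client of your choice")

def build_unit_test_prompt(focal_method: str) -> str:
    """Assemble a prompt asking the model to test one focal method."""
    return (
        "Write a Python unittest test case for the following function.\n"
        "Cover normal inputs and at least one edge case.\n\n"
        f"```python\n{focal_method}\n```"
    )

focal = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))"
print(build_unit_test_prompt(focal))  # this string is what query_llm would receive
```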

From the LLM perspective

In our collected studies, the most frequently used LLM is ChatGPT, widely adopted for its strong performance across a broad range of tasks. The second most common is Codex, trained on an extensive code corpus and well suited to coding-related tasks. Third is CodeT5, an open-source LLM that can be pre-trained and fine-tuned on domain-specific data for better performance.
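
To illustrate the fine-tuning route that an open model like CodeT5 enables, here is a minimal sketch of one fine-tuning step on a single (focal method, reference test) pair. It assumes PyTorch and the Hugging Face transformers library with the Salesforce/codet5-base checkpoint; a real setup would iterate over a batched dataset for many steps, and the pair shown is purely illustrative:

```python
import torch
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 ships with a RoBERTa-style tokenizer and a T5 seq2seq architecture.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

source = "def add(a, b):\n    return a + b"            # focal method (input)
target = "def test_add():\n    assert add(1, 2) == 3"  # reference test (label)

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```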

Figure 4. Distribution of how LLMs are used (prompt engineering)

In our collected studies, 38 studies use LLMs through a pre-training or fine-tuning scheme, while 64 studies employ prompt engineering, steering the model's behavior toward desired outcomes without updating its weights. Among these, 51 studies involve zero-shot learning and 25 involve few-shot learning; chain-of-thought prompting (7 studies), self-consistency (1 study), and automatic prompting (1 study) also appear.
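
The prompting schemes counted above differ mainly in how the prompt is assembled, as the sketch below illustrates for a test-input generation task (the task wording and helper names are hypothetical, for illustration only):

```python
# Sketch of the prompting schemes counted above; task text is hypothetical.
TASK = "Generate a test input that exercises the code below.\n\n{code}"

def zero_shot(code: str) -> str:
    # Zero-shot: the task description alone, no demonstrations.
    return TASK.format(code=code)

def few_shot(code: str, demos: list) -> str:
    # Few-shot: prepend (code, test input) demonstrations before the task.
    shots = "\n\n".join(f"Code:\n{c}\nTest input:\n{t}" for c, t in demos)
    return shots + "\n\n" + TASK.format(code=code)

def chain_of_thought(code: str) -> str:
    # Chain-of-thought: ask the model to reason step by step before answering.
    return TASK.format(code=code) + "\n\nLet's think step by step."
```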

Figure 5. Distribution about other techniques incorporated with LLMs

In our collected studies, 67 use LLMs to address the entire testing task, while 35 combine LLMs with additional techniques such as mutation testing, differential testing, syntactic checking, program analysis, and statistical analysis.
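
As one example of such a combination, differential testing feeds the same (often LLM-generated) inputs to two implementations of one specification and flags disagreements. A minimal sketch, with hypothetical names and a toy input list standing in for LLM-generated inputs:

```python
# Differential testing over generated inputs (sketch; names hypothetical).
def differential_test(generated_inputs, impl_a, impl_b):
    """Return inputs on which two implementations of the same spec disagree."""
    findings = []
    for x in generated_inputs:
        try:
            a, b = impl_a(x), impl_b(x)
        except Exception as exc:
            findings.append((x, f"crash: {exc!r}"))  # a crash is also a finding
            continue
        if a != b:
            findings.append((x, (a, b)))             # behavioral divergence
    return findings

# Toy example: check a hand-rolled sort against Python's built-in sorted().
def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(differential_test([[3, 1, 2], [], [5, 5, 1]], bubble_sort, sorted))
```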

Related Surveys🗎

Unit test case generation

Test oracle generation

System test input generation

Bug analysis

Debugging

Program repair
