Enhancing Traceability in Software Engineering: A Fine-Tuned Language Model Approach 🚀

(See the README file in each folder for folder-specific details.)

Authors 📝

John Melwin Richard
- 🎓 Master of Science, Data Science
- 🏫 Rochester Institute of Technology
- 📧 Email: jj5603@rit.edu

Zhe Yu
- 🎓 Assistant Professor of Software Engineering
- 🏫 Rochester Institute of Technology
- 📧 Email: zxyvse@rit.edu

Abstract 📝

This research introduces an approach to enhancing traceability across the software development lifecycle with natural language processing (NLP) techniques, specifically fine-tuned large language models (LLMs) 🤖. The study assesses how well this method performs against existing techniques and explores the transition from traditional to transformer-based traceability.

Introduction 📘

Software traceability involves establishing and managing links between software artifacts, such as requirements and the code entities that implement them, throughout the development lifecycle. This research uses large language models (LLMs) to automate the identification of trace links, with the aim of improving the efficiency and accuracy of the software development process.

Key Questions 🤔

Can we enhance traceability using LLMs?
How does this approach compare with existing techniques?
How can developers transition to LLM-based traceability effectively?

Dataset 📊

We used a traceability dataset for the Gantt system that links software requirements to methods, variables, interfaces, and classes.
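As a rough illustration of what one record in such a dataset could look like, the sketch below pairs a requirement with a code entity and a link label. The class name `TraceExample`, its field names, and the `[SEP]` separator are assumptions made for this example and are not taken from the repository's actual data format.

```python
from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    """Kinds of code entities the dataset links to requirements."""
    METHOD = "method"
    VARIABLE = "variable"
    INTERFACE = "interface"
    CLASS = "class"


@dataclass(frozen=True)
class TraceExample:
    """One candidate trace link (hypothetical schema)."""
    requirement_text: str  # natural-language requirement
    code_text: str         # source text or signature of the code entity
    kind: ArtifactKind     # which kind of code entity this is
    linked: bool           # True if the requirement traces to the entity


def to_model_input(example: TraceExample, sep: str = " [SEP] ") -> str:
    """Join requirement and code into one string for a text classifier."""
    return f"{example.requirement_text}{sep}{example.code_text}"


example = TraceExample(
    requirement_text="Export chart to PDF",
    code_text="void exportPdf()",
    kind=ArtifactKind.METHOD,
    linked=True,
)
print(to_model_input(example))
```

Framing each requirement and code pair as a labeled example turns trace link identification into binary classification, which is one common way to set up the task for models such as CodeBERT.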

Approach 🔍

Our methodology includes:

Data Preprocessing: Cleaning, partitioning, and sampling the dataset.
Training: Using models such as CodeBERT and GPT-3.5, trained on NVIDIA A100 GPUs.
Model Monitoring and Adjustment: Adjusting training in real time based on performance metrics.
Testing and Evaluation: Testing models on data kept separate from the training data to ensure unbiased evaluations.
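The preprocessing step above (cleaning, partitioning, and sampling) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual pipeline: the `(requirement, code, label)` tuple layout, the 80/10/10 split, and the choice to downsample negative pairs are all assumptions.

```python
import random


def clean(pairs):
    """Normalize whitespace, then drop empty and duplicate pairs.

    When a pair appears more than once, the first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for requirement, code, label in pairs:
        requirement = " ".join(requirement.split())
        code = " ".join(code.split())
        if not requirement or not code:
            continue
        key = (requirement, code)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((requirement, code, label))
    return cleaned


def partition(pairs, train=0.8, val=0.1, seed=0):
    """Shuffle with a fixed seed and split into train, validation, test."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def downsample_negatives(pairs, ratio=1.0, seed=0):
    """Keep every positive pair and at most ratio * positives negatives."""
    rng = random.Random(seed)
    positives = [p for p in pairs if p[2] == 1]
    negatives = [p for p in pairs if p[2] == 0]
    keep = min(len(negatives), int(len(positives) * ratio))
    sampled = positives + rng.sample(negatives, keep)
    rng.shuffle(sampled)
    return sampled
```

In this sketch the sampling would be applied only to the training split, so that the validation and test splits keep the original class balance.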

Results and Discussion 📈

The evaluation shows that GPT-3.5 models, especially when fine-tuned, outperform traditional models like CodeBERT on all evaluated metrics, demonstrating the potential of LLMs for enhancing traceability.
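This README does not name the metrics used. As an assumption, the sketch below computes precision, recall, and F1, which are commonly reported for trace link prediction when it is framed as binary classification. The function name and the 0/1 label encoding are illustrative only.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for illustration only; these are not results from the study.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Computing the same function on the predictions of each model over the same test split would give a like-for-like comparison.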

Conclusion 🎯

In our evaluation, LLMs improved the process of establishing trace links between software documentation and code and surpassed traditional methods. Future research should explore broader applications of LLMs in software engineering while addressing ethical and privacy considerations.

Feel free to star ⭐ and fork 🍴 this repository if you find it useful in your research or software development projects!
