johnmelwin / ResearchProject1

Enhancing Traceability in Software Engineering: A Fine-Tuned Language Model Approach 🚀

(See the individual folders for their specific README files.)

Authors 📝

  • John Melwin Richard
    • 🎓 Master of Science, Data Science
    • 🏫 Rochester Institute of Technology
    • 📧 Email: jj5603@rit.edu
  • Zhe Yu
    • 🎓 Assistant Professor of Software Engineering
    • 🏫 Rochester Institute of Technology
    • 📧 Email: zxyvse@rit.edu

Abstract 📝

This research introduces an approach to improving traceability across the software development lifecycle using NLP techniques, specifically fine-tuned large language models (LLMs) 🤖. The study assesses how effective this method is compared to existing techniques and explores the transition from traditional to transformer-based traceability.

Introduction 📘

Software traceability involves establishing and managing links between software artifacts, such as requirements and code entities, throughout the development lifecycle. This research uses large language models (LLMs) to automate the identification of trace links, aiming to improve the efficiency and accuracy of the software development process.

Key Questions 🤔

  1. Can we enhance traceability using LLMs?
  2. How does this approach compare with existing techniques?
  3. How can developers transition to LLM-based traceability effectively?

Dataset 📊

We used a traceability dataset for the Gantt system, covering the methods, variables, interfaces, and classes linked to software requirements.
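To make the shape of such data concrete, here is a minimal sketch of how one requirement-to-code pair might be represented and rendered as model input. The `TracePair` schema, the `to_prompt` format, and the example requirement and identifier are all hypothetical illustrations; they are not taken from the dataset or from the notebooks in this repository.

```python
from dataclasses import dataclass
from enum import Enum


class ArtifactKind(Enum):
    # The four code entity types named in the dataset description.
    METHOD = "method"
    VARIABLE = "variable"
    INTERFACE = "interface"
    CLASS = "class"


@dataclass(frozen=True)
class TracePair:
    """One candidate link between a requirement and a code entity (hypothetical schema)."""
    requirement: str    # natural-language requirement text
    artifact: str       # identifier or source snippet of the code entity
    kind: ArtifactKind  # which of the four entity types the artifact is
    linked: bool        # True if the pair is labeled as a real trace link


def to_prompt(pair: TracePair) -> str:
    """Render a pair as one text input for a classifier (the format is an assumption)."""
    return (
        f"Requirement: {pair.requirement}\n"
        f"{pair.kind.value.capitalize()}: {pair.artifact}\n"
        "Linked:"
    )


# Made-up example values, for illustration only.
example = TracePair(
    requirement="The user can add a new task to the chart.",
    artifact="TaskManager.createTask",
    kind=ArtifactKind.METHOD,
    linked=True,
)
```

Framing each pair as a single labeled text record keeps the task a plain binary classification problem, which suits both encoder models and prompt-based fine-tuning.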

Approach 🔍

Our methodology includes:

  • Data Preprocessing: Cleaning, partitioning, and sampling the dataset.
  • Training: Utilizing models like CodeBERT and GPT-3.5, trained on NVIDIA A100 GPUs.
  • Model Monitoring and Adjustment: Real-time adjustments based on performance metrics.
  • Testing and Evaluation: Evaluating models on data kept separate from the training data to avoid biased results.
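The partitioning step above can be sketched as a stratified split, which keeps the ratio of linked to unlinked pairs the same in the training and test partitions. This is a minimal standard-library illustration; the function name `stratified_split`, the 20% test fraction, and the fixed seed are assumptions, and the project's notebooks may partition the data differently.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving each label's share in both parts."""
    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Calling `stratified_split(labels)` on a list of 0/1 link labels returns two disjoint index lists; no index appears in both, so test examples never leak into training.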

Results and Discussion 📈

The evaluation shows that GPT-3.5 models, especially when fine-tuned, outperform traditional models such as CodeBERT on all reported metrics, demonstrating the potential of LLMs for enhancing traceability.
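The specific metrics are not listed here. As an assumption, the sketch below computes precision, recall, and F1, which are common choices for binary trace-link prediction; the function name is hypothetical and not part of the repository's code.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = linked, 0 = not linked)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted links
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted links that do not exist
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # real links that were missed

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Comparing two models then amounts to calling this function on each model's predictions over the same held-out pairs.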

[Figure: ROC curve]
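An ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under it (AUC) summarizes ranking quality in one number. The following is a minimal standard-library sketch of that computation; the function names are hypothetical, and the scores in the usage note are made up rather than results from this study.

```python
def roc_points(y_true, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")

    ranked = sorted(zip(scores, y_true), key=lambda item: -item[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

For instance, `auc(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))` evaluates to 0.75, because three of the four positive-negative pairs are ranked in the correct order.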

Conclusion 🎯

LLMs significantly improve the process of establishing trace links between software documentation and code, surpassing traditional methods. Future research should explore broader applications of LLMs in software engineering while addressing ethical and privacy considerations.

Feel free to star ⭐ and fork 🍴 this repository if you find it useful in your research or software development projects!
