This repository contains scripts and instructions for fine-tuning the Llama-2-7b model on the Kaludi Customer-Support-Responses dataset. The goal is to create an automated customer support agent capable of generating relevant and coherent responses to customer queries.
For the easiest and quickest way to test the fine-tuned model, use the provided Google Drive link, which contains all the necessary files and Colab notebooks. Simply run the run_agent.ipynb notebook in Google Colab; it handles all the required installations and configuration for you.
By using the Colab notebook, you can avoid manual installation of dependencies and start testing the fine-tuned model immediately.
- Introduction
- Dataset
- Model
- Training
- Evaluation
- Usage
- Requirements
- Results
- Contributing
- License
- Additional Resources
This project fine-tunes the Llama-2-7b model using the Kaludi Customer-Support-Responses dataset. The objective is to train a model that can generate appropriate customer support responses.
The dataset used for training consists of customer queries and corresponding support responses. It can be loaded from Hugging Face:
from datasets import load_dataset
dataset = load_dataset("Kaludi/Customer-Support-Responses")
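Each record pairs a customer query with a support response. To fine-tune a Llama-2 chat model, examples are typically wrapped in its [INST] prompt template before training. A minimal sketch, assuming the dataset exposes query and response columns (check the actual column names after loading):

```python
def format_example(query: str, response: str) -> str:
    # Wrap a query/response pair in the Llama-2 chat prompt template:
    # the instruction goes inside [INST] ... [/INST], the target answer follows.
    return f"<s>[INST] {query} [/INST] {response} </s>"

# Typical use: add a "text" column holding the formatted prompt, e.g.
# dataset = dataset.map(lambda ex: {"text": format_example(ex["query"], ex["response"])})
```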
We use the NousResearch/Llama-2-7b-chat-hf as the base model and fine-tune it using LoRA (Low-Rank Adaptation) techniques to efficiently adjust the model weights.
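Since bitsandbytes appears in the requirements, the base model is most likely loaded in 4-bit precision before the LoRA adapters are attached. A hedged sketch of that loading step (the exact quantization settings in train.py may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed 4-bit quantization settings; train.py may use different values.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 defines no pad token by default
```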
The training script train.py performs the following steps:
- Load the dataset.
- Format the data for Llama-2.
- Set up the model and tokenizer.
- Configure training parameters.
- Fine-tune the model using the SFTTrainer from the trl library.
To train the model, run:
python train.py
- LoRA parameters:
  - lora_r = 64: The rank of the LoRA update matrices. Higher values may capture more information but require more computation.
  - lora_alpha = 16: Scaling factor for the LoRA layers; adjusts the importance of the LoRA modifications.
  - lora_dropout = 0.1: Dropout rate for the LoRA layers; helps prevent overfitting by randomly dropping units during training.
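With peft, the three parameters above map directly onto a LoraConfig. A sketch, assuming a standard causal-LM setup (bias handling and target modules are defaults/assumptions; adjust to what train.py actually uses):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,              # rank of the LoRA update matrices
    lora_alpha=16,     # scaling factor for the LoRA layers
    lora_dropout=0.1,  # dropout applied inside the LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
)
```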
- TrainingArguments parameters:
  - num_train_epochs = 5: Number of passes the model makes over the entire training dataset.
  - per_device_train_batch_size = 4: Number of samples processed per device before the model's parameters are updated.
  - gradient_accumulation_steps = 1: Number of steps over which gradients are accumulated before an optimizer update.
  - learning_rate = 2e-4: Step size at each iteration while moving toward a minimum of the loss function.
  - weight_decay = 0.001: Regularization that reduces overfitting by penalizing large weights.
  - lr_scheduler_type = "cosine": Learning rate schedule; cosine decay can help the model converge more smoothly.
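These settings correspond to a transformers TrainingArguments object passed to trl's SFTTrainer. A sketch of how the pieces fit together, assuming a standard SFT setup (the output directory and the "text" dataset column are placeholders, not confirmed names from train.py):

```python
from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="./results",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    weight_decay=0.001,
    lr_scheduler_type="cosine",
)

trainer = SFTTrainer(
    model=model,                # base model (with quantization, if used)
    args=training_args,
    train_dataset=dataset["train"],
    peft_config=peft_config,    # LoRA settings from above
    dataset_text_field="text",  # assumed column holding the formatted prompts
    tokenizer=tokenizer,
)
trainer.train()
```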
The model was trained on Google Colab using a T4 GPU. For better performance, more powerful GPUs such as V100 or A100 could be used, and the batch size could be increased. Gradient checkpointing can be disabled on machines with more memory to speed up training.
Evaluation is performed using evaluate_LLM.ipynb, which calculates the perplexity score of the fine-tuned model:
perplexity = calculate_perplexity(model, tokenizer, validation_texts)
print(f'Perplexity: {perplexity}')
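calculate_perplexity is defined in the notebook; conceptually, perplexity is the exponential of the mean per-token negative log-likelihood over the validation texts. A minimal sketch of that final step (a hypothetical helper, not the notebook's exact code):

```python
import math

def perplexity_from_losses(token_nlls):
    # Perplexity = exp(mean negative log-likelihood per token).
    # token_nlls: per-token cross-entropy losses collected over the validation set.
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)

# A model that assigns every token probability 1/4 has perplexity ≈ 4:
# perplexity_from_losses([math.log(4)] * 10)
```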
- Perplexity: Measures how well the model predicts the next token in a sequence. Lower perplexity indicates better performance.
You can download the fine-tuned model directly from the Colab link provided in the Google Drive. This ensures that you have all the necessary files to run the model locally if preferred.
You can interact with the fine-tuned model using run_agent.py. This script allows you to input queries and get responses from the model in a loop:
python run_agent.py
Alternatively, you can run the run_agent.ipynb notebook in a Colab or Jupyter environment (tested).
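The query/response loop in run_agent.py can be sketched as below. Here generate_fn stands in for whatever generation call the script actually uses (e.g. a transformers pipeline) and is an assumption, not the script's real API:

```python
def chat_loop(generate_fn, input_fn=input, output_fn=print):
    # Repeatedly read a customer query, stop on "quit",
    # and emit the model's generated response.
    while True:
        query = input_fn("Customer query (or 'quit' to exit): ")
        if query.strip().lower() == "quit":
            break
        output_fn(generate_fn(query))

# Example wiring with a transformers text-generation pipeline (assumed setup):
# from transformers import pipeline
# generate = pipeline("text-generation", model=model, tokenizer=tokenizer)
# chat_loop(lambda q: generate(f"<s>[INST] {q} [/INST]")[0]["generated_text"])
```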
Install the required packages using requirements.txt:
pip install -r requirements.txt
accelerate==0.21.0
peft==0.4.0
bitsandbytes==0.40.2
transformers==4.31.0
trl==0.4.7
torch==2.3.0+cu121
After fine-tuning, the model achieves a perplexity score of less than 100 on the sample validation set, indicating good performance in generating coherent and relevant responses.
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For more details and to access the trained model and related files, please visit the Google Drive link.