DrishtiShrrrma / llama-2-7b-chat-hf-dolly-15k-w-bigbench-hard-evaluation

Fine-Tuning Llama-2-7b on Databricks-Dolly-15k Dataset and Evaluating with BigBench-Hard

In this project, we fine-tune the Llama-2-7b-chat-hf model on the databricks-dolly-15k dataset using Google Colab and then evaluate the result on BigBench-Hard.

Model and Dataset Configuration

  • Model: NousResearch/llama-2-7b-chat-hf
  • Dataset: databricks/databricks-dolly-15k
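
As a minimal sketch, the tokenizer and dataset can be pulled from the Hugging Face Hub as shown below (the 4-bit model load itself is shown under Training Parameters). The pad-token handling is a common Llama 2 workaround and an assumption, not something stated in this repo.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

base_model = "NousResearch/llama-2-7b-chat-hf"

# Llama 2 ships without a pad token; reusing EOS is an assumption that makes batching work.
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# The instruction-tuning data lives on the Hugging Face Hub.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
print(dataset)
```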

Dataset Overview

The databricks-dolly-15k dataset, hosted on Hugging Face, consists of over 15,000 records generated by Databricks employees across various behavioral categories. The dataset, available under the Creative Commons Attribution-ShareAlike 3.0 Unported License, is primarily intended for training large language models and synthetic data generation.
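
Each record carries instruction, context, response, and category fields. The helper below sketches one way to flatten a record into a single training prompt; the template is an illustrative assumption, not necessarily the exact one used in the notebook.

```python
# Illustrative prompt template (an assumption, not the notebook's exact format).
def format_record(example):
    context = f"\n\n### Context:\n{example['context']}" if example["context"] else ""
    return {
        "text": f"### Instruction:\n{example['instruction']}{context}"
                f"\n\n### Response:\n{example['response']}"
    }

formatted = dataset.map(format_record)   # `dataset` from the loading sketch above
print(formatted[0]["text"])
```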

Training Parameters

  • LoRA attention dimension: 64
  • Alpha parameter for LoRA scaling: 16
  • Dropout probability for LoRA layers: 0.1
  • 4-bit precision base model loading: True
  • Quantization type: nf4
  • Nested quantization: False
  • Training epochs: 1
  • Training batch size per GPU: 16
  • Evaluation batch size per GPU: 16
  • Optimizer: paged_adamw_32bit
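
The parameters above map onto a QLoRA training setup roughly as sketched below. Anything not listed above (output directory, max_seq_length, and so on) is an assumption, and the SFTTrainer call follows the trl releases from mid-2023.

```python
# Sketch of the QLoRA setup implied by the hyperparameters above; unlisted
# arguments are assumptions, not values taken from the notebook.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # 4-bit precision base model loading
    bnb_4bit_quant_type="nf4",            # quantization type
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,      # nested quantization: False
)

model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

peft_config = LoraConfig(
    r=64,               # LoRA attention dimension
    lora_alpha=16,      # alpha parameter for LoRA scaling
    lora_dropout=0.1,   # dropout probability for LoRA layers
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="./results",               # assumption
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    optim="paged_adamw_32bit",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=formatted,              # formatted dataset from the sketch above
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,                   # assumption
    tokenizer=tokenizer,                  # tokenizer from the loading sketch above
    args=training_args,
)
trainer.train()
```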

Training Results

(Training results screenshot from the notebook.)

Post-Training Analysis

The fine-tuned model was queried with general questions as well as questions from the BigBench-Hard dataset. It handled the general questions well, but its answers to BigBench-Hard questions sometimes diverged from the expected responses.
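
A quick way to query the fine-tuned model is sketched below, reusing the tokenizer and trainer from the sketches above; the prompt is an illustrative stand-in, not one of the actual evaluation questions.

```python
# Query the LoRA-adapted model produced by the trainer above; the prompt is an
# illustrative example, not an actual BigBench-Hard question.
prompt = "### Instruction:\nSort these words alphabetically: pear, apple, mango\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(trainer.model.device)
output_ids = trainer.model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```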

Conclusion

The Llama-2-7b-chat model was successfully fine-tuned on the databricks-dolly-15k dataset. Although its responses still show some limitations, particularly on BigBench-Hard-style reasoning questions, the fine-tuned model is a promising candidate for integration into frameworks such as LangChain as an alternative to the OpenAI API. Further work is needed to improve its accuracy.
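
As a hedged illustration of the LangChain integration mentioned above, the fine-tuned weights could be wrapped in LangChain's HuggingFacePipeline LLM. The local model path is hypothetical, and the import path follows the classic langchain 0.0.x package (newer releases moved it into langchain_community).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

model_path = "./llama-2-7b-dolly-merged"   # hypothetical path to the merged fine-tuned weights

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

hf_pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=hf_pipe)

print(llm("### Instruction:\nSummarize what LoRA fine-tuning does.\n\n### Response:\n"))
```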

Languages

Language: Jupyter Notebook 100.0%