Fine tune OpenAI model using custome dataset

Overview

This project allows you to fine-tune a chat system using a custom dataset. Follow the instructions below to prepare your dataset, configure your fine-tuning specifications, install necessary dependencies, and initiate the training process.

Preparing Your Dataset

Dataset Format: Your dataset should be in a .csv format with two columns: human and assistant.

The human column should contain questions or phrases from the human user.
The assistant column should contain responses from the chatbot.

Example Dataset:

human	assistant
How's the weather?	It's sunny and warm.

Configuration

Define your fine-tuning specifications in the fine_tune_specification.toml file. Please be aware that using a large batch size may result in significant delays in the execution queue. This includes:

data_path: Path to your dataset file.
system_message: A system message in the tone you desire for your chat system.
training_parameters: Any specific training parameters you wish to adjust.
model_specification: Specify the OpenAI model you plan to use for fine-tuning.

Installation

Install the necessary dependencies with the following command:

pip install -r requirements.txt

Running the Training Process

Start the training process by executing:

python run.py --config_file_path ${config_file_path}

After initiating the training process, the response ID will be added to your original config file under the name "response id". This ID is crucial for monitoring the training and validation loss as your model fine-tunes. Please note that each time you run the script, a training job will be submitted to OpenAI, unless a response ID already exists in your config file, in this case, with command:

python run.py --config_file_path ${config_file_path}

you can monitor the training and validation loss directly in the terminal.

Troubleshooting

Problem

When attempting to run the application, you might encounter the following error:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'invalid n_epochs: 60', 'type': 'invalid_request_error', 'param': 'n_epochs', 'code': None}}

Solution

To resolve this error, ensure that the n_epochs parameter is within the valid range supported by the API or model. E.g, reduce the number of epochs.

Problem

Jobs submitted remain in the queue for an extended period without being executed.

Solution

Consider reducing the batch size.

lalashiwoya / OpenAI_Fintune_Free_Text_Supervised

Fine tune OpenAI model using custome dataset

Overview

Preparing Your Dataset

Configuration

Installation

Running the Training Process

Troubleshooting

Problem

Solution

Problem

Solution

About

Languages