TimHanewich/OpenAI-GPT-Fine-Tuning

Training Data

I've prepared training data in the .jsonl format required by OpenAI and Azure OpenAI. You can download the files from the table below:

Description	Size	Download
Every SpongeBob SquarePants episode, 10,000 character prompt limit	448 MB	here
Same as above, with the bottom portion trimmed to limit the file size	283 MB	here
Same as above, with more of the bottom portion trimmed to limit the file size	145 MB	here
Same as above, with the bottom portion trimmed to limit the file to 15,000 lines	95 MB	here
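The smaller files keep only the top of the full file. A minimal sketch of that kind of trim in Python, assuming one training example per line. The function name `trim_jsonl` is hypothetical, and this is not necessarily how the files above were produced:

```python
from itertools import islice


def trim_jsonl(src_path: str, dst_path: str, max_lines: int) -> int:
    """Copy at most the first max_lines lines of src_path into dst_path.

    Returns the number of lines actually written.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_lines without reading the rest of the file
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```

For example, `trim_jsonl("full.jsonl", "trimmed.jsonl", 15_000)` would keep the first 15,000 examples (both file names are placeholders). Because the file is streamed line by line, the full 448 MB file never has to fit in memory.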

Gilligan's Island

Gilligan's Island scripts: http://www.gilligansisle.com/scripts.html

Needs work:

Structure for Text Messages Data Container

Folder structure:

- conversations
    - 9418473894.json
    - 9417773842.json

Each JSON file in the conversations folder contains one conversation as an array of messages, like this:

[
    {
        "speaker": 0,
        "body": "Hi, Tim, how are you?"
    },
    {
        "speaker": 1,
        "body": "I am good, thank you. How are you?"
    }
]

Speaker 0 is the person texting with Tim; speaker 1 is Tim.
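One way to turn these conversation files into fine-tuning examples is to pair each message from speaker 0 with the reply from speaker 1 that immediately follows it. The Python sketch below does that. The `prompt`/`completion` keys follow OpenAI's legacy prompt-completion fine-tuning format and are an assumption here, as are the function names; the conversion actually used in this repository may differ.

```python
import json
from pathlib import Path


def conversation_to_pairs(messages):
    """Pair each speaker-0 message with the speaker-1 reply right after it."""
    pairs = []
    for prev, curr in zip(messages, messages[1:]):
        if prev["speaker"] == 0 and curr["speaker"] == 1:
            # Assumed output shape: legacy prompt/completion format
            pairs.append({"prompt": prev["body"], "completion": curr["body"]})
    return pairs


def write_training_file(conversations_dir, out_path):
    """Write one JSON object per line for every pair in every conversation.

    Returns the number of examples written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(conversations_dir).glob("*.json")):
            messages = json.loads(path.read_text(encoding="utf-8"))
            for pair in conversation_to_pairs(messages):
                out.write(json.dumps(pair, ensure_ascii=False) + "\n")
                count += 1
    return count
```

This is a simplification: when the same speaker sends several messages in a row, only the last speaker-0 message before a reply and the first speaker-1 message of that reply are paired. Merging consecutive messages from one speaker before pairing would be a reasonable refinement.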

SpongeBob Scripts

Scripts can be found here.

About

An example of fine-tuning a GPT model on the Gilligan's Island scripts and personal text message logs.

Languages

C#: 100.0%