TimHanewich / OpenAI-GPT-Fine-Tuning

An example of fine-tuning a GPT model on the Gilligan's Island scripts and personal text message logs

Training Data

I've prepared training data in the .jsonl format required by OpenAI and Azure OpenAI. You can download the files from the table below:

| Description | Size | Download |
|---|---|---|
| Every SpongeBob SquarePants episode, 10,000-character prompt limit | 448 MB | here |
| Same as above, with the bottom portion trimmed to limit file size | 283 MB | here |
| Same as above, with the bottom portion trimmed further to limit file size | 145 MB | here |
| Same as above, with the bottom portion trimmed to limit the file to 15,000 lines | 95 MB | here |
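The smaller files are described as the full file with its bottom portion cut off. Because a .jsonl file holds one JSON record per line, that kind of trim can be done by copying only the first N lines. The following is a minimal sketch of that idea, not the script that produced these files; the function name `trim_jsonl` and the file paths in the usage note are hypothetical.

```python
from itertools import islice


def trim_jsonl(src_path, dst_path, max_lines):
    """Copy at most `max_lines` lines from src_path to dst_path.

    Each line of a .jsonl file is a complete JSON record, so cutting on
    line boundaries never leaves a partial record behind.
    Returns the number of lines actually written.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after max_lines without reading the rest of the file
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written
```

For example, `trim_jsonl("full.jsonl", "trimmed.jsonl", 15000)` would keep the first 15,000 records. Trimming by line count is simpler than trimming by byte size, since a byte cutoff can land in the middle of a record.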

Gilligan's Island

Gilligan's Island scripts: http://www.gilligansisle.com/scripts.html

Needs work:

Structure for Text Messages Data Container

Folder structure:

- conversations
    - 9418473894.json
    - 9417773842.json

Each JSON file in the conversations folder holds one conversation as an array of messages, like this:

[
    {
        "speaker": 0,
        "body": "Hi, Tim, how are you?"
    },
    {
        "speaker": 1,
        "body": "I am good, thank you. How are you?"
    }
]
  • speaker 0 is the person speaking with Tim; speaker 1 is Tim.
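To show how this structure could feed a fine-tuning file, here is a rough sketch that reads every JSON file in a conversations folder and emits one JSONL line per exchange. It assumes the prompt/completion record shape (`{"prompt": ..., "completion": ...}`), which is one of the formats OpenAI has accepted for fine-tuning; the repository's actual conversion may differ. The function names and the rule for grouping consecutive messages are illustrative assumptions.

```python
import json
from pathlib import Path


def _make_pair(prompt_parts, reply_parts):
    # Consecutive messages from the same speaker are joined with newlines.
    return {"prompt": "\n".join(prompt_parts),
            "completion": "\n".join(reply_parts)}


def conversation_to_pairs(messages):
    """Pair each run of speaker-0 messages with the speaker-1 reply after it.

    Assumption: speaker 0 supplies the prompt and speaker 1 (Tim) supplies
    the completion. Speaker-1 messages with no preceding prompt are skipped.
    """
    pairs = []
    prompt_parts, reply_parts = [], []
    for message in messages:
        if message["speaker"] == 0:
            if reply_parts:
                # A new speaker-0 message closes the previous exchange.
                pairs.append(_make_pair(prompt_parts, reply_parts))
                prompt_parts, reply_parts = [], []
            prompt_parts.append(message["body"])
        elif prompt_parts:
            reply_parts.append(message["body"])
    if prompt_parts and reply_parts:
        pairs.append(_make_pair(prompt_parts, reply_parts))
    return pairs


def folder_to_jsonl_lines(folder):
    """Yield one JSONL line per exchange for every *.json file in `folder`."""
    for path in sorted(Path(folder).glob("*.json")):
        messages = json.loads(path.read_text(encoding="utf-8"))
        for pair in conversation_to_pairs(messages):
            yield json.dumps(pair, ensure_ascii=False)
```

Writing the result is then a matter of joining the yielded lines with newlines into a single .jsonl file. Grouping consecutive messages keeps multi-message texts together, which matters for chat logs where one person often sends several short messages in a row.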

SpongeBob Scripts

  • Scripts can be found here.

Languages

C# 100.0%