Lightning-AI / litgpt

I don't want to use the dataset styles listed in prompt styles: Dict. I want to use my own dataset style. How can I build my own dataset style for use with finetune/lora? My dataset style is:
{
  "conversation": [
    {
      "system": "This is like an instruction",
      "input": "",
      "output": ""
    }
  ]
}

because I want to use multi-round conversation data.

I think the easiest way here would be to use one of the existing datasets as a template. I remember that Deita had multi-turn questions in the dataset, so I added this as an option. Maybe this is helpful as a template for building your own dataset:

litgpt/litgpt/data/deita.py

Line 29 in cbbe9cd

include_multiturn_conversations: bool = False

But note that LitGPT otherwise doesn't do anything special for multi-turn data. It basically treats each multi-turn example as another regular input example during training.
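To make the "regular input example" idea concrete for the format in the question, here is a minimal sketch in plain Python. It is not LitGPT code: `flatten_conversation` is a hypothetical helper, the "User:"/"Assistant:" labels are made up for illustration, and carrying earlier turns into later prompts is my own assumption about what you might want. The `include_multiturn` flag only mirrors the idea of the Deita option above.

```python
from typing import Dict, List


def flatten_conversation(record: Dict, include_multiturn: bool = False) -> List[Dict[str, str]]:
    """Turn one {"conversation": [...]} record into instruction/output samples.

    Hypothetical helper for illustration; not part of LitGPT.
    """
    turns = record.get("conversation", [])
    if not include_multiturn:
        turns = turns[:1]  # keep only the first turn
    # Assumption: the system text of the first turn applies to the whole conversation.
    system = turns[0].get("system", "") if turns else ""

    samples: List[Dict[str, str]] = []
    history = ""
    for turn in turns:
        question = f"User: {turn.get('input', '')}\n"
        samples.append({
            # Each sample is an ordinary single-turn example; earlier turns
            # are simply pasted in front of the current question.
            "instruction": f"{system}\n{history}{question}".strip(),
            "input": "",
            "output": turn.get("output", ""),
        })
        history += question + f"Assistant: {turn.get('output', '')}\n"
    return samples


record = {
    "conversation": [
        {"system": "Be brief.", "input": "Hi", "output": "Hello"},
        {"input": "Bye", "output": "Goodbye"},
    ]
}
samples = flatten_conversation(record, include_multiturn=True)
```

Each resulting dict has the same instruction/input/output shape as a single-turn sample, so a custom dataset class modeled on deita.py could produce samples like these before tokenization.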

Thank you very much for your reply. I've read your explanation of DoRA, and it's excellent. Thank you. I hope to use it in the LitGPT project.

Glad to hear you found it useful! There are currently so many todos, but yeah, adding DoRA to LitGPT some time would be great.
