Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.

Home Page: https://lightning.ai

prompt_style

fireyanci opened this issue

I don't want to use the dataset styles listed in the prompt_styles dict. I want to use my own dataset format instead. How can I build my own dataset style for use with finetune/lora? My dataset format is:

```json
{
  "conversation": [
    {
      "system": "This is like an instruction",
      "input": "",
      "output": ""
    }
  ]
}
```

I want this because my data consists of multi-turn conversations.

I think the easiest approach would be to use one of the existing datasets as a template. I remember that Deita contains multi-turn conversations, so I added an option for them. This may be a helpful template for building your own dataset:

```python
include_multiturn_conversations: bool = False
```
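Following that suggestion, here is a minimal sketch that converts the format from the question into flat training records. It assumes the target is an Alpaca-style list of records with `instruction`, `input`, and `output` fields, and it maps each turn's `system` text to `instruction`; both are assumptions. The helper `convert_conversations` is hypothetical and only borrows the flag name from the Deita option. It is not LitGPT's own code.

```python
import json
from typing import Dict, List


def convert_conversations(
    records: List[Dict], include_multiturn_conversations: bool = False
) -> List[Dict[str, str]]:
    """Hypothetical helper: flatten {"conversation": [...]} records.

    Assumed mapping: system -> instruction, input -> input, output -> output.
    """
    examples = []
    for record in records:
        turns = record["conversation"]
        if not include_multiturn_conversations:
            # Keep only the first turn of each conversation.
            turns = turns[:1]
        for turn in turns:
            examples.append(
                {
                    "instruction": turn["system"],
                    "input": turn.get("input", ""),
                    "output": turn["output"],
                }
            )
    return examples


# Illustrative two-turn conversation in the format from the question.
sample = [
    {
        "conversation": [
            {"system": "Translate to French.", "input": "Hello", "output": "Bonjour"},
            {"system": "Now make it formal.", "input": "", "output": "Bonjour, madame"},
        ]
    }
]

print(json.dumps(convert_conversations(sample, include_multiturn_conversations=True), indent=2))
```

With the flag off, only the first turn of each conversation survives. With it on, every turn becomes its own record, which you could then write out as a JSON file for finetuning.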

But note that LitGPT otherwise doesn't do anything special for multi-turn data. During training, each turn of a multi-turn example is basically treated as another regular, independent input example.
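To illustrate what "treated as a regular example" means, the sketch below renders each flattened turn on its own. The prompt string is a simplified Alpaca-style template chosen for illustration, not LitGPT's exact prompt, and `render_examples` is a hypothetical helper. The key point is that the second turn's prompt contains nothing from the first turn, so the model sees no conversation history.

```python
from typing import Dict, List

# Simplified Alpaca-style template (an assumption, not LitGPT's exact string).
TEMPLATE = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"


def render_examples(examples: List[Dict[str, str]]) -> List[str]:
    """Render each example independently; earlier turns are never included."""
    # str.format ignores the extra "output" key, which is appended after the prompt.
    return [TEMPLATE.format(**ex) + ex["output"] for ex in examples]


turns = [
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Now make it formal.", "input": "", "output": "Bonjour, madame"},
]
rendered = render_examples(turns)

# The second training string does not mention the first turn at all.
print("Translate to French." in rendered[1])
```

If the history matters for your task, one option is to fold earlier turns into the `input` field yourself before training; nothing in the sketch above does that automatically.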

Thank you very much for your reply. I've also read your explanation of DoRA; it's excellent. I hope to be able to use it in the LitGPT project.

Glad to hear you found it useful! There are currently so many to-dos, but yes, adding DoRA to LitGPT at some point would be great.