QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

请问微调中使用本地数据时配置json格式的文件中的id参数每个样本可以设置为一样的吗?

whysirier opened this issue · comments

起始日期 | Start Date

No response

实现PR | Implementation PR

No response

相关Issues | Reference Issues

No response

摘要 | Summary

请问微调中使用本地数据时配置json格式的文件中的id参数每个样本可以设置为一样的吗?

基本示例 | Basic Example

1)id一样:
[
{
"id": "identity_0",
"conversations": [
{
"from": "user",
"value": "你好"
},
{
"from": "assistant",
"value": "我是VL。"
}
]
},
{
"id": "identity_0",
"conversations": [
{
"from": "user",
"value": "Picture 1: /mnt/data/images/cat217.jpg\n图中有几只猫?"
}
]
}
]
2)id不一样
[
{
"id": "identity_0",
"conversations": [
{
"from": "user",
"value": "你好"
},
{
"from": "assistant",
"value": "我是VL。"
}
]
},
{
"id": "identity_1",
"conversations": [
{
"from": "user",
"value": "Picture 1: /mnt/data/images/cat217.jpg\n图中有几只猫?"
}
]
}
]

缺陷 | Drawbacks

两种方式对训练效果有影响吗

未解决问题 | Unresolved questions

No response

应该不影响,之前考虑过这个问题,查找源代码的过程中并未发现该id被使用