In jp.jsonl, all the questions are "null". Is there a mistake?
Labmem009 opened this issue
Labmem009 commented
Some lines are of the "T2T" type but still have no question. Is this a mistake?
xxx commented
Labmem009 commented
xxx commented
This is because the Japanese data mainly comes from Bokete, where the T2T content is displayed in image form.
This type of data can still be labeled as T2T because, as shown in the following image (a chat with Qwen-VL), well-formatted text within images can be easily recognized and understood by multimodal large language models.
You can also find entries whose question is not 'None' in the Chinese and English data.
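To see how many entries of each type lack a question in a given file, a quick count per type can help. This is a minimal sketch, assuming each line of the .jsonl file is one JSON object with a `question` key; the name of the type field (`type` here) is a hypothetical guess and may differ in the actual dataset. It also treats the literal strings "null" and "None" as missing, which is an assumption in case the values were serialized as text.

```python
import json
from collections import Counter


def count_null_questions(path, type_field="type"):
    """Return {type: (entries_without_question, total_entries)}.

    Assumption: `type_field` is a hypothetical field name; adjust it to
    whatever key the dataset actually uses for labels such as "T2T".
    """
    total = Counter()
    missing = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            kind = record.get(type_field)
            total[kind] += 1
            question = record.get("question")
            # JSON null loads as None; the string forms are an assumption.
            if question is None or question in ("null", "None", ""):
                missing[kind] += 1
    return {kind: (missing[kind], total[kind]) for kind in total}
```

For example, `count_null_questions("jp.jsonl")` would show whether T2T entries in the Japanese file have no question text, which is expected when the text lives in the image.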