OpenGVLab / all-seeing

[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World

Home Page: https://huggingface.co/spaces/OpenGVLab/all-seeing


Special tokens

KooSung opened this issue

Nice work! Why didn't All-Seeing v2 add <ref> etc. to the special tokens?

Thank you for your interest in our project.

Early experimental results indicate that adding special tokens such as <ref>, <box>, and <rel> has only a minor impact on performance. Therefore, to keep things simple, we decided not to add any special tokens.
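
For context, here is a minimal sketch of the two options being compared, assuming a Hugging Face tokenizer (the checkpoint name is only illustrative, not necessarily ASMv2's actual base model):

```python
from transformers import AutoTokenizer

# Illustrative base LLM; ASMv2's actual base checkpoint may differ.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Option A (not used by ASMv2): register the tags as dedicated special tokens,
# each mapped to a single new token id. The LLM's embedding table would then
# need to be resized and the new rows trained from scratch:
#   tokenizer.add_special_tokens({"additional_special_tokens":
#       ["<ref>", "</ref>", "<box>", "</box>", "<rel>", "</rel>"]})
#   model.resize_token_embeddings(len(tokenizer))

# Option B (what ASMv2 does): keep the tags as plain text, so the existing
# vocabulary splits them into ordinary subword tokens and no new embeddings
# have to be learned.
print(tokenizer.tokenize("<ref>a dog</ref><box>[[120, 45, 600, 380]]</box>"))
```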

@Weiyun1025 Thanks. Another question: when training a regular detection model, the bboxes must be adjusted to match the image preprocessing, so why is it enough to simply normalize the bboxes to 0-1000 (or apply square_pad) during LLM training? Qwen-VL also does this, but the reason is not explained.

Adjusting the bboxes is necessary when data augmentation is used. However, we do not use any data augmentation except image flipping, for which we preprocess the bboxes offline.
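
As an illustration of that offline flip preprocessing (a sketch only; the horizontal direction, the function name, and the (x1, y1, x2, y2) pixel convention are assumptions, not the project's actual code):

```python
def hflip_bbox(bbox, image_width):
    """Mirror a pixel-space bbox (x1, y1, x2, y2) for a horizontally flipped image.

    Because the flip is the only augmentation, this can be applied once,
    offline, to produce a flipped copy of each annotation.
    """
    x1, y1, x2, y2 = bbox
    return (image_width - x2, y1, image_width - x1, y2)
```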

For the second question: since the input size of ASMv2 is only 336x336, a scale of 1000 is large enough (the 0-1000 grid is already finer than the 336-pixel grid, so quantization loses essentially nothing). If the input size were scaled up to, say, 2000x2000, it might be necessary to enlarge the scale.
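
For concreteness, a sketch of the 0-1000 normalization under discussion (the function name and the square-pad convention, padding on the bottom/right, are assumptions for illustration):

```python
def normalize_bbox(bbox, width, height, scale=1000, square_pad=False):
    """Map a pixel-space bbox (x1, y1, x2, y2) to integer coords in [0, scale]."""
    if square_pad:
        # Treat the image as padded to a square (here: on the bottom/right)
        # before resizing, so both axes share the same normalization factor.
        width = height = max(width, height)
    x1, y1, x2, y2 = bbox
    return (round(x1 / width * scale), round(y1 / height * scale),
            round(x2 / width * scale), round(y2 / height * scale))

# e.g. normalize_bbox((120, 45, 600, 380), width=800, height=600)
# -> (150, 75, 750, 633)
```

Because any subsequent uniform resize scales the image and the box together, these normalized coordinates need no further adjustment during training.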