Question about image transformation: is the shorter edge still 384 for fine-tuning tasks?

Jxu-Thu opened this issue 3 years ago

Thanks for your great code!
I read your paper carefully.

(in your paper) We resize the shorter edge of input images to 384 and limit the longer edge to under 640 while preserving the aspect ratio. This resizing scheme is also used during object detection in other VLP models, but with a larger size of the shorter edge (800). Patch projection of ViLT-B/32 yields 12 × 20 = 240 patches for an image with a resolution of 384*640.
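
To make the arithmetic in the quoted passage concrete, here is a minimal sketch of an aspect-preserving resize followed by a patch count. The function names are hypothetical and this is not the repository's transform; it also assumes the patch count is the floor of each side divided by the patch size (32 for ViLT-B/32).

```python
def resize_shape(height, width, shorter=384, longer=640):
    """Return (new_height, new_width) after an aspect-preserving resize.

    Hypothetical helper: scale the shorter edge to `shorter`, then shrink
    further if the longer edge would exceed `longer`.
    """
    scale = shorter / min(height, width)
    if max(height, width) * scale > longer:
        scale = longer / max(height, width)
    return round(height * scale), round(width * scale)


def patch_count(height, width, patch_size=32):
    """Number of non-overlapping patches (assumes floor division per side)."""
    return (height // patch_size) * (width // patch_size)


# A 480x640 image keeps its aspect ratio: shorter edge 384, longer edge 512.
print(resize_shape(480, 640))
# The largest allowed resolution gives the 12 x 20 grid from the paper.
print(patch_count(384, 640))
```

A very wide image (for example 300 x 1000) hits the longer-edge limit first, so its shorter edge ends up below 384.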

However, I see that "image_size=384" is set for all downstream tasks in this code. Is that correct?

Would this affect the performance of downstream tasks? A shorter edge of 800 would greatly increase the sequence length, so the batch size would have to be smaller when using "shorter edge 800".
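
A rough estimate of how much longer the image sequence would get is sketched below. The 640 limit comes from the paper quote; the 1333 longer-edge limit for the 800 setting is an assumption (a value commonly paired with 800 in detection pipelines), and the helper name is hypothetical. Text tokens and the class token are ignored.

```python
def max_patches(shorter, longer, patch_size=32):
    """Upper bound on image patches for a given resize limit (floor per side)."""
    return (shorter // patch_size) * (longer // patch_size)


small = max_patches(384, 640)    # setting described in the paper
large = max_patches(800, 1333)   # assumed limit for the "shorter edge 800" case

ratio = large / small
# Self-attention cost grows roughly quadratically with sequence length.
attention_ratio = ratio ** 2

print(small, large)
print(f"sequence length grows about {ratio:.1f}x")
print(f"attention cost grows about {attention_ratio:.1f}x")
```

Under these assumptions the image sequence is a bit more than four times longer, which is why a smaller batch size would be needed to fit in the same memory.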

Jxu-Thu commented 3 years ago

Thanks.

Wonjae Kim · Answer 1 · Fri Jul 09 2021 10:33:25 GMT+0800 (China Standard Time)

We do use a shorter edge of 384 for downstream tasks.
def config() is the default configuration, and its values are used as-is unless a named config or a command-line modification overrides them.

You can check the final configuration of a run with the print_config option.
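
The precedence described above can be illustrated with plain dictionaries. This is only a sketch, not the configuration library or the code the repository actually uses: the function names and the batch_size values are hypothetical, and only image_size=384 comes from this thread.

```python
def default_config():
    # Stand-in for the defaults in def config(); batch_size is made up.
    return {"image_size": 384, "batch_size": 256}


def task_finetune_example():
    # Hypothetical named config that does not touch image_size.
    return {"batch_size": 128}


def resolve_config(named_configs=(), cli_updates=None):
    """Apply defaults first, then named configs, then command-line updates."""
    config = default_config()
    for named in named_configs:
        config.update(named())
    config.update(cli_updates or {})
    return config


# image_size stays at the default because nothing overrides it.
final = resolve_config([task_finetune_example])
print(final)

# An explicit command-line style update takes precedence over everything else.
overridden = resolve_config([task_finetune_example], {"image_size": 800})
print(overridden)
```

Printing the resolved dictionary plays the same role as inspecting the final configuration of a run: it shows which values survived from the defaults and which were overridden.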