Efficiency of the adapter

Question

ariharasudhanm opened this issue 2 months ago · comments

Ariharasudhan Muthusami commented 2 months ago

If I am not wrong, the proposed adapter contains 183M parameters, whereas the ViT-B encoder has approximately 63M parameters. How can you claim that your adapter is more efficient than fine-tuning the whole encoder itself?

Junlong · Answer 1 · Thu May 23 2024 01:43:01 GMT+0800 (China Standard Time)

First, fine-tuning the entire encoder would degrade the original ViT's capabilities, so we opted for adapter fine-tuning instead. Second, fine-tuning with adapters does not reduce efficiency to an intolerable level; for instance, the FPS remains acceptable. Finally, the adapter layers update their parameters only during the first iteration of each batch and are not updated in subsequent iterations, which maintains training efficiency. If you wish to reduce the number of parameters further, you can increase the down-sampling rate, for example to 0.75.
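
To give a rough feel for how the down-sampling rate affects the parameter count, here is a minimal sketch. It is not the repository's code: it assumes a generic bottleneck adapter (one down-projection and one up-projection, both with biases) and assumes that a rate `r` shrinks the hidden width to `(1 - r) * dim`. The function name `bottleneck_adapter_params` is hypothetical, and the actual adapter may define the rate differently.

```python
def bottleneck_adapter_params(dim: int, rate: float) -> int:
    """Parameter count of a hypothetical down-project / up-project adapter.

    Assumption: `rate` is the fraction of the width removed by the
    down-projection, so the hidden width is (1 - rate) * dim.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    hidden = max(1, round(dim * (1.0 - rate)))
    down = dim * hidden + hidden  # down-projection weights + bias
    up = hidden * dim + dim       # up-projection weights + bias
    return down + up


if __name__ == "__main__":
    dim = 768  # embedding width of ViT-B
    for rate in (0.25, 0.5, 0.75):
        print(f"rate={rate}: {bottleneck_adapter_params(dim, rate):,} params per adapter")
```

Under these assumptions, going from a rate of 0.5 to 0.75 roughly halves the parameters of each adapter layer, since the count is dominated by the `2 * dim * hidden` weight term.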