MTL-ViT: A new multi-task learning framework using Vision Transformers

(*Note: This is an ongoing project, so the full code and strategy have not yet been open-sourced by the author.)

We present a new multi-task learning strategy using Vision Transformers (ViTs). Our approach exploits the class token and the self-attention mechanism of Vision Transformers to train multiple tasks through a single ViT, more efficiently and within a limited computational budget.
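The author's strategy is not yet public, so the following is only a hypothetical illustration of how a class token and self-attention can serve several tasks in one transformer. It assumes one learnable class token per task, prepended to the patch tokens, so that every task token attends to the same patches and each task's head reads only its own output token. The function names (`multi_task_forward`, `self_attention`), the sizes, and the per-task class counts are all invented for this sketch. It uses a single attention layer in NumPy and omits the MLP, layer norm, positional embeddings, and multiple heads of a real ViT.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def multi_task_forward(patches, task_tokens, params, heads):
    """Hypothetical: prepend one class token per task, attend jointly,
    then send each task's output token to that task's linear head."""
    n_tasks = task_tokens.shape[0]
    tokens = np.concatenate([task_tokens, patches], axis=0)  # (n_tasks + n_patches, d)
    out = self_attention(tokens, *params)  # shared weights for all tasks
    # Output token i (i < n_tasks) is the image summary used by task i.
    return [out[i] @ heads[i] for i in range(n_tasks)]

d, n_patches = 16, 9
num_classes = [10, 3]  # hypothetical: task 0 has 10 classes, task 1 has 3
patches = rng.normal(size=(n_patches, d))
task_tokens = rng.normal(size=(len(num_classes), d))
params = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]  # W_q, W_k, W_v
heads = [rng.normal(size=(d, c)) * 0.1 for c in num_classes]

logits = multi_task_forward(patches, task_tokens, params, heads)
print([l.shape for l in logits])  # [(10,), (3,)]
```

In this design the attention weights and patch embeddings are shared by all tasks. Only the task tokens and the small output heads are task-specific, which is one way a single backbone can serve several tasks at little extra cost.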

Total loss of the multi-task system: $L_{total}=L_{1}+L_{2}+L_{3}+\dots+L_{n}=\sum_{i=1}^{n}L_{i}$, where $L_{i}$ is the loss of task $i$ and $n$ is the number of tasks.
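As a small worked example of this unweighted sum, the sketch below computes one cross-entropy loss per task for a single image and adds them. The choice of cross-entropy, the example logits, and the labels are assumptions made for illustration. The README does not say which per-task losses are used.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def total_loss(task_losses):
    """L_total = L_1 + L_2 + ... + L_n (plain, unweighted sum)."""
    return sum(task_losses)

# Hypothetical outputs of two task heads for one image, with their labels.
task_logits = [[2.0, 0.5, -1.0], [0.0, 0.0]]
targets = [0, 1]
losses = [cross_entropy(z, t) for z, t in zip(task_logits, targets)]
print(losses, total_loss(losses))
```

In a framework such as PyTorch, the same sum of per-task loss tensors would be backpropagated with one `backward()` call. Gradients from every task would then flow into the shared ViT parameters.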
