ViViT for Regression Tasks

Question

Taimoor-R opened this issue a year ago

I have been trying to use TimeSformer and ViViT, and I have managed to convert them into regression models by changing the loss function and setting the output size of the MLP head to 1. However, as I understand it, a video vision transformer takes a video clip (split into frames) as input and outputs a single value for that clip. I would like the model to output a value for each frame of the input clip, so that instead of outputting 1 value it outputs 32 values. Can you guide me in this regard?
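One possible direction, sketched here in PyTorch under several assumptions rather than taken from either model's code: skip the final pooling or class-token readout, keep the encoder's per-patch tokens, average the patch tokens that belong to each frame, and apply a shared linear layer so that each frame gets its own scalar. The class name `PerFrameRegressionHead`, the `has_cls_token` flag, and the frame-major token ordering are all hypothetical; the real ordering has to be checked against the implementation in use. If the tokenizer merges several frames into one token (for example tubelet embeddings with a temporal size greater than 1), the number of temporal positions is smaller than the number of input frames, so `num_frames` would have to be that smaller number, or the predictions would need to be upsampled.

```python
import torch
import torch.nn as nn


class PerFrameRegressionHead(nn.Module):
    """Hypothetical head: maps encoder tokens to one scalar per frame."""

    def __init__(self, embed_dim: int, num_frames: int, has_cls_token: bool = True):
        super().__init__()
        self.num_frames = num_frames
        self.has_cls_token = has_cls_token
        self.norm = nn.LayerNorm(embed_dim)
        self.fc = nn.Linear(embed_dim, 1)  # shared across all frames

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, [1 +] num_frames * patches_per_frame, embed_dim)
        if self.has_cls_token:
            tokens = tokens[:, 1:]  # drop the class token
        b, n, d = tokens.shape
        if n % self.num_frames != 0:
            raise ValueError("token count is not divisible by num_frames")
        # ASSUMPTION: tokens are ordered frame by frame (frame-major).
        tokens = tokens.reshape(b, self.num_frames, n // self.num_frames, d)
        frame_feats = tokens.mean(dim=2)  # (batch, num_frames, embed_dim)
        return self.fc(self.norm(frame_feats)).squeeze(-1)  # (batch, num_frames)


# Usage with random stand-in tokens: 32 frames, 4 patches per frame, plus a class token.
torch.manual_seed(0)
head = PerFrameRegressionHead(embed_dim=64, num_frames=32)
tokens = torch.randn(2, 1 + 32 * 4, 64)  # placeholder for real encoder output
preds = head(tokens)                     # one value per frame
targets = torch.randn(2, 32)             # placeholder per-frame labels
loss = nn.MSELoss()(preds, targets)
loss.backward()
```

With this kind of head the loss is computed over all 32 per-frame predictions at once, so the existing regression loss can stay as it is; only the target tensor changes from one value per clip to one value per frame.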

BitCalSaul · Answer 1 · Sat Dec 23 2023 11:01:43 GMT+0800 (China Standard Time)

Hi @Taimoor-R, I am also interested in developing a model that does this, and I am trying to figure out how to adjust the model to predict a value for each pixel (i.e., per-pixel regression). Have you found a solution in this direction? Thanks for any hints.