lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

How to get the features and positional embedding information

marcomameli1992 opened this issue · comments

Dear maintainer,
I would like to use your package, but not for classification: I need to extract features from images and get those as output, and in addition I need the positional embedding information so I can reconstruct the feature maps.

Thank you so much.

@marcomameli1992 hi Marco, you just need to modify ViT to have a return statement here https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py#L123 for the embeddings
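An alternative to editing vit.py is a forward pre-hook that captures whatever is fed into the classification head. This is a minimal sketch, assuming only that the model exposes its head as an `mlp_head` submodule (as the ViT in the linked file does); the `Toy` model below is a hypothetical stand-in, not part of vit-pytorch.

```python
import torch
import torch.nn as nn

def capture_pre_head(model, head_attr="mlp_head"):
    """Register a forward pre-hook on the classification head so that each
    forward pass stashes the features entering it on `model._features`."""
    head = getattr(model, head_attr)

    def hook(module, inputs):
        # inputs is a tuple of positional args to the head's forward
        model._features = inputs[0].detach()

    head.register_forward_pre_hook(hook)
    return model

# Hypothetical stand-in for a ViT: any model whose last step is `mlp_head`.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Linear(8, 16)
        self.mlp_head = nn.Linear(16, 4)

    def forward(self, x):
        return self.mlp_head(self.body(x))

model = capture_pre_head(Toy())
logits = model(torch.randn(2, 8))
print(model._features.shape)  # → torch.Size([2, 16])
```

The upside of the hook approach is that the library code stays untouched, which fits the maintainer's point below about keeping vit.py simple.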

i guess i could add this, but i don't want to cloud how simple and clear the code is atm

@marcomameli1992 what do you mean by the positional embedding? the absolute positional embeddings are added at the beginning before it is fed through the attention layers, and can be accessed as v.pos_embedding
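For reconstruction purposes, the layout of that parameter matters. A minimal sketch of the layout described above (an assumption based on vit.py: one learned vector per patch plus one for the CLS token, CLS first), without depending on the package:

```python
import torch
import torch.nn as nn

# Mirrors the shape of ViT's absolute positional embedding: (1, n + 1, dim),
# where n is the number of patches; the hypothetical sizes are for illustration.
num_patches, dim = 64, 128
pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim))

cls_pos = pos_embedding[:, 0]     # position vector added to the CLS token
patch_pos = pos_embedding[:, 1:]  # one position vector per image patch

# Lay the patch positions back out on the image grid (useful when
# reconstructing spatial feature maps from the token sequence):
side = int(num_patches ** 0.5)    # 8 patches per side here
grid = patch_pos.reshape(1, side, side, dim)
print(grid.shape)  # → torch.Size([1, 8, 8, 128])
```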

@marcomameli1992 actually, let me just write up a layer extractor that can wrap the ViT and return all these intermediates, similar to https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/recorder.py
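Until such an extractor lands in the repo, a generic hook-based wrapper can collect the output of every layer of interest on each forward pass. The class and names below are assumptions for illustration, not the package's API; the toy `nn.Linear` stack stands in for a transformer's blocks.

```python
import torch
import torch.nn as nn

class Intermediates(nn.Module):
    """Wrap any model and record the outputs of selected submodules
    (e.g. each transformer block) on every forward pass."""
    def __init__(self, model, layers):
        super().__init__()
        self.model = model
        self.outputs = []
        for layer in layers:
            layer.register_forward_hook(self._save)

    def _save(self, module, inputs, output):
        self.outputs.append(output.detach())

    def forward(self, x):
        self.outputs = []          # reset per call
        out = self.model(x)
        return out, self.outputs

# Toy stack standing in for transformer blocks.
blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
model = nn.Sequential(*blocks)
wrapped = Intermediates(model, blocks)
out, feats = wrapped(torch.randn(2, 16))
print(len(feats))  # → 3, one tensor per wrapped block
```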