microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers

[QUESTION] Results on correspondence learning

tileb1 opened this issue

Hello,
I cannot find in the paper which features are used for the correspondence matching in the appendix. Is it the last-layer features (coarse-grained), the first-layer features (fine-grained), or a combination of features from all depths (and if so, how are they combined)?
Thanks!

Only the last-layer features are used.
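
For readers wondering what matching on a single feature map can look like, here is a minimal sketch. It is not the EsViT evaluation code: it assumes cosine-similarity nearest-neighbour matching, which this thread does not confirm, and it uses random arrays as stand-ins for last-layer patch features. The function name `match_patches` and the shapes (49 patches, 384 dimensions) are hypothetical placeholders.

```python
import numpy as np


def match_patches(feats_a, feats_b):
    """For each patch in view A, return the index and score of its best match in view B.

    feats_a: (N_a, D) array of patch features from one view.
    feats_b: (N_b, D) array of patch features from the other view.
    """
    # L2-normalise rows so that the dot product equals cosine similarity.
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T  # (N_a, N_b) cosine-similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)


# Random stand-ins for last-layer patch features (assumption, not real model output).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(49, 384))

# View B holds the same patches in shuffled order, with a little noise added.
perm = rng.permutation(49)
feats_b = feats_a[perm] + 0.01 * rng.normal(size=(49, 384))

idx, score = match_patches(feats_a, feats_b)
```

Because `feats_b[j]` is a noisy copy of `feats_a[perm[j]]`, the recovered `idx` should equal `np.argsort(perm)`. With real features, the two inputs would be the flattened last-layer feature maps of the two images being matched.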