zhigangjiang / LGT-Net

This is the PyTorch implementation of our paper "LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network" (CVPR'22).

Feature sequences

kimkj38 opened this issue

Hello, I'm confused about the token size, because in ViT the tokens are square patches (8x8 or 16x16).

As I understand it, the feature sequence consists of 256 tokens, each a 1024-dimensional vector.
And the $N/N_w$ $(=256/16)$ window feature sequences of the Window Block mean that there are 16 partitions with 16 tokens per partition.

[Screenshot, 2022-07-04 10:03:43 AM]

Then are the 4 partitions in this figure just a simplification? Or did I get it wrong?

Yes, for simplicity, this figure shows the case of $N_w=N/4$.

@zhigangjiang Thanks for your reply!
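
For anyone reading this later, the window counts discussed above can be checked with a small sketch. This is not the repository's code: `window_partition` is a hypothetical helper, and plain integer indices stand in for the 1024-dimensional feature vectors. It only illustrates that a length-$N$ sequence split into windows of length $N_w$ gives $N/N_w$ windows, for both the $N_w=16$ case from the question and the $N_w=N/4$ case shown in the figure.

```python
def window_partition(tokens, window_len):
    """Split a length-N sequence into N / window_len non-overlapping windows.

    Hypothetical helper for illustration only; not taken from LGT-Net.
    """
    n = len(tokens)
    if n % window_len != 0:
        raise ValueError("sequence length must be divisible by window length")
    return [tokens[i:i + window_len] for i in range(0, n, window_len)]


N = 256
# Assumption: each integer stands in for one 1024-d token of the feature sequence.
tokens = list(range(N))

# Case from the question: N_w = 16 gives 16 windows of 16 tokens each.
windows = window_partition(tokens, 16)
print(len(windows), len(windows[0]))  # 16 16

# Case drawn in the figure: N_w = N/4 gives 4 windows of 64 tokens each.
figure_case = window_partition(tokens, N // 4)
print(len(figure_case), len(figure_case[0]))  # 4 64
```

In a tensor setting the same split would typically be a reshape from `(B, N, D)` to `(B * N / N_w, N_w, D)`, but that layout is an assumption here rather than something confirmed in this thread.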