Positional Encoding for RGB

Question

greatbaozi001 opened this issue a year ago

Hi. I have read the paper and one question remains for me.
As the paper mentions, a 2D encoder is adopted to extract a feature map f ∈ R^(64x256x256), positional encoding is applied to the RGB values, and the resulting code is appended to the 2D feature map to form f ∈ R^(96x256x256). How can I map RGB ∈ R^3 to 96 - 64 = 32 dimensions with positional encoding?

Shoukang Hu · Answer 1 · Fri Jun 09 2023 10:10:11 GMT+0800 (China Standard Time)

Hi, thanks for your interest in our work. We use positional encoding with 5 frequencies to map RGB ∈ R^(3x256x256) to R^(33x256x256) (3 + 3 x 2 x 5 = 33 channels). Then we append the first 32 dimensions of the encoded RGB, i.e. R^(32x256x256), to the feature map f ∈ R^(64x256x256), which finally forms f ∈ R^(96x256x256).
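
For readers who want to check the channel counts, here is a minimal per-pixel sketch of the arithmetic in this answer. It is not the repository's code: the function name `positional_encoding`, the NeRF-style channel ordering (identity first, then sin and cos per frequency), and the frequency bands 2^k are all assumptions, and the actual implementation may order or scale the channels differently. The 64-channel feature is a zero-filled placeholder.

```python
import math

def positional_encoding(values, num_freqs=5):
    """Hypothetical NeRF-style encoding: keep the input, then add sin/cos per frequency."""
    encoded = list(values)  # identity term: 3 channels for RGB
    for k in range(num_freqs):
        freq = 2.0 ** k  # assumed frequency bands 1, 2, 4, 8, 16
        encoded.extend(math.sin(freq * v) for v in values)
        encoded.extend(math.cos(freq * v) for v in values)
    return encoded

rgb = [0.2, 0.5, 0.8]               # one pixel, values in [0, 1]
encoded = positional_encoding(rgb)  # 3 + 3 * 2 * 5 = 33 channels
rgb_code = encoded[:32]             # keep only the first 32 channels
image_feature = [0.0] * 64          # placeholder for the 64-channel encoder feature
fused = image_feature + rgb_code    # 64 + 32 = 96 channels

print(len(encoded), len(rgb_code), len(fused))  # 33 32 96
```

The same operation applies independently at each of the 256 x 256 pixel locations. Under the ordering assumed here, the single dropped channel is the highest-frequency cosine of the blue value; with a different ordering a different channel would be dropped.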

Lin Yihong · Answer 2 · Fri Jun 09 2023 10:19:16 GMT+0800 (China Standard Time)

Thanks, the answer is clear!

markkim1115 · Answer 3 · Wed Jun 14 2023 16:57:36 GMT+0800 (China Standard Time)

Hi, sorry for reopening the issue. Is there a design reason for picking the first 32 dimensions of the encoded RGB? Thanks!

Shoukang Hu · Answer 4 · Fri Jun 16 2023 22:24:50 GMT+0800 (China Standard Time)

Hi, we mainly want to keep the dimensions of the 1D global, 2D pixel-aligned, and 3D point features the same, so that the later feature processing and fusion stages are easier.