rabbityl / lepard

[CVPR 2022, Oral] Learning Partial point cloud matching in Rigid and Deformable scenes

Some questions about the hyperparameters

qsisi opened this issue · comments

commented

Hello! Here are some questions for the code part on dataset 3DMatch.

  1. The dimensionality of the output from the KPConv backbone is set to 528. In theory, any number divisible by both 6 and 4 should be feasible here, since, as I understand it, divisibility by 6 is needed for the rotary positional embedding and divisibility by 4 for the multi-head attention. So why not choose 516 as the output dimensionality? Is there a reason behind this choice?
  2. Your implementation of the rotary positional embedding is impressive. I noticed that you first voxelize the raw 3D coordinates (without flooring the output) to scale them. I believe vol_bnds = [-3.6, -2.4, 1.14] is the minimum coordinate over the whole 3DMatch train/val/test set? Also, does voxelizing (i.e., scaling) the coordinates before the positional embedding give better results than just using the raw 3D coordinates? Could you give some hints about this?

Thank you very much for your help.

Hi.

  1. The reason is that we think position codes with the same frequency should fall into the same head of the transformer; therefore the feature dimension needs to be divisible by 24 (4 * 6).
  2. The voxel size controls the starting frequencies of the position code. Lower frequencies lead to smoother signals that can reflect long-range distance, and higher frequencies lead to fluctuating signals that reflect short-range distance.
    Using the raw coordinates leads to frequencies that are too high. Fig. 2 in this paper is intuitive: https://arxiv.org/pdf/2104.06405.pdf
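To make the divisibility constraint concrete, here is a minimal NumPy sketch (illustrative only, not the repo's actual code) of how a 3D rotary table could split channels per head, then per axis, then into (sin, cos) pairs, assuming standard transformer-style inverse frequencies with base 10000:

```python
import numpy as np

def rope_3d_table(coords, feat_dim=528, num_heads=4):
    """Per-head, per-axis (sin, cos) tables for a 3D rotary embedding.

    Channels are split across num_heads heads, then across the 3 axes,
    then into (sin, cos) pairs, so feat_dim must be divisible by
    num_heads * 3 * 2 = 24.  528 = 24 * 22 works; 516 = 24 * 21.5 does not.
    """
    assert feat_dim % (num_heads * 6) == 0, "feat_dim must be divisible by 24"
    pairs = feat_dim // (num_heads * 6)      # sin/cos pairs per axis per head
    # Assumed transformer-style inverse frequencies (base 10000).
    inv_freq = 1.0 / (10000.0 ** (np.arange(pairs) / pairs))
    # (N, 3) coords -> (N, 3, pairs) phase angles, one set per axis.
    phase = coords[:, :, None] * inv_freq[None, None, :]
    return np.sin(phase), np.cos(phase)

pts = np.random.rand(100, 3) / 0.08          # coords scaled by the voxel size
sin_t, cos_t = rope_3d_table(pts)
print(sin_t.shape)                           # (100, 3, 22)
```

With feat_dim=516 the assert fires, which matches the answer above: 516 is divisible by 6 and by 4 separately, but not by 24.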
commented

Thank you for your prompt reply.

Sorry to check again: do you mean that by applying voxelization, the voxelized coordinates could have a higher frequency, resulting in smoother signals?

Also, does vol_bnds = [-3.6, -2.4, 1.14] denote the minimum coordinates over the whole 3DMatch train/val/test set?

Thanks.

Sorry, I phrased that wrongly. Voxelization with 0.04 m leads to lower frequencies and smoother signals.

Yes, it is the min coordinate.
The vol_bnds is meant to cancel the global translation. This is not necessary for rotary positional encoding, as it always reflects relative distance, but it could affect absolute encodings such as the sinusoidal one.
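The translation-invariance point can be checked numerically. Below is a small self-contained sketch (illustrative, not lepard's code) showing that for one rotary (sin, cos) channel pair, the attention dot product depends only on the relative offset between two positions, so a global shift such as subtracting vol_bnds changes nothing:

```python
import numpy as np

def rotate(v, angle):
    """Apply a 2D rotation to one (even, odd) channel pair, as in RoPE."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])

omega = 0.7                            # one frequency channel (arbitrary value)
q = np.array([0.3, -1.2])              # query features for this channel pair
k = np.array([0.8, 0.5])               # key features for this channel pair

scores = []
for shift in [0.0, 5.0, -3.6]:         # global translations of the whole scene
    x, y = 1.25 + shift, 4.00 + shift  # two point positions, same relative offset
    scores.append(rotate(q, omega * x) @ rotate(k, omega * y))

# All three scores agree up to floating-point error, because
# R(wx)q . R(wy)k = q . R(w(y - x))k depends only on y - x.
print(np.allclose(scores, scores[0]))  # True
```

An absolute sinusoidal encoding added to the features would not have this property, which is why vol_bnds would matter there.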

commented

Thanks.

Now I get your point: by voxelizing, raw coordinates such as [0.3900, 0.9669, 0.7839] are scaled to [0.3900, 0.9669, 0.7839] / 0.08 = [4.875, 12.08625, 9.79875], which corresponds to lower frequencies, leading to smoother signals.

Also, the voxel_size setting for 3DMatch seems to be 0.08 m instead of 0.04 m?
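For reference, the scaling arithmetic above can be reproduced in a few lines (using voxel_size = 0.08 as discussed; subtracting vol_bnds first only shifts the origin, which, as noted earlier, does not affect the relative encoding):

```python
import numpy as np

voxel_size = 0.08
vol_bnds = np.array([-3.6, -2.4, 1.14])

raw = np.array([0.3900, 0.9669, 0.7839])
scaled = raw / voxel_size                # plain scaling, no flooring
print(scaled)                            # [ 4.875   12.08625  9.79875]

# Subtracting vol_bnds before scaling only adds a constant offset,
# so relative distances between points are unchanged:
shifted = (raw - vol_bnds) / voxel_size
print(shifted - scaled)                  # constant offset = -vol_bnds / voxel_size
```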

commented

Sorry to bother you again. May I ask how you obtained the exact vol_bnds = [-3.6, -2.4, 1.14] for 3DMatch? When I iterate through the train/val/test sets of 3DMatch (provided by PREDATOR), their minimum coordinates turn out to be [-1.5, -1.5, 0.5]. Could you give some hints about this?

commented

I remember [-3.6, -2.4, 1.14] was from 4DMatch.
Our positional encoding is relative, i.e., changing the starting point does not affect the position encoding.
Therefore, vol_bnds is not a crucial parameter; you can use any number for it.

commented

Thanks for your reply. Indeed, those bounds are not crucial for relative positional encoding. I am currently trying a sparse convolution library that needs the minimum coordinates over all input point clouds, so the fact that the minimum coordinates I computed were inconsistent with those in lepard confused me. Now it makes sense. Anyway, thanks again for your reply.