Pointcept / Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)


RandomShift or RandomRotate

Lizhinwafu opened this issue

I noticed that the network uses some data augmentation techniques during training, such as point cloud rotation or translation. Can these data augmentation techniques improve the accuracy of model training?

Mostly, they can.
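
For intuition, here is a minimal sketch of what rotation and shift augmentation typically look like for an (N, 3) point cloud. This is an illustrative NumPy version, not Pointcept's actual RandomRotate / RandomShift transforms; the angle range and shift scale are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotate_z(points, max_angle=np.pi):
    """Rotate an (N, 3) point cloud by a random angle around the z-axis."""
    theta = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T

def random_shift(points, scale=0.2):
    """Translate the whole cloud by one random offset per axis."""
    return points + rng.uniform(-scale, scale, size=(1, 3))

# Applied on the fly during training, so every epoch sees new variants
# of the same scene; labels are unchanged by these rigid transforms.
cloud = rng.normal(size=(1024, 3))
augmented = random_shift(random_rotate_z(cloud))
```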

In my research, I manually rotated the raw point clouds along different axes as a form of manual data augmentation. I used the PointNet++ model, and my mIoU did indeed improve. However, after submission, a reviewer raised some questions, and I am unsure how to respond. Can you give me some suggestions?
“Although the authors described the data expansion method, for the PointNet family of models the T-Net transformation matrix adjusts for rigid transformations after the data are input to the model. This is a feature of PointNet's input point cloud and feature alignment module. In addition, PointNet proposes symmetric functions to handle point cloud disorder, which enables the subsequent 3D recognition task. Overall, the symmetric function and the T-Net transformation matrix adjust the input data to ensure effective training of the model. Therefore, I question rotation as a data expansion method proposed in the article.”

Oh, that is a hard question to answer. I am also unsure how to respond, but I have attached one version just for your reference:

Dear reviewer, thank you for pointing out such a crucial point about the nature of point cloud processing. We believe it is an important issue worth discussing and will include it in our final version.

In a word, it is the symmetric functions that grant the model the potential to handle the unordered nature of point clouds, and it is the augmented data that grants the model the actual capacity to encode complex point clouds robustly through learning. Imagine feeding a point cloud rotated by different angles to a PointNet++ trained without augmentation: even though the point cloud is encoded with the symmetric function and the local T-Net transformation matrix (which can also be viewed as a kind of permutation-invariant adaptive kernel, though certainly not an input-invariant one), the input feature (coordinates) of each point is different, so the output representation of each point will also be different. Yet we know that, fundamentally, these rotated point clouds should share a similar representation, since the meaning of the point cloud, as well as of the given points, should not change after rotation. By including rotation-augmented data in the training process, the model learns this ability from the prior information.
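
To make this concrete, below is a small runnable demonstration of the distinction. It is a toy sketch: a random linear layer plus tanh stands in for the learned per-point MLP (it is not the actual PointNet encoder), and max pooling plays the role of the symmetric function.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))  # stand-in for a shared per-point MLP

def encode(points):
    """PointNet-style sketch: per-point features + symmetric max pooling."""
    feats = np.tanh(points @ W)
    return feats.max(axis=0)  # max over points: point order is irrelevant

points = rng.normal(size=(8, 3))
perm = rng.permutation(len(points))
theta = np.pi / 4
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])

print(np.allclose(encode(points), encode(points[perm])))      # True: permutation invariant
print(np.allclose(encode(points), encode(points @ rot_z.T)))  # False: rotation changes the encoding
```

The symmetric function removes sensitivity to point order, but nothing in the architecture removes sensitivity to the pose of the input, which is exactly the gap that rotation augmentation fills.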

Hope my version of the response can be helpful. Perhaps the reviewer misunderstood: permutation invariance does not mean input invariance, and rotation changes the input features of the point cloud.

Haha, thank you so much. I do not agree with the reviewer's opinion, though, because PointNet++ does not seem to have a T-Net structure.

Haha, never say no to your reviewer. ╮(╯▽╰)╭