Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Question about PTv3 Time and Memory Complexity

yxchng opened this issue

[Screenshot: table from the PTv3 paper reporting latency and memory across patch sizes]

Based on this table, it seems that PTv3 has constant time and memory usage regardless of patch size. However, I can't find an explanation in the paper for why this is the case. Could you kindly elaborate? What leads to the constant time and memory complexity of PTv3?

Hi, the constant time and memory regardless of patch size comes from FlashAttention, which fully utilizes the GPU's L1 and L2 cache to compute attention. More details are available in their paper. Note that PTv3 is also efficient without FlashAttention, but without it we cannot scale up the patch size at a constant time and memory cost.
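To make the mechanism concrete, here is a minimal sketch (not Pointcept's actual implementation; the function name `patch_attention` and the tensor shapes are illustrative assumptions) of patch-grouped attention built on PyTorch's `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a FlashAttention kernel on supported GPUs. The key point is that the fused kernel never materializes the `(patch_size, patch_size)` attention matrix in GPU memory, so peak memory stays proportional to the number of points rather than growing with the patch size:

```python
import torch
import torch.nn.functional as F


def patch_attention(q, k, v, patch_size):
    """Grouped attention over serialized points (simplified, hypothetical sketch).

    q, k, v: (N, H, C) tensors of N serialized points, H heads, C channels per head.
    Attention is computed independently within consecutive patches of
    `patch_size` points, in the spirit of PTv3's serialized attention.
    """
    N, H, C = q.shape
    assert N % patch_size == 0, "sketch assumes N is divisible by patch_size"

    # Reshape to (num_patches, H, patch_size, C): each patch becomes an
    # independent attention problem in the batch dimension.
    def group(t):
        return t.view(N // patch_size, patch_size, H, C).transpose(1, 2)

    q, k, v = group(q), group(k), group(v)

    # scaled_dot_product_attention dispatches to a fused kernel
    # (FlashAttention on supported GPUs): the (patch_size x patch_size)
    # score matrix is never written to HBM, so peak memory is O(N)
    # rather than O(N * patch_size).
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(N, H, C)
```

On hardware without a FlashAttention-capable backend, PyTorch falls back to a math implementation that does materialize the score matrix, which is exactly the regime where a larger patch size starts to cost memory.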

[Screenshot: benchmark from the FlashAttention repo showing runtime and memory growing with sequence length]

The metrics given in their repo do not really show constant time and memory complexity; both increase as the sequence length increases. Why doesn't PTv3 exhibit the same behavior?

In NLP, sequence length means the number of tokens fed to the network in each forward pass, which corresponds to the number of points in a 3D point cloud, not to the patch size. The FlashAttention benchmark scales up the total token count, whereas our table keeps the number of points fixed and only enlarges the patch (attention window) size. It would be great if you could run our code and ablate the parameter yourself.
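For anyone who wants to try this without the full codebase, below is a minimal standalone ablation sketch, assuming a CUDA GPU whose SDPA backend dispatches to FlashAttention; the sizes `N`, `H`, `C` and the patch sizes are arbitrary illustrative choices, not Pointcept's defaults. It holds the number of points fixed, varies only the patch size, and reports what it measures (latency and peak allocator memory) so you can observe the scaling yourself:

```python
import torch
from torch.nn.functional import scaled_dot_product_attention as sdpa

N, H, C = 2 ** 17, 8, 64  # fixed number of "points" (the analogue of sequence length)

for patch_size in (256, 1024, 4096):
    # Pre-grouped shape: (num_patches, heads, patch_size, channels).
    q = torch.randn(N // patch_size, H, patch_size, C, device="cuda", dtype=torch.half)
    k, v = torch.randn_like(q), torch.randn_like(q)

    for _ in range(3):  # warm-up so the timed call is representative
        sdpa(q, k, v)
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    sdpa(q, k, v)
    end.record()
    torch.cuda.synchronize()

    # Peak allocator memory during the attention call (includes q/k/v themselves).
    print(f"patch={patch_size:5d}  "
          f"time={start.elapsed_time(end):6.2f} ms  "
          f"peak_mem={torch.cuda.max_memory_allocated() / 2**20:7.1f} MiB")
```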