aipixel / GPS-Gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”

Home Page: https://shunyuanzheng.github.io/GPS-Gaussian


Questions regarding the performance vs. a conventional point cloud

ysjue opened this issue · comments

Dear authors, thanks for releasing this excellent work!
May I ask a naive question about this line of research on generalizable Gaussians: what are the advantages of generalizable Gaussians over a traditional point cloud? Since a generalizable Gaussian is usually pixel-aligned, the scales assigned to each Gaussian primitive should approach 0 (many are even exactly 0 when running the code), which means the Gaussian primitives may degenerate into a point cloud.
Also, according to Fig. 5 in the supplementary, you compared the rendered view with the point cloud reprojection and showed that GS can perform well in the presence of depth noise by learning the opacity. But when the depth estimation has voids (e.g., the second row of Fig. 5 (c/d) in your supplementary), how can your model complete this missing 3D information and recover the legs in their real placement, given the small scale upper bound of 0.01 set in your code?
Could you advise me on these questions? Thank you!

Hi, thanks for your attention!

First, the advantages of generalizable Gaussians over a traditional point cloud. I think the most important properties of 3D Gaussians for GPS-Gaussian are position, scale, and opacity. Unlike previous point cloud rendering, the position of a 3D Gaussian is differentiable, which makes the overall framework end-to-end differentiable. The adaptive scale helps handle the 'holes' caused by point rendering (refer to Fig. 4 in the supplementary of PointAvatar). The opacity effectively eliminates noisy points, as shown in Fig. 5.
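The role of opacity can be illustrated with a minimal sketch of front-to-back alpha compositing (not the repository's actual renderer; `composite` and the sample values are purely illustrative): a noisy point whose learned opacity has collapsed toward zero contributes almost nothing to the final pixel, which is how noisy depth estimates get suppressed.

```python
# Minimal front-to-back alpha compositing sketch (illustrative only).
def composite(colors, alphas):
    """Accumulate colors front to back, weighted by opacity and
    the transmittance remaining after closer points."""
    out, transmittance = 0.0, 1.0
    for color, alpha in zip(colors, alphas):
        out += transmittance * alpha * color
        transmittance *= (1.0 - alpha)
    return out

# A noisy point (color 1.0) in front with learned opacity 0.01 barely
# perturbs the clean point (color 0.5) behind it with opacity 0.99.
print(composite([1.0, 0.5], [0.01, 0.99]))  # ~0.5
```

A plain point cloud rendering has no such per-point weighting: the noisy front point would simply occlude the correct surface.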
It is not always the case that the scales assigned to each Gaussian primitive approach 0. Although the scales are much smaller than in the original 3DGS, since the representation is pixel-aligned, the Gaussian points have different scales, as shown in Fig. 6 of the supplementary; they are affected by many factors. Also, an upper bound of 0.01 is not a small scale in our setup: you can manually set a Gaussian's scale to 0.01 and visualize the rendered region, which will be roughly the size of a fist.
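One common way to enforce such an upper bound is a sigmoid activation on the raw per-pixel regression output. The sketch below is an assumption about how a 0.01 cap could be implemented, not the repository's actual code; `activate_scale` and `raw` are hypothetical names. It shows that the resulting scales are strictly positive and varied, so the primitives shrink toward points but never truly degenerate to zero-size splats.

```python
import numpy as np

def activate_scale(raw, max_scale=0.01):
    """Sigmoid squashes raw logits into (0, 1); multiplying by
    max_scale caps the Gaussian scale at the upper bound."""
    return max_scale / (1.0 + np.exp(-raw))

# Three hypothetical per-pixel network outputs.
raw = np.array([-5.0, 0.0, 5.0])
scales = activate_scale(raw)
# All scales lie strictly inside (0, 0.01]: small, but never zero,
# and different pixels get different scales.
print(scales)
```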
The second row of Fig. 5 (c/d) in the supplementary is un-projected from the right view. The seemingly void regions in Fig. 5 are complete in the un-projection of the left-view Gaussian maps. However, if a 3D structure is invisible in both views, our model can only compensate by generating larger-scale Gaussians for the surrounding primitives, rather than completing the 3D representation itself. A typical example is severe self-occlusion. One feasible solution will be reported in our upcoming SIGGRAPH paper, if you are interested.

If you have any further questions, please feel free to contact me.

Thank you so much for this feedback! And good to know about your new publication. Congrats!

Hey there, any update on the whereabouts of the SIGGRAPH paper? :) Is there going to be a preprint available already? :)

The project page and arXiv paper will be released next week, sorry for the delayed update.