aipixel / GPS-Gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”

Home Page: https://shunyuanzheng.github.io/GPS-Gaussian

How to achieve 25fps inference speed

initialneil opened this issue

Great work!

Here's some profiling of the inference speed on an RTX 3090.

[CUDA Timer] raft_stereo takes 26.7794 ms
[CUDA Timer] flow2gsparms takes 80.8899 ms
[CUDA Timer] .... flow2gsparms/gs_parm_regresser takes 77.3901 ms
[CUDA Timer] render takes 4.7777 ms

With the provided real test images, the gs_parm_regresser alone takes 77 ms, not to mention other parts like raft_stereo. Could you please give some suggestions on speeding it up?
How was the 25 fps claimed in the paper achieved?
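For reference, here is a minimal sketch of how such per-module timings can be collected with CUDA events in PyTorch; the CUDATimer helper below is a hypothetical stand-in for whatever produced the log above, and the module names and call signatures are only illustrative:

import torch

class CUDATimer:
    # Hypothetical CUDA-event timer, not part of the GPS-Gaussian repo.
    def __init__(self, name):
        self.name = name
        self.start = torch.cuda.Event(enable_timing=True)
        self.end = torch.cuda.Event(enable_timing=True)

    def __enter__(self):
        self.start.record()
        return self

    def __exit__(self, *exc):
        self.end.record()
        torch.cuda.synchronize()  # make sure the timed kernels have finished
        print(f"[CUDA Timer] {self.name} takes {self.start.elapsed_time(self.end):.4f} ms")

# usage (illustrative):
# with CUDATimer("raft_stereo"):
#     flow = raft_stereo(left_img, right_img)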

The depth estimator and gs parameter regresser are implemented with TensorRT in fp16. Robust Video Matting in TensorRT is also needed for real-world applications. However, the accelerated C++ implementation will not be included in this project.
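A common route to such TensorRT fp16 engines is to export each module to ONNX and build an engine with trtexec; a minimal sketch, assuming the regresser can be traced (the module handle, input shape, and tensor names here are placeholders, not the actual export script used):

import torch

# Hypothetical export of the gs parameter regresser to ONNX; the module handle,
# input shape, and tensor names are assumptions for illustration only.
model = gs_parm_regresser.eval().cuda()
dummy_feat = torch.randn(1, 32, 512, 512, device="cuda")
torch.onnx.export(
    model, (dummy_feat,), "gs_parm_regresser.onnx",
    input_names=["feat"], output_names=["gs_parms"], opset_version=17,
)

# Then build an fp16 engine with the trtexec tool shipped with TensorRT:
#   trtexec --onnx=gs_parm_regresser.onnx --fp16 --saveEngine=gs_parm_regresser.plan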

Thanks for your great work! Does the gs parameter regresser model need to be retrained in Python in fp16 mode? When using the official pretrained models directly, there seem to be numerical overflow issues during inference in C++, which lead to incorrect predictions of attributes such as opacity.

If you find numerical overflow in fp16, you can modify these lines as

scale_out = torch.clamp_max(self.scale_head(out), 100.) / 10000.
opacity_out = torch.sigmoid(self.opacity_head(out) / 100.)

so that the heads predict values at a numerically larger scale before the final rescaling. Alternatively, you can run the gs parameter regresser in fp32. Either way, the training process itself still runs in fp32 or mixed precision, so no fp16 retraining is needed.
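If the rest of the pipeline stays in fp16, another option is to disable autocast only around the regresser so its heads run in fp32; a minimal sketch (the module and tensor names are placeholders, not the repo's actual inference code):

import torch

# Run the pipeline under fp16 autocast, but fall back to fp32 for the gs parameter
# regresser to avoid the overflow discussed above (names are placeholders).
with torch.autocast(device_type="cuda", dtype=torch.float16):
    flow = raft_stereo(left_img, right_img)
    with torch.autocast(device_type="cuda", enabled=False):
        gs_parms = gs_parm_regresser(flow.float(), img_feat.float())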