aipixel / GPS-Gaussian

[CVPR 2024 Highlight] The official repo for “GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis”

Home Page: https://shunyuanzheng.github.io/GPS-Gaussian

Problems about preparing the data

ZhenhuiL1n opened this issue · comments

Hi, wonderful work! I have a few questions.

  1. I tried to reproduce the results with my own setup, but I found that in your rendering code fx, fy, cx, and cy are set according to the image resolution instead of the intrinsic parameters we calibrated. What is the intuition behind this?

  2. I then tried to adjust my cameras to your 16-view setting. However, I only have 5 cameras and the lenses are not wide enough, so I placed them on one side with a 22.5-degree rotation between adjacent cameras. The pitch is almost 0 (a difference of 1-2 degrees), and the images are captured at 1080*1920 resolution.

In this case, do I need to set the base pitch to about 1-2 degrees? In my testing there is some flickering in the fingers, as shown below:

lin423_m.mp4

image
image
image

Hi, thanks for your interest.

For question 1, the camera setup depends on your real-world hardware. Since the intrinsic parameters of both our collected data and the DNA-Rendering data roughly match values derived from the image resolution, I set them this way just for simplicity. However, I recommend setting the virtual camera parameters identical to those of your hardware. Besides the intrinsic parameters, the radius also needs to be modified so that the rendered synthetic images look similar to your captured data.
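As an illustration of using calibrated intrinsics rather than resolution-derived ones, here is a minimal sketch that rescales a calibrated pinhole intrinsic set when the images are resized, and reports how far the principal point sits from the image center. The function names and the calibration numbers are hypothetical and are not taken from the GPS-Gaussian code; the sketch assumes a plain pinhole model without distortion and ignores half-pixel center conventions.

```python
def scale_intrinsics(fx, fy, cx, cy, src_size, dst_size):
    """Rescale pinhole intrinsics from src_size to dst_size, both given as (width, height).

    Focal lengths and principal point scale linearly with the image size.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / src_w
    sy = dst_h / src_h
    return fx * sx, fy * sy, cx * sx, cy * sy


def principal_point_offset(cx, cy, size):
    """Offset of the principal point from the geometric image center, in pixels."""
    w, h = size
    return cx - w / 2.0, cy - h / 2.0


# Hypothetical calibration of a 1080x1920 portrait camera (values are made up).
fx, fy, cx, cy = scale_intrinsics(1500.0, 1500.0, 548.0, 955.0,
                                  src_size=(1080, 1920), dst_size=(540, 960))
print("scaled intrinsics:", fx, fy, cx, cy)
print("principal point offset:", principal_point_offset(cx, cy, (540, 960)))
```

If the reported offset is only a few pixels, a resolution-derived principal point is a reasonable simplification; a larger offset suggests passing the calibrated values to the renderer.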

For question 2, the base pitch is only used to determine the 'look at' center. I think the flickering is mainly caused by a mismatch in scene setup between the synthetic data and the test data, especially in the radius and the baseline angle.
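To make the roles of the radius and the baseline angle concrete, here is a minimal sketch that places cameras on a horizontal arc around a look-at center and computes the distance between adjacent cameras. The helper names, the y-up axis convention, and the zero-pitch placement are assumptions for illustration, not the repository's rendering code.

```python
import math


def ring_camera_positions(num_views, step_deg, radius, center=(0.0, 0.0, 0.0)):
    """Camera positions on a horizontal arc around `center` (y is assumed to be up).

    The arc is centered on the +z direction; adjacent cameras are `step_deg` apart.
    """
    cx, cy, cz = center
    start = -step_deg * (num_views - 1) / 2.0
    positions = []
    for i in range(num_views):
        yaw = math.radians(start + i * step_deg)
        positions.append((cx + radius * math.sin(yaw), cy, cz + radius * math.cos(yaw)))
    return positions


def adjacent_baseline(radius, step_deg):
    """Chord length between two adjacent cameras on the arc."""
    return 2.0 * radius * math.sin(math.radians(step_deg) / 2.0)


# 16 evenly spaced views on a full circle give 360 / 16 = 22.5 degrees per step;
# here only 5 of them are used on one side, at a 2 m radius.
cams = ring_camera_positions(num_views=5, step_deg=22.5, radius=2.0)
for cam in cams:
    print(tuple(round(v, 3) for v in cam))
print("baseline between neighbours (m):", round(adjacent_baseline(2.0, 22.5), 3))
```

Because the baseline grows linearly with the radius at a fixed angle, a capture rig that differs in either quantity from the synthetic training setup changes the disparity range the network sees.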

Hi,

Thanks for the reply, it is very helpful.

I also found that in the render_data you provide the radius is set to 2.15, while in the real_data you provide the calibrated ratio is around 1.95. Is the virtual data defined this way on purpose, and if so, why? Also, how do we actually match the virtual rendered data to the real-world data we measured, so that it finally works with the calibrated parameters?

Thanks!!!!

I set the radius to around 2 m, with humans moving randomly in a small region at the center. But where does 2.15 come from? I am also confused about the calibrated ratio: does it refer to fx and fy? I simply render the synthetic data to match my captured data, e.g. I measured the radius of the scene to be 2 meters. I think you can roughly confirm the radius and the baseline angle, and set the focal length similar to that of your captured data. The rendered images should then look like your captured images.

Sorry, by calibrated ratio I mean radius.

I checked the extrinsic parameters in the render_data I downloaded: E[2,3] is around 2.15, as shown in the first image. In the real_data I downloaded, E[2,3] is around 1.95, as shown in the second image.

image
image
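For readers who want to check the radius of their own rig the same way, here is a minimal sketch that recovers the camera center and its distance to the scene center from an extrinsic matrix. It assumes E is a world-to-camera matrix [R | t] with an orthonormal rotation, which is an assumption about the data format rather than something confirmed in this thread; the function names and matrices are illustrative. Under that assumption, E[2,3] (zero-based) equals the radius only when the optical axis passes through the world origin, whereas the norm of the camera center holds in general.

```python
import math


def camera_center(E):
    """Camera center in world coordinates for a world-to-camera E = [R | t]: C = -R^T t."""
    R = [row[:3] for row in E[:3]]
    t = [row[3] for row in E[:3]]
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]


def radius_from_extrinsic(E, center=(0.0, 0.0, 0.0)):
    """Distance from the camera center to the assumed scene center."""
    return math.dist(camera_center(E), center)


# Illustrative extrinsics: identity rotation, and a 90-degree rotation about y.
E_front = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 2.15]]
E_side = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [-1.0, 0.0, 0.0, 1.95]]
print(radius_from_extrinsic(E_front))
print(radius_from_extrinsic(E_side))
```

If the matrices are stored as camera-to-world instead, the camera center is simply the translation column and no inversion is needed.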

A small difference will not affect the result, since the human moves randomly in the virtual scene. However, training on a 2-meter scene and testing on a 3-meter one will degrade performance.

Ok, thanks for the quick reply!