Mapping network input

Question

markkim1115 opened this issue a year ago

Hello, thanks for the good work.

You used an EG3D-based triplane feature grid. In the public EG3D code repo, I found that the mapping network takes two inputs: one is z, and the other is the conditioning variable c (in EG3D, the condition is a 25-dimensional vector that represents the camera parameters).
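For context on the EG3D side of the question, here is a minimal sketch of how a StyleGAN2-style mapping network (which EG3D builds on) combines its two inputs: z is normalized, c is passed through a learned linear embedding and normalized, and the two are concatenated before the MLP. This is a simplified NumPy illustration, not the EG3D code. The random embedding matrix is a hypothetical stand-in for the learned layer, the 512-dimensional sizes are assumed, and the camera values are made up. The 25-dimensional c is taken to be a flattened 4x4 camera-to-world matrix (16 values) plus a flattened 3x3 intrinsics matrix (9 values).

```python
import numpy as np


def normalize_2nd_moment(x, eps=1e-8):
    # Scale each row to unit root-mean-square, as StyleGAN2 does for z and the embedded c.
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)


def make_camera_condition(cam2world, intrinsics):
    # EG3D-style 25-D condition: flattened 4x4 extrinsics (16) + flattened 3x3 intrinsics (9).
    return np.concatenate([cam2world.reshape(-1), intrinsics.reshape(-1)])


def mapping_input(z, c, embed_weight):
    # z: (N, z_dim), c: (N, c_dim).
    # embed_weight: (c_dim, embed_dim), a hypothetical stand-in for the learned FC embedding of c.
    x = normalize_2nd_moment(z)
    y = normalize_2nd_moment(c @ embed_weight)
    # The concatenated vector is what the mapping MLP consumes.
    return np.concatenate([x, y], axis=1)


rng = np.random.default_rng(0)
cam2world = np.eye(4)  # illustrative camera pose
intrinsics = np.array([[4.26, 0.0, 0.5], [0.0, 4.26, 0.5], [0.0, 0.0, 1.0]])  # illustrative values
c = make_camera_condition(cam2world, intrinsics)[None, :]  # (1, 25)
z = rng.standard_normal((1, 512))
embed_weight = rng.standard_normal((25, 512)) / np.sqrt(25)
h = mapping_input(z, c, embed_weight)
print(c.shape, h.shape)  # (1, 25) (1, 1024)
```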

I wonder whether your model feeds only the globally averaged 1D feature vector to the mapping network, or uses another design (e.g. z is a random vector and the 1D feature vector is the conditioning variable).
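The first option in the question can be sketched as follows: a CNN feature map is globally average-pooled into a 1D vector, which is then fed to the mapping network in place of a random z, with no c at all. The random array below is a hypothetical stand-in for a ResNet-18 backbone's final feature map (512 channels at 7x7 for a 224x224 input). Applying the StyleGAN2-style normalization to this vector is an assumption here, not something stated in this thread.

```python
import numpy as np


def normalize_2nd_moment(x, eps=1e-8):
    # Unit root-mean-square per row, mirroring the StyleGAN2 mapping network (assumed here).
    return x / np.sqrt(np.mean(x ** 2, axis=1, keepdims=True) + eps)


def global_average_pool(feature_map):
    # (N, C, H, W) -> (N, C): average each channel over all spatial positions.
    return feature_map.mean(axis=(2, 3))


def mapping_input_without_c(z):
    # With no conditioning (c_dim = 0), the mapping MLP sees only the normalized z.
    return normalize_2nd_moment(z)


rng = np.random.default_rng(0)
# Hypothetical stand-in for the last ResNet-18 feature map of one input image.
feature_map = rng.standard_normal((1, 512, 7, 7))
z = global_average_pool(feature_map)  # (1, 512), used in place of a random z
x = mapping_input_without_c(z)
print(z.shape, x.shape)  # (1, 512) (1, 512)
```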

Also, how did you apply the LPIPS loss to the rendered outputs? Do you render image patches during training?

Shoukang Hu · Answer 1 · Tue Jul 04 2023 12:43:21 GMT+0800 (China Standard Time)

Hi, thanks for your interest in our work.

We use a pre-trained ResNet-18 backbone to extract a 512-dimensional vector as z, and we do not use c in our case.
We directly render the whole image by utilizing a human prior and then apply LPIPS to it. Based on previous experience with NeRF training, rendering image patches instead should achieve similar performance.
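The patch-based alternative mentioned above can be sketched as follows: sample a random square patch, keep only the flat indices of its pixels so that just those rays need rendering, and reshape the selected pixels back into a small image that a perceptual loss such as LPIPS can compare against the rendered patch. This is a hypothetical NumPy sketch; the helper names, image size, and patch size are assumptions, and the rendering step itself is omitted.

```python
import numpy as np


def sample_patch_pixels(height, width, patch_size, rng):
    # Pick a random top-left corner so the patch lies fully inside the image.
    top = rng.integers(0, height - patch_size + 1)
    left = rng.integers(0, width - patch_size + 1)
    rows, cols = np.meshgrid(
        np.arange(top, top + patch_size),
        np.arange(left, left + patch_size),
        indexing="ij",
    )
    # Flat row-major pixel indices, used to select the matching rays.
    return (rows * width + cols).reshape(-1)


def gather_patch(image, pixel_idx, patch_size):
    # image: (H, W, 3) -> (patch_size, patch_size, 3), ready for a patch-level perceptual loss.
    flat = image.reshape(-1, image.shape[-1])
    return flat[pixel_idx].reshape(patch_size, patch_size, -1)


rng = np.random.default_rng(0)
H, W, P = 512, 512, 64
gt = rng.random((H, W, 3))  # placeholder ground-truth image
idx = sample_patch_pixels(H, W, P, rng)
gt_patch = gather_patch(gt, idx, P)
# Only idx.size rays would need rendering, instead of H * W.
print(idx.size, H * W)  # 4096 262144
```

The same indices would be applied to the ray origins and directions, so the rendered colors can be reshaped with `gather_patch`-style logic and compared against `gt_patch`.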

markkim1115 · Answer 2 · Tue Jul 04 2023 14:42:53 GMT+0800 (China Standard Time)

Oh, I see.
But rendering the whole image may require a lot of GPU memory.
What GPU did you use to train the model?
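A rough back-of-the-envelope estimate shows why full-image rendering is memory-hungry: each per-sample tensor scales with rays x samples per ray x channels, and the number of rays grows with the pixel count. The image size, patch size, 96 samples per ray, 32 channels, and float32 storage below are all assumed values for illustration, not numbers from this project; a real training step also keeps many such tensors alive for backpropagation.

```python
def activation_bytes(num_rays, samples_per_ray, channels, bytes_per_value=4):
    # Size of one per-sample activation tensor of shape (rays, samples, channels).
    return num_rays * samples_per_ray * channels * bytes_per_value


# Assumed settings: 512x512 full image vs. a 64x64 patch, 96 samples/ray, 32 channels, float32.
full = activation_bytes(512 * 512, 96, 32)
patch = activation_bytes(64 * 64, 96, 32)
print(full / 2**20, patch / 2**20)  # 3072.0 48.0  (MiB)
```

Under these assumptions one such tensor is 64 times smaller for the patch, since (512 / 64)^2 = 64.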

markkim1115 · Answer 3 · Tue Jul 04 2023 15:54:38 GMT+0800 (China Standard Time)

I found the existing issue about the GPU. Thanks! Closing this issue.