skhu101 / SHERF

Code for our ICCV'2023 paper "SHERF: Generalizable Human NeRF from a Single Image"

Mapping network input

markkim1115 opened this issue · comments

Hello, thanks for the good work.

  1. You use an EG3D-based triplane feature grid. In the public EG3D code repo, the mapping network takes two inputs: a latent code z and a conditioning variable c (in EG3D, c is a 25-dimensional vector of camera parameters).

In your model, do you feed only the globally averaged 1D feature vector to the mapping network, or do you use a different design (e.g., z is a random vector and the 1D feature vector is the conditioning variable)?

  2. How did you apply the LPIPS loss to the rendered outputs? Do you render image patches during training?

Hi, thanks for your interest in our work.

  1. We use a pre-trained ResNet18 backbone to extract a 512-dimensional vector as z, and we do not use c in our case.
  2. We render the whole image directly by utilizing the human prior and then apply LPIPS to it. Based on previous experience with NeRF training, rendering image patches instead should achieve similar performance.
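
For anyone who wants to try the patch-based alternative mentioned in point 2, here is a minimal sketch of how the patch pixels could be chosen. It is not taken from the SHERF code: the function name `sample_patch_pixels`, the optional bounding-box argument, and the sizes in the usage lines are all hypothetical. The idea is to cast rays only for one square patch, reshape the rendered colors back into a patch image, and pass that image to LPIPS.

```python
import random


def sample_patch_pixels(height, width, patch_size, bbox=None, rng=random):
    """Return the (row, col) coordinates of one square patch, in row-major order.

    bbox is an optional (top, left, bottom, right) box, for example around the
    projected human body (hypothetical input, not part of SHERF). The patch is
    placed inside the box when it fits, and always inside the image.
    """
    if patch_size > height or patch_size > width:
        raise ValueError("patch_size must fit inside the image")

    top, left, bottom, right = bbox if bbox is not None else (0, 0, height, width)
    # Clamp the box to the image bounds.
    top, left = max(0, top), max(0, left)
    bottom, right = min(height, bottom), min(width, right)

    # Range of valid top-left corners: inside the box if possible,
    # and never so far down or right that the patch leaves the image.
    max_r = min(max(top, bottom - patch_size), height - patch_size)
    max_c = min(max(left, right - patch_size), width - patch_size)
    min_r = min(top, max_r)
    min_c = min(left, max_c)

    r0 = rng.randint(min_r, max_r)  # randint is inclusive on both ends
    c0 = rng.randint(min_c, max_c)
    return [(r, c)
            for r in range(r0, r0 + patch_size)
            for c in range(c0, c0 + patch_size)]


# Usage with assumed sizes: a 64x64 patch inside a 512x512 image.
pixels = sample_patch_pixels(512, 512, 64, bbox=(100, 150, 400, 350))
rays_full_image = 512 * 512
rays_patch = len(pixels)
```

With these assumed sizes, one patch needs 4096 rays per step instead of 262144 for the full image, which is the main reason patch rendering is commonly used when GPU memory is tight.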

Oh, I see.
But rendering the whole image may require a lot of GPU memory.
Which GPU did you use to train the model?

I found the issue about the GPU. Thanks. Closing the issue.