haochen-rye / HNeRV

Official Pytorch implementation for HNeRV: a hybrid video neural representation (CVPR 2023)

Home Page: https://haochen-rye.github.io/HNeRV/


The difference between HNeRV and AutoEncoder

dawnlh opened this issue · comments

commented

Hi~ @haochen-rye
Thanks for sharing your nice work. After reading the paper, I find that the network structure and design of HNeRV seem similar to an auto-encoder (AE). Although original AEs are mainly used for supervised/unsupervised learning, applying them to data fitting/compression is also a direct and valid idea. For classical NeRF (or NeRV, from your other work), one can use a coordinate to query the corresponding pixel or frame values. But for HNeRV, the input is actually the video/frame itself rather than a coordinate, which means one cannot query the desired data from an explicit coordinate; instead, one must have the frame embedding from the encoder beforehand to query the image.

I think this is the main difference between HNeRV and conventional NeRF, NeRV, and E-NeRV. Have I misunderstood something? And what is your opinion on the difference?

BTW, I wonder how long it takes to train NeRV & HNeRV. I didn't find the absolute time in the paper. Thanks.

commented

HNeRV vs NeRV (or NeRF etc.):

  • hybrid vs. implicit representation; content-adaptive input embedding vs. content-agnostic embedding (coordinate)

HNeRV vs auto-encoder:

  • Embed size: tiny vs. huge. HNeRV uses a tiny frame embedding that stores only a little video content, while an AE stores all information in a huge embedding.
  • Generalization: HNeRV is fit on only one video, while an AE is generally fit on a large dataset and should be able to reconstruct all of its videos. Therefore an AE is much bigger and slower than HNeRV in most cases.
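The embedding-size contrast above can be made concrete with a toy calculation. The numbers below are illustrative assumptions, not the exact configurations from the paper: a HNeRV-style embedding with 16 channels on a 64×-downsampled grid, versus a typical AE latent with 128 channels at 16× downsampling.

```python
# Toy comparison of per-frame embedding footprint.
# All shapes are illustrative assumptions, not the paper's actual configs.

def embed_size(c, h, w):
    """Number of scalar values in a C x H x W embedding."""
    return c * h * w

# A 640 x 1280 RGB frame holds this many values.
frame_vals = 3 * 640 * 1280                              # 2,457,600

# HNeRV-style tiny embedding: 16 channels on a 64x-downsampled grid.
hnerv_vals = embed_size(16, 640 // 64, 1280 // 64)       # 16 x 10 x 20 = 3,200

# Typical AE latent: 128 channels at 16x downsampling.
ae_vals = embed_size(128, 640 // 16, 1280 // 16)         # 128 x 40 x 80 = 409,600

print(f"frame:       {frame_vals} values")
print(f"HNeRV embed: {hnerv_vals} values ({hnerv_vals / frame_vals:.2%} of frame)")
print(f"AE latent:   {ae_vals} values ({ae_vals / frame_vals:.2%} of frame)")
```

Under these assumed shapes, the HNeRV embedding keeps roughly 0.1% of the frame's raw values, so almost all content must live in the decoder weights, whereas the AE latent keeps two orders of magnitude more.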
commented

Got it. Then what about the training time? The paper gives the relative time consumption result, but I wonder how long it takes to finish the data fitting.

commented

We conduct all experiments in PyTorch with RTX 2080 Ti GPUs, where it takes around 8 s per epoch to train on a 130-frame video of size 640 × 1280.
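For a rough absolute total, here is my back-of-envelope arithmetic. The 300-epoch schedule is an assumption (a common NeRV-style default); the thread only states the per-epoch time.

```python
# Back-of-envelope total training time.
# Assumption: a 300-epoch schedule; only the 8 s/epoch figure is from the thread.
sec_per_epoch = 8
epochs = 300

total_sec = sec_per_epoch * epochs
print(f"{total_sec} s = {total_sec / 60:.0f} min")  # 2400 s = 40 min
```

So under that assumption, fitting one such video takes on the order of 40 minutes on a single RTX 2080 Ti.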

commented

OK, thanks for your prompt reply.