Regarding depth map in training
Yifehuang97 opened this issue · comments
Thanks for your interesting work. I have a question regarding the depth. If, during training, I utilize depth maps obtained directly from MVS rather than generating synthetic depth maps, could this significantly impact the model's performance?
Hi, thanks for your attention!
I have two concerns about MVS. (1) Since most MVS methods are also trained with depth supervision, I think it is better to train the MVS model on your target data than to use a generalizable pretrained model. (2) An MVS model is relatively slow compared with a binocular depth estimation method (e.g. RAFT-Stereo).
However, if you find that the depth maps from MVS make the unprojected point clouds consistent within the same 3D space, I think the MVS method can be used as a depth estimator without significantly degrading the performance.
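One way to check that consistency is to unproject the depth map of one view into world space, project the points into a second view, and compare against that view's depth map. The following is only a minimal sketch, not code from this repository. It assumes pinhole intrinsics without skew, z-depth along the optical axis, 4x4 camera-to-world and world-to-camera matrices, and that pixels with depth <= 0 are invalid. The function names `unproject_depth` and `reprojection_depth_error` are hypothetical.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift a (H, W) z-depth map to world-space points of shape (N, 3).

    Assumptions: pinhole intrinsics K (3x3, no skew), a 4x4 camera-to-world
    matrix, and depth <= 0 marking invalid pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(np.float64)
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (pts_cam @ cam_to_world.T)[:, :3]

def reprojection_depth_error(points_world, depth_b, K_b, world_to_cam_b):
    """Mean |depth_b - z| for world points projected into view B.

    Occlusions are not handled, so treat the result as a rough indicator.
    Returns NaN if no point lands on a valid pixel of view B.
    """
    ones = np.ones((len(points_world), 1))
    pts_cam = (np.concatenate([points_world, ones], axis=1) @ world_to_cam_b.T)[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 1e-6]  # keep points in front of camera B
    z = pts_cam[:, 2]
    u = np.round(K_b[0, 0] * pts_cam[:, 0] / z + K_b[0, 2]).astype(int)
    v = np.round(K_b[1, 1] * pts_cam[:, 1] / z + K_b[1, 2]).astype(int)
    h, w = depth_b.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]
    d = depth_b[v, u]
    ok = d > 0
    if not ok.any():
        return float("nan")
    return float(np.mean(np.abs(d[ok] - z[ok])))
```

A small error relative to the scene scale across view pairs suggests the MVS depth maps agree in 3D. What threshold counts as small depends on your data and is not specified in this thread.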
Thanks so much for your reply!
My current problem is that I want to generate some ground-truth depth for avatar data (so I don't think I can train any depth model on my data).
What I have are multi-view images, camera parameters, and FLAME with a texture map. Currently I am thinking about using MVS (COLMAP dense reconstruction) to estimate the depth for each view. Would it also be fine to render some training data from the FLAME mesh with the texture map (which is not a scan)?
Thanks for your time!
I think you can try generating some synthetic data from FaceVerse-Dataset to train a robust MVS model. When synthesizing the training data, the camera setup should match that of your collected real-world data. I think the performance will be better than that of a model trained on the FLAME mesh, since the FLAME template does not include detailed geometry cues.
Thank you so much! I will try it!