cmusatyalab / mega-nerf

Ground-truth leakage of validation dataset

Fangkang515 opened this issue · comments

Thank you for your work.

While testing the code, I encountered an issue where the validation dataset is also included in the training data. Specifically, the train_paths variable actually contains val_paths (see here). I assumed that subsequent code would filter it based on the is_val flag, but I found that this is not the case (lines 16 and 29 show that the ground truth of the validation set is not fully masked out).
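For clarity, this is a minimal sketch of the filtering I expected to happen somewhere in the data pipeline. The is_val flag mirrors the one in the repository, but filter_train_paths and all_metadata are illustrative names, not the actual Mega-NeRF API:

```python
def filter_train_paths(all_metadata):
    # Expected behavior: keep only items not flagged as validation frames,
    # so no validation ground truth ever reaches the training loop.
    # (Hypothetical helper; the actual code appears not to do this fully.)
    return [item for item in all_metadata if not item.is_val]
```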

To verify this, I replaced all the images in the 'datasets/building-pixsfm/train/rgbs' directory with completely black images (all pixel values set to 0), while keeping the images in 'datasets/building-pixsfm/val/rgbs' unchanged. After training, however, the renders on the validation set were not black; they closely resembled the original images in 'datasets/building-pixsfm/val/rgbs'.
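For reproducibility, here is a sketch of that probe, assuming the rgbs directory contains standard image files readable by Pillow:

```python
from pathlib import Path
from PIL import Image

# Overwrite every training RGB with an all-black image of the same size,
# leaving val/rgbs untouched. If leakage exists, validation renders will
# still resemble the (unmodified) validation ground truth after training.
train_rgbs = Path('datasets/building-pixsfm/train/rgbs')
for img_path in sorted(train_rgbs.iterdir()):
    with Image.open(img_path) as img:
        size = img.size
    Image.new('RGB', size, (0, 0, 0)).save(img_path)
```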

Therefore, I suspect that the ground truth of the validation set has leaked into training, which may inflate the performance metrics on the validation set. Additionally, I am unsure whether the metrics reported in the paper, such as PSNR, are computed on this validation set.

Of course, please feel free to correct me if I have misunderstood the situation.

Thank you and best regards,

commented

Since we're using per-image appearance embeddings, we're doing something similar to NeRF in the Wild: we train on half of each validation image and then evaluate on the other, held-out half of the image (and report metrics based on those held-out halves).
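A minimal sketch of that protocol, assuming (H, W, 3) image tensors (split_val_image is an illustrative helper, not the function used in the repository, and the exact masking in the Mega-NeRF code may differ):

```python
import torch

def split_val_image(rgbs: torch.Tensor):
    """Split an (H, W, 3) validation image along its width: one half is
    seen during training to fit the per-image appearance embedding, the
    other half is held out as ground truth for the reported metrics."""
    half = rgbs.shape[1] // 2
    train_half = rgbs[:, :half]  # seen during training
    eval_half = rgbs[:, half:]   # held out; metrics computed here only
    return train_half, eval_half
```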