airsplay / R2R-EnvDrop

PyTorch Code of NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"

Question about the valid seen/unseen performance report on the paper

yestinl opened this issue

[image: screenshot of the paper's valid seen/unseen results table]
I'm wondering how you obtained the best performance on the valid seen split. Did you take the model (the same parameters) that achieves the best performance on valid unseen and then validate it on the seen split, or did you report the best performance on valid seen from a model whose parameters differ from the best valid unseen model?

The val_seen number is reported with the best val_unseen snapshot.

Since val_unseen is the only actual validation set (we only care about the agent's performance in unseen environments), we select the model based on its val_unseen performance here.
The val_seen result is reported as an extra indicator of the best agent's performance. It is similar to the setup where multiple metrics are available but only one of them is the main metric: all metrics are reported for the checkpoint that is best on the main metric. Examples are Visual Dialogue (where NDCG is the main metric) and Image Captioning (where CIDEr is usually considered the main metric).
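
In code, this reporting scheme amounts to something like the sketch below (a minimal illustration, not the repository's actual code; `select_and_report`, the `evaluate(snapshot, split)` callback, and the `success_rate` key are hypothetical placeholders): pick the checkpoint by its val_unseen main metric, then evaluate that same checkpoint on val_seen.

```python
def select_and_report(snapshots, evaluate):
    """Sketch: choose the snapshot that is best on val_unseen, then also
    report val_seen for that same snapshot.

    `evaluate(snapshot, split)` is assumed to return a dict of metrics,
    e.g. {"success_rate": ...}; both it and the metric key are placeholders.
    """
    best_snapshot, best_unseen = None, None
    for snapshot in snapshots:
        unseen = evaluate(snapshot, "val_unseen")
        if best_unseen is None or unseen["success_rate"] > best_unseen["success_rate"]:
            best_snapshot, best_unseen = snapshot, unseen
    # val_seen is evaluated only for the checkpoint selected on val_unseen,
    # so the reported val_seen number is tied to the best val_unseen model.
    best_seen = evaluate(best_snapshot, "val_seen")
    return {"snapshot": best_snapshot, "val_unseen": best_unseen, "val_seen": best_seen}
```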

BTW, in our new paper R2R-EnvBias, we study how to eliminate the gap between val_seen and val_unseen. In that sense, both val_seen and val_unseen are considered validation sets, so we optimize the hyperparameters for each metric separately there.

Thanks a lot! That clarifies it very clearly!