airsplay / R2R-EnvDrop

PyTorch Code of NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"

Question about the valid seen/unseen performance report on the paper

yestinl opened this issue

[image: screenshot of the paper's valid seen/unseen results table]
I'm wondering how you obtained the best performance on the valid seen split. Did you take the model (the same parameters) that achieves the best performance on valid unseen and then validate it on the seen split, or did you report the best performance on valid seen from a model whose parameters differ from the best valid unseen model?

The val_seen number is reported with the best val_unseen snapshot.

Since val_unseen is the only actual validation set (we only care about the agent's performance in unseen environments), we select the model based on its val_unseen performance here.
The val_seen result is reported as an extra indicator of the best agent's performance. It is similar to the setup where multiple metrics are available but only one of them is the main metric: all metrics are reported for the checkpoint that is best on the main metric. Examples are Visual Dialogue (where NDCG is the main metric) and Image Captioning (where CIDEr is usually considered the main metric).
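
In code, this reporting scheme amounts to something like the sketch below (a minimal illustration, not the repository's actual code; `select_and_report`, the `evaluate(snapshot, split)` callback, and the `success_rate` key are hypothetical placeholders): pick the checkpoint by its val_unseen main metric, then evaluate that same checkpoint on val_seen.

```python
def select_and_report(snapshots, evaluate):
    """Sketch: choose the snapshot that is best on val_unseen, then also
    report val_seen for that same snapshot.

    `evaluate(snapshot, split)` is assumed to return a dict of metrics,
    e.g. {"success_rate": ...}; both it and the metric key are placeholders.
    """
    best_snapshot, best_unseen = None, None
    for snapshot in snapshots:
        unseen = evaluate(snapshot, "val_unseen")
        if best_unseen is None or unseen["success_rate"] > best_unseen["success_rate"]:
            best_snapshot, best_unseen = snapshot, unseen
    # val_seen is evaluated only for the checkpoint selected on val_unseen,
    # so the reported val_seen number is tied to the best val_unseen model.
    best_seen = evaluate(best_snapshot, "val_seen")
    return {"snapshot": best_snapshot, "val_unseen": best_unseen, "val_seen": best_seen}
```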

BTW, in our new paper R2R-EnvBias, we study how to eliminate the gap between val_seen and val_unseen. In that sense, both val_seen and val_unseen are considered validation sets, so we optimize the hyperparameters for each metric separately there.

Thanks a lot! That clarifies it very clearly!