airsplay / R2R-EnvDrop

PyTorch Code of NAACL 2019 paper "Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout"


Vanilla Listener (Follower) Training without RL (Student Forcing + Teacher Forcing?)

siddk opened this issue

Hey Hao,

Been working my way through your repo, and when training the listener via the provided script (agent.bash), it seems that the v0 listener model is trained with a hybrid teacher-forcing + sampling approach with RL.

Specifically, you first seem to be doing a Teacher-Forcing update (which makes sense):

self.rollout(train_ml=args.ml_weight, train_rl=False, **kwargs)

You weight this by the provided hyperparameter mlWeight = 0.2.

However, you then do a sampling update that computes the RL (A2C) loss as well:

self.rollout(train_ml=None, train_rl=True, **kwargs)

which in turn triggers the if train_rl: branch.
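
Putting those together, my reading is that each default training iteration does roughly the following. This is my own paraphrase, not code copied from the repo; only the rollout arguments come from the calls quoted above, and the feedback switching and optimizer handling are simplified from memory:

self.loss = 0

# (1) Teacher-Forcing (ML) rollout, weighted by mlWeight = 0.2
self.feedback = 'teacher'
self.rollout(train_ml=args.ml_weight, train_rl=False, **kwargs)

# (2) Sampling rollout that adds the A2C (RL) loss into self.loss
self.feedback = 'sample'
self.rollout(train_ml=None, train_rl=True, **kwargs)

# single backward pass over the combined ML + RL loss, then the optimizer step(s)
self.loss.backward()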

If I wanted to just train the "best" Listener model without RL, do you have any recommendations? Setting "feedback = argmax" seems to trigger student forcing (which is what's used in the related work), but should I mix that with Teacher Forcing as well?

Any intuition you have is much appreciated. What I'm currently thinking is to compute the Teacher-Forcing loss, weight it by the hyperparameter, and add the Student-Forcing loss on top. Otherwise, I might just do Student Forcing all the way through...
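
Concretely, in terms of your rollout interface, I was imagining something like this. It's a rough sketch, and it assumes the ML loss is still accumulated against the teacher actions when feedback is 'argmax':

self.loss = 0

# Teacher-Forcing pass, weighted as in your current setup
self.feedback = 'teacher'
self.rollout(train_ml=args.ml_weight, train_rl=False, **kwargs)

# Student-Forcing pass: follow the agent's own argmax actions, but (assuming the
# ML loss is still computed against the teacher actions) supervise with those;
# no RL term
self.feedback = 'argmax'
self.rollout(train_ml=1.0, train_rl=False, **kwargs)

self.loss.backward()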

Thanks. I have not tried mixing teacher forcing and student forcing together. What I found before is that teacher forcing works better than student forcing with this code base. Thus I finally use teacher forcing (TF) as the baseline and experiment with train_ml=1.0 and train_rl=None. Looking forward to seeing whether TF + SF would win over TF!
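
Roughly, the teacher-forcing-only iteration is just (a sketch, omitting the optimizer details):

self.feedback = 'teacher'
self.rollout(train_ml=1.0, train_rl=False, **kwargs)  # train_rl=False or None: the if train_rl: branch is never entered
self.loss.backward()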

Got it - thanks Hao, really appreciate it!