Shouldn't manager_vf be a function of x_t?
imbalu007 opened this issue
Right after Eq. (7) in the paper, the authors define V_t as a function of x_t. However, in the code it is a function of g_hat (feudal_policy.py, in _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?
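To make the distinction concrete, here is a minimal NumPy sketch of the manager's advantage A^M_t = R_t - V^M_t(x_t), contrasting a critic that reads features computed from x_t with one that reads g_hat, as the repository does. `value_head`, `discounted_returns`, `x_feats`, the shapes and the shared weights are all hypothetical stand-ins chosen for illustration, not the repository's `_build_value`.

```python
import numpy as np

def discounted_returns(rewards, gamma, bootstrap):
    """R_t = r_t + gamma * R_{t+1}, bootstrapped with a value estimate at the end of the rollout."""
    returns = np.empty(len(rewards))
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def value_head(features, w, b):
    """Hypothetical linear critic: one scalar value per time step."""
    return features @ w + b

rng = np.random.default_rng(0)
T, d = 4, 8                          # illustrative rollout length and feature width
x_feats = rng.normal(size=(T, d))    # hypothetical features computed from the observation x_t
g_hat = rng.normal(size=(T, d))      # manager's raw goal output (what the repo feeds the critic)
w, b = rng.normal(size=d), 0.0       # same weights reused only to keep the sketch short

rewards = np.array([0.0, 1.0, 0.0, 1.0])
returns = discounted_returns(rewards, gamma=0.99, bootstrap=0.0)

# As described after Eq. (7): the critic is a function of x_t.
adv_paper = returns - value_head(x_feats, w, b)
# As in the repository: the critic is a function of g_hat.
adv_repo = returns - value_head(g_hat, w, b)
```

The only difference between the two advantages is which tensor the critic is built on, which is exactly the question raised in this issue.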
Hi,
I didn't understand your answer: are you still trying to implement it, or have you abandoned this repository?
Thanks
Thanks for the quick and detailed response.
In case it helps: I haven't seen an implementation of the "dilated LSTM" in your code (or maybe I missed it?).
I think it's a core idea in this paper for two reasons:
- If the model had worked without it, I don't believe it would be in the paper.
- Without it, there is no mechanism that controls the time interval c, so the goal objective is ill-defined. I believe it is supposed to act as a kind of finite-state machine.
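For reference, a minimal NumPy sketch of the dilated LSTM idea as I understand it from the paper: r separate cores share one set of LSTM weights, only core t % r is updated at step t, and the output is pooled over the previous c outputs. The class names, the plain LSTM cell, the initialisation, and the choice of summation as the pooling operator are my own assumptions for illustration, not code from this repository or from DeepMind.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Plain LSTM cell (hypothetical minimal version, no peepholes)."""
    def __init__(self, input_size, hidden_size, rng):
        scale = 1.0 / np.sqrt(hidden_size)
        self.W = rng.uniform(-scale, scale, size=(input_size + hidden_size, 4 * hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def __call__(self, x, h, cell_state):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, g = np.split(z, 4)
        new_cell = sigmoid(f) * cell_state + sigmoid(i) * np.tanh(g)
        new_h = sigmoid(o) * np.tanh(new_cell)
        return new_h, new_cell

class DilatedLSTM:
    """r cores sharing one cell; only core t % r advances at step t."""
    def __init__(self, input_size, hidden_size, r, c, seed=0):
        rng = np.random.default_rng(seed)
        self.cell = LSTMCell(input_size, hidden_size, rng)  # weights shared by all cores
        self.r, self.c = r, c
        self.h = np.zeros((r, hidden_size))           # hidden state of each core
        self.cell_state = np.zeros((r, hidden_size))  # memory cell of each core
        self.recent = []                               # last c per-step outputs
        self.t = 0

    def step(self, s_t):
        k = self.t % self.r  # the only core updated at this time step
        self.h[k], self.cell_state[k] = self.cell(s_t, self.h[k], self.cell_state[k])
        self.recent.append(self.h[k].copy())
        self.recent = self.recent[-self.c:]           # keep a window of c outputs
        self.t += 1
        return np.sum(self.recent, axis=0)           # pool (sum, by assumption) over that window
```

Because each core only advances every r steps, its state spans r times as many environment steps as a regular LSTM of the same depth, which is the mechanism that ties the manager to the horizon c in the argument above.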
Oh, I didn't check the other branches. Anyhow, that's truly frustrating, because this model sounds really tempting (and I guess you put a lot of work into it).
Good luck
@dmakian You mentioned a formal post about convergence problems and the idea that feudal networks may not converge as described in the paper. Any update there?
Sorry this has been off my mind for a while. I doubt a post is coming.
In short: I believe that feudal networks can work; I just think that the implementation is fragile. If DeepMind released their code I'm sure it would work, but as with a lot of deep RL, small differences in code can lead to wildly different performance.