davidhershey / feudal_networks

An implementation of FeUdal Networks for Hierarchical Reinforcement Learning, as published at https://arxiv.org/abs/1703.01161


Shouldn't manager_vf be a function of x_t?

imbalu007 opened this issue

commented

Right after eq. (7) in the paper, the authors write the manager's value function V^M_t as a function of x_t. In the code, however, it is a function of g_hat (feudal_policy.py -> _build_manager()):
self.manager_vf = self._build_value(g_hat)
Shouldn't it be a function of x_t?
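
To make the distinction concrete, here is a minimal self-contained sketch of what I mean (not the repository's actual code; `build_value`, `z`, and the shapes are just placeholders standing in for `self._build_value` and the manager's percept):

```python
import numpy as np

def build_value(features, w):
    # Stand-in for the repo's _build_value(): a linear value head.
    return features @ w

rng = np.random.default_rng(0)
z = rng.normal(size=(1, 256))      # embedding of the observation x_t (manager input)
g_hat = rng.normal(size=(1, 256))  # manager's goal output (pre-normalization)
w = rng.normal(size=(256, 1))      # value-head weights

# What the code currently does: value estimated from the goal output g_hat.
manager_vf_from_goal = build_value(g_hat, w)

# What eq. (7)'s V^M_t(x_t, theta) suggests: a value estimated from
# (an embedding of) the observation x_t itself.
manager_vf_from_obs = build_value(z, w)
```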

Hi,
I didn't understand your answer. Are you still trying to implement this, or have you abandoned the repository?

Thanks

Thanks for the quick and detailed response.

If I can offer any help: I haven't seen an implementation of the "dilated LSTM" in your code (or maybe I missed it?).
I think it's a core idea in this paper, for two reasons (a rough sketch of what I mean follows the list):

  1. Had the method worked without it, I don't believe it would have been included in the paper.
  2. Without it, there is no mechanism that enforces the time horizon c, so the goal objective is ill-defined. I believe it is supposed to act as some kind of finite-state machine.
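
Here is a rough, framework-agnostic sketch of the dilation mechanism as I understand it from the paper's "Dilated LSTM" section; `DilatedRNN`, `cell`, and the toy usage are illustrative placeholders, not code from this repository:

```python
# Sketch of a dilated recurrence: keep r separate copies of the recurrent
# state and advance only one of them per environment step (index t % r),
# so each copy effectively runs at 1/r of the environment's time resolution.
# `cell` is assumed to have the interface cell(x, state) -> (output, new_state).

class DilatedRNN:
    def __init__(self, cell, num_cores, initial_state):
        self.cell = cell
        self.r = num_cores
        self.states = [initial_state for _ in range(num_cores)]
        self.t = 0

    def step(self, x):
        idx = self.t % self.r            # only this core is updated at step t
        out, self.states[idx] = self.cell(x, self.states[idx])
        self.t += 1
        # The paper additionally pools recent outputs; that is omitted here.
        return out


# Toy usage with a dummy "cell" (a running sum), just to show the update pattern.
def dummy_cell(x, state):
    new_state = state + x
    return new_state, new_state

drnn = DilatedRNN(dummy_cell, num_cores=10, initial_state=0)
outputs = [drnn.step(1) for _ in range(30)]  # each core is updated every 10 steps
```

The point is only the update pattern: with r cores (the paper uses r = 10, matching c = 10, if I recall correctly), each core sees a stride-r subsequence of the inputs, which is what ties the manager's recurrence to the horizon c.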

Oh, I didn't check the other branches. Anyhow, that's truly frustrating, because this model sounds really tempting (and I guess you put a lot of work into it).
Good luck

commented

@dmakian You mentioned a formal post about convergence problems and the idea that feudal networks may not converge as described in the paper. Any update there?

Sorry this has been off my mind for a while. I doubt a post is coming.

In short: I believe that feudal networks can work; I just think that the implementation is fragile. If DeepMind released their code, I'm sure it would function, but as with a lot of deep RL, small differences in code can lead to wildly different performance.