Shouldn't manager_vf be function of x_t?

imbalu007 opened this issue 7 years ago

Right after eq. (7) in the paper (https://arxiv.org/pdf/1703.01161.pdf), the authors describe V_t as a function of x_t. However, in the code (`_build_manager()` in feudal_policy.py) it is a function of g_hat:

```python
self.manager_vf = self._build_value(g_hat)
```

Shouldn't it be a function of x_t?

David Hershey · Answer 1 · Mon Sep 11 2017 18:51:07 GMT+0800 (China Standard Time)

That's correct; we caught that late in the development cycle. We still couldn't get anything to converge with that value function, but you're right that it should be built from the visual input.
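To make the distinction concrete, here is a minimal NumPy sketch of where the manager's critic enters the update, as we read eq. (7): the advantage is A^M_t = R_t - V^M_t(x_t), and it scales a cosine term between the state change s_{t+c} - s_t and the goal g_t. The linear critic head, the function names, and the use of a plain feature vector `x_t` in place of whatever the network derives from the observation are all hypothetical; this is not the repository's code. The point is only that the critic reads the state features, not g_hat.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity d_cos(a, b), with eps guarding against zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def manager_value(x_t, w, b):
    """Hypothetical linear critic head V^M_t.

    It is conditioned on the state features x_t, not on the manager's
    raw goal output g_hat.
    """
    return float(np.dot(w, x_t) + b)


def manager_losses(R_t, x_t, s_t, s_tc, g_t, w, b):
    """Return (policy_loss, value_loss) for one manager step.

    s_t and s_tc are the manager-space states at t and t + c; g_t is the goal.
    When differentiating policy_loss w.r.t. g_t, the advantage must be treated
    as a constant (a stop-gradient in TensorFlow); NumPy computes no gradients here.
    """
    advantage = R_t - manager_value(x_t, w, b)          # A^M_t = R_t - V^M_t(x_t)
    policy_loss = -advantage * cosine(s_tc - s_t, g_t)  # transition-policy term
    value_loss = 0.5 * advantage ** 2                   # regression target for the critic
    return policy_loss, value_loss


# Usage: V = 0.5*1 + 0.25*2 = 1.0, so A = 2.0 - 1.0 = 1.0, and the goal points
# exactly along the state change, so the cosine is ~1.
p, v = manager_losses(
    R_t=2.0,
    x_t=np.array([1.0, 2.0]),
    s_t=np.array([0.0, 0.0]),
    s_tc=np.array([1.0, 0.0]),
    g_t=np.array([2.0, 0.0]),
    w=np.array([0.5, 0.25]),
    b=0.0,
)
```

Swapping `x_t` for `g_hat` in `manager_value` would make the baseline depend on the very output the policy gradient is adjusting, which is the mismatch the question points out.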


Lior Shani · Answer 2 · Wed Sep 13 2017 21:25:02 GMT+0800 (China Standard Time)

Hi,
I didn't understand your answer: are you still trying to implement it, or have you abandoned this repository?

Thanks

David Hershey · Answer 3 · Wed Sep 13 2017 21:51:07 GMT+0800 (China Standard Time)

I'll make a formal post about this, since I haven't looked at it in a while (I've been busy elsewhere). As of now, I've talked with some researchers, and we have reason to believe that FeUdal networks will not converge as described in the original paper. In the coming weeks I'll try to consolidate and clean up the codebase as well as possible, in case (1) new details are published on how to actually train these networks, or (2) someone else finds a magic bullet.


Lior Shani · Answer 4 · Thu Sep 14 2017 02:28:16 GMT+0800 (China Standard Time)

Thanks for the quick and detailed response.

If I can offer any help: I haven't seen an implementation of the "dilated LSTM" in your code (or maybe I missed it?).
I think it's a core idea of this paper, for two reasons:

1. Had it worked without it, I don't believe it would have been in the paper.
2. Without it, there is no mechanism that controls the time interval c, so the goal objective is ill-defined. I believe it is supposed to act as some kind of finite-state machine.
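A minimal NumPy sketch of the dilated LSTM idea, as we understand it from the paper: r separate state slots ("cores") share one set of LSTM weights, step t updates only slot t % r, and the output is pooled over the most recent outputs. Averaging as the pooling choice, the `pool` parameter, the class names, and the weight initialization are assumptions for illustration; this is not the code in the dlstm_fix branch, and nothing here is trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """Plain LSTM cell with one weight matrix for the four gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.n = hidden_size
        self.W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def __call__(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        n = self.n
        i, f = sigmoid(z[:n]), sigmoid(z[n:2 * n])          # input and forget gates
        g, o = np.tanh(z[2 * n:3 * n]), sigmoid(z[3 * n:])  # candidate and output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new


class DilatedLSTM:
    """Hypothetical dilated LSTM: r state slots sharing one cell's weights.

    At step t only slot t % r is read and written, so each slot advances once
    every r steps. The returned output is the mean of the last `pool` outputs.
    """

    def __init__(self, input_size, hidden_size, r, pool, seed=0):
        rng = np.random.default_rng(seed)
        self.cell = LSTMCell(input_size, hidden_size, rng)
        self.r = r
        self.pool = pool
        self.h = np.zeros((r, hidden_size))
        self.c = np.zeros((r, hidden_size))
        self.recent = []  # most recent per-step outputs, at most `pool` of them
        self.t = 0

    def step(self, x):
        k = self.t % self.r  # the only slot updated at this step
        self.h[k], self.c[k] = self.cell(x, self.h[k], self.c[k])
        self.recent = (self.recent + [self.h[k].copy()])[-self.pool:]
        self.t += 1
        return np.mean(self.recent, axis=0)


# Usage: after two steps, only slots 0 and 1 have been touched.
d = DilatedLSTM(input_size=3, hidden_size=4, r=5, pool=5)
for _ in range(2):
    out = d.step(np.ones(3))
```

Because each slot only sees every r-th input, its memory spans r times as many environment steps as an ordinary LSTM with the same number of updates, which is one way a manager's goals could stay stable over a horizon like c.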

David Hershey · Answer 5 · Thu Sep 14 2017 04:03:57 GMT+0800 (China Standard Time)

I do have a dilated LSTM implemented; it's in a branch (see dlstm_fix). Hence the need for consolidating! You can find it in models/models.py.


Lior Shani · Answer 6 · Thu Sep 14 2017 15:34:03 GMT+0800 (China Standard Time)

Oh, I didn't check the other branches. Anyhow, that's truly frustrating, because this model sounds really tempting (and I guess you put a lot of work into it).
Good luck

Kade · Answer 7 · Mon Feb 26 2018 23:47:08 GMT+0800 (China Standard Time)

@dmakian You mentioned a formal post about the convergence problems and the idea that FeUdal networks may not converge as described in the paper. Any update on that?

David Hershey · Answer 8 · Mon Feb 26 2018 23:56:09 GMT+0800 (China Standard Time)

Sorry, this has been off my mind for a while. I doubt a post is coming.

In short: I believe that FeUdal networks can work; I just think the implementation is fragile. If DeepMind released their code, I'm sure it would work, but as with a lot of deep RL, small differences in code can lead to wildly different performance.