yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Question about OBSERVE state

mrgloom opened this issue · comments

What is the purpose of setting the number of OBSERVE steps larger than the size of REPLAY_MEMORY?

Hello @mrgloom ,
I set the number of OBSERVE steps so big just for demo purposes 😄

If you are trying to reproduce the model,
I've added a section about that.

Hi @mrgloom ,
If the above comments have answered your question, would you please close this issue?
Thanks!

I'm still not sure how the number of OBSERVE timesteps is estimated; is it just an arbitrary number with BATCH < OBSERVE < REPLAY_MEMORY?

Also, what if I can't do all 3000000 timesteps in one run, how can training be continued? Just set OBSERVE to the same value, load the CNN weights, and set EXPLORE = (3000000 - steps_already_trained)?

Hello @mrgloom

  1. It's an arbitrary number with BATCH < OBSERVE <= REPLAY_MEMORY.

However, I set it according to the reference paper and empirical results.

  2. Yes.

Also, is there something special about the OBSERVE state? For example, should the bird pass through a pipe at least once during this state, or is that not necessary?
Or is the OBSERVE state just used to initialize the replay memory?

Also, I ran 2 training cases (about 150000 timesteps): one with the recommended parameters and another with no EXPLORE state at all (I set FINAL_EPSILON and INITIAL_EPSILON to 0).

I found that without the EXPLORE state it also learns to play fine. My intuition, though, was that it would choose longer routes trying to maximize the score, which would lead to riskier play, while taking a random action at each timestep with small probability would make the model learn to play more safely (so it's some kind of regularization?).
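
For comparison, here is a minimal sketch of the epsilon-greedy choice being discussed (the function name is made up for illustration; the repo picks the argmax of the network's Q-values in the same spirit). With INITIAL_EPSILON = FINAL_EPSILON = 0 the random branch is never taken, so the agent always acts greedily:

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy: with probability epsilon take a random action,
    otherwise take the action with the highest predicted Q-value."""
    if random.random() <= epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 (no EXPLORE phase) the policy is purely greedy; a small
# epsilon occasionally forces an off-policy move, which is the mild
# "regularization" effect described above.
print(select_action([0.2, 0.7], epsilon=0.0))   # always action 1
print(select_action([0.2, 0.7], epsilon=0.1))   # usually 1, sometimes random
```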

What my intuition can't grasp is how the model learns to play the game if the bird doesn't pass any pipes during the OBSERVE state.

OBSERVE is only used to fill in the replay memory.
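
Concretely, the OBSERVE phase just keeps appending transitions to the replay memory without taking any gradient steps. Here is a rough deque-based sketch under that assumption, with dummy tuples standing in for real emulator steps and illustrative values for the constants:

```python
from collections import deque
import random

OBSERVE = 1000          # steps that only collect experience (illustrative)
REPLAY_MEMORY = 5000    # cap on the number of stored transitions
BATCH = 32

replay_memory = deque()

for t in range(3000):
    # A real agent would act in the game emulator here; these dummy tuples
    # stand in for (state, action, reward, next_state, terminal).
    replay_memory.append((t, random.randint(0, 1), random.random(), t + 1, False))
    if len(replay_memory) > REPLAY_MEMORY:
        replay_memory.popleft()

    # During OBSERVE nothing is trained; minibatches are only sampled once
    # enough transitions have been collected, which is also why the bird
    # doesn't need to pass a pipe in this phase.
    if t > OBSERVE:
        minibatch = random.sample(replay_memory, BATCH)
        # ... a gradient step on the minibatch would go here ...
```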

Regarding why it still works without the EXPLORE state, I think it's because this network is overkill for this game.