yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

Question about OBSERVE state

mrgloom opened this issue · comments

What is the purpose of setting the number of OBSERVE steps larger than the size of REPLAY_MEMORY?

Hello @mrgloom ,
I set the number of OBSERVE steps so big just for demo purposes 😄

If you are trying to reproduce the model,
I've added a section about that.

Hi @mrgloom ,
If the above comments have answered your question, would you please close this issue?
Thanks!

I'm still not sure how the number of OBSERVE timesteps is estimated; is it just an arbitrary number with BATCH < OBSERVE < REPLAY_MEMORY?

Also, what if I can't do all 3000000 timesteps in one run, how can training be continued? Just set OBSERVE to the same value, load the CNN weights, and set EXPLORE = (3000000 - steps_already_trained)?

Hello @mrgloom

  1. It's an arbitrary number with BATCH < OBSERVE <= REPLAY_MEMORY.

However, I set it according to the reference paper and empirical results.

  2. Yes.

Also, is there something special about the OBSERVE state? For example, should the bird pass through a pipe at least once during this state, or is that not necessary?
Or is the OBSERVE state just used to initialize the replay memory?

Also, I ran 2 training cases (about 150000 timesteps): one with the recommended parameters and another with no EXPLORE state at all (I set FINAL_EPSILON and INITIAL_EPSILON to 0).

I found that without the EXPLORE state it also learns to play fine. My intuition, though, was that it would choose longer routes trying to maximize the score, which would lead to riskier play, while taking a random action at each timestep with small probability would make the model learn to play more safely (so it's some kind of regularization?).
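
For comparison, here is a minimal sketch of the epsilon-greedy choice being discussed (the function name is made up for illustration; the repo picks the argmax of the network's Q-values in the same spirit). With INITIAL_EPSILON = FINAL_EPSILON = 0 the random branch is never taken, so the agent always acts greedily:

```python
import random

def select_action(q_values, epsilon):
    """Epsilon-greedy: with probability epsilon take a random action,
    otherwise take the action with the highest predicted Q-value."""
    if random.random() <= epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 (no EXPLORE phase) the policy is purely greedy; a small
# epsilon occasionally forces an off-policy move, which is the mild
# "regularization" effect described above.
print(select_action([0.2, 0.7], epsilon=0.0))   # always action 1
print(select_action([0.2, 0.7], epsilon=0.1))   # usually 1, sometimes random
```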

What my intuition can't grasp is how the model learns to play the game if the bird doesn't pass any pipes during the OBSERVE state.

OBSERVE is only used to fill in the replay memory.
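
Concretely, the OBSERVE phase just keeps appending transitions to the replay memory without taking any gradient steps. Here is a rough deque-based sketch under that assumption, with dummy tuples standing in for real emulator steps and illustrative values for the constants:

```python
from collections import deque
import random

OBSERVE = 1000          # steps that only collect experience (illustrative)
REPLAY_MEMORY = 5000    # cap on the number of stored transitions
BATCH = 32

replay_memory = deque()

for t in range(3000):
    # A real agent would act in the game emulator here; these dummy tuples
    # stand in for (state, action, reward, next_state, terminal).
    replay_memory.append((t, random.randint(0, 1), random.random(), t + 1, False))
    if len(replay_memory) > REPLAY_MEMORY:
        replay_memory.popleft()

    # During OBSERVE nothing is trained; minibatches are only sampled once
    # enough transitions have been collected, which is also why the bird
    # doesn't need to pass a pipe in this phase.
    if t > OBSERVE:
        minibatch = random.sample(replay_memory, BATCH)
        # ... a gradient step on the minibatch would go here ...
```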

Regarding why it still works without the EXPLORE state, I think it's because this network is overkill for this game.