yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

question on freezing target network

hashbangCoder opened this issue

Hi @yenchenlin1994, love your implementation!
I went through your code and I can't seem to find where you've frozen the target network.
Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch?
Wouldn't that hurt your convergence rate badly?
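
For context, "freezing" here refers to the trick from the Nature DQN paper: keep a second copy of the Q-network whose weights are held fixed for computing TD targets, and sync it from the online network only every C steps. A minimal TensorFlow 1.x sketch of that sync, assuming the two networks live in separate variable scopes (the scope names below are illustrative, not identifiers from this repo):

```python
import tensorflow as tf

# Hypothetical scope names; this repo builds its network differently.
q_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="q_network")
t_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target_network")

# One assign op per weight tensor; assumes both scopes create their
# variables in the same order so zip pairs them up correctly.
sync_target_ops = [t.assign(q) for q, t in zip(q_vars, t_vars)]

SYNC_EVERY = 10000  # "C" in the DQN paper

# In the training loop, with an active tf.Session `sess`:
#   if step % SYNC_EVERY == 0:
#       sess.run(sync_target_ops)
# TD targets y = r + gamma * max_a' Q_target(s', a') then come from
# the frozen copy instead of the network currently being trained.
```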

Hello,
Yeah, you are right.
Actually, I have a reimplemented version.
Will submit soon!

Hi again,
I'm trying to reproduce the results in Keras and have trained for 400,000 steps, but the bird is unable to clear the first pipe consistently. My loss is low, though (0.2), and the Q-values are in the range [0, 8]. How long did it take before it actually started working, i.e., clearing the first pipe consistently?

I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.

I still can't find the target-network freezing in the current version's code. Does it really have no effect?

@hashbangCoder
I'm running into the same problem: the silly bird keeps flying to the top of the screen... Did you fix it?

I also couldn't find the target-network freezing code. But thanks for your code; it's been helpful for me.

I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network

Here is another repo with a target network: https://github.com/patrick-12sigma/DRL_FlappyBird

I made the target network an option, as sketched below. You can turn it on and off and experiment to see how much it affects the convergence of training.

I refactored the network into a class and added some logging functionality to track the training process. I also borrowed the human-play function from @initial-h. Thanks!
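
A rough sketch of how such a toggle can work, with hypothetical names rather than the repo's actual API:

```python
import numpy as np

GAMMA = 0.99        # discount factor
SYNC_EVERY = 10000  # how often to refresh the frozen copy

def td_target(reward, q_next_online, q_next_target, done, use_target_network):
    # y = r + gamma * max_a' Q(s', a'); bootstrap from the frozen target
    # network when the option is on, else from the online network itself.
    q_next = q_next_target if use_target_network else q_next_online
    return reward if done else reward + GAMMA * float(np.max(q_next))

def maybe_sync(step, online_weights, target_weights, use_target_network):
    # Every SYNC_EVERY steps, copy the online weights into the frozen copy.
    if use_target_network and step % SYNC_EVERY == 0:
        target_weights = [w.copy() for w in online_weights]
    return target_weights
```

With the option off, both branches collapse to this repo's original behavior of bootstrapping from the network being trained.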