yenchenlin / DeepLearningFlappyBird

Flappy Bird hack using Deep Reinforcement Learning (Deep Q-learning).

question on freezing target network

hashbangCoder opened this issue

Hi @yenchenlin1994, love your implementation!
I went through your code and I can't seem to find where you've frozen the target network.
Unless I'm missing something in my excess-caffeine-induced brain fade, you continue to update the target every batch?
Wouldn't that hurt your convergence rate badly?
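
For context, "freezing" here refers to the trick from the Nature DQN paper: keep a second copy of the Q-network whose weights are held fixed for computing TD targets, and sync it from the online network only every C steps. A minimal TensorFlow 1.x sketch of that sync, assuming the two networks live in separate variable scopes (the scope names below are illustrative, not identifiers from this repo):

```python
import tensorflow as tf

# Hypothetical scope names; this repo builds its network differently.
q_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="q_network")
t_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="target_network")

# One assign op per weight tensor; assumes both scopes create their
# variables in the same order so zip pairs them up correctly.
sync_target_ops = [t.assign(q) for q, t in zip(q_vars, t_vars)]

SYNC_EVERY = 10000  # "C" in the DQN paper

# In the training loop, with an active tf.Session `sess`:
#   if step % SYNC_EVERY == 0:
#       sess.run(sync_target_ops)
# TD targets y = r + gamma * max_a' Q_target(s', a') then come from
# the frozen copy instead of the network currently being trained.
```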

Hello,
Yeah, you are right.
Actually, I have a reimplemented version.
Will submit soon!

Hi again,
I'm trying to reproduce the results in Keras and have trained for 400,000 steps, but the bird is unable to clear the first pipe consistently. My loss is low, though (0.2), and the Q-values are in the range [0, 8]. How long did it take before it actually started working, i.e., clearing the first pipe consistently?

I can't remember the exact number of iterations, but it was no more than ~1,000,000 steps.

I still can't find the target-network freezing in the current version's code. Does it really have no effect?

@hashbangCoder
I'm running into the same problem: the silly bird keeps flying to the top of the screen... Did you fix it?

I also couldn't find the target-network freezing code. But thanks for your code; it's been helpful for me.

I wrote a version based on this repo with a frozen target network: FlappyBird_DQN_with_target_network

Here is another repo with a target network: https://github.com/patrick-12sigma/DRL_FlappyBird

I made the target network an option, as sketched below. You can turn it on and off and experiment to see how much it affects the convergence of training.

I refactored the network into a class and added some logging functionality to track the training process. I also borrowed the human-play function from @initial-h. Thanks!
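
A rough sketch of how such a toggle can work, with hypothetical names rather than the repo's actual API:

```python
import numpy as np

GAMMA = 0.99        # discount factor
SYNC_EVERY = 10000  # how often to refresh the frozen copy

def td_target(reward, q_next_online, q_next_target, done, use_target_network):
    # y = r + gamma * max_a' Q(s', a'); bootstrap from the frozen target
    # network when the option is on, else from the online network itself.
    q_next = q_next_target if use_target_network else q_next_online
    return reward if done else reward + GAMMA * float(np.max(q_next))

def maybe_sync(step, online_weights, target_weights, use_target_network):
    # Every SYNC_EVERY steps, copy the online weights into the frozen copy.
    if use_target_network and step % SYNC_EVERY == 0:
        target_weights = [w.copy() for w in online_weights]
    return target_weights
```

With the option off, both branches collapse to this repo's original behavior of bootstrapping from the network being trained.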