maxpumperla / deep_learning_and_the_game_of_go

Code and other material for the book "Deep Learning and the Game of Go"

Home Page: https://www.manning.com/books/deep-learning-and-the-game-of-go

ZeroAgent becomes weaker after learning

y-kkky opened this issue

Here are the steps I took to start training a ZeroAgent for a 9x9 board.

First, I put my model definition in a separate file:

from keras.layers import Activation, BatchNormalization, Conv2D, Dense
from keras.layers import Flatten, Input
from keras.models import Model

from dlgo import zero


def zero_model(board_size):
    encoder = zero.ZeroEncoder(board_size)
    board_input = Input(shape=encoder.shape(), name='board_input')

    # Shared convolutional trunk.
    pb = board_input
    for i in range(4):
        pb = Conv2D(64, (3, 3),
                    padding='same',
                    data_format='channels_first')(pb)
        pb = BatchNormalization(axis=1)(pb)
        pb = Activation('relu')(pb)

    # Policy head: a probability for every possible move.
    policy_conv = Conv2D(2, (1, 1), data_format='channels_first')(pb)
    policy_batch = BatchNormalization(axis=1)(policy_conv)
    policy_relu = Activation('relu')(policy_batch)
    policy_flat = Flatten()(policy_relu)
    policy_output = Dense(encoder.num_moves(), activation='softmax')(
        policy_flat)

    # Value head: a single score in [-1, 1] for the current position.
    value_conv = Conv2D(1, (1, 1), data_format='channels_first')(pb)
    value_batch = BatchNormalization(axis=1)(value_conv)
    value_relu = Activation('relu')(value_batch)
    value_flat = Flatten()(value_relu)
    value_hidden = Dense(256, activation='relu')(value_flat)
    value_output = Dense(1, activation='tanh')(value_hidden)

    model = Model(
        inputs=[board_input],
        outputs=[policy_output, value_output])

    return model

Then, I initialized my ZeroAgent this way:

import h5py
from dlgo import zero
# zero_model is imported from the model file shown above

encoder = zero.ZeroEncoder(9)
model = zero_model(9)
agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
with h5py.File('original_zero.h5', 'w') as outf:
    agent.serialize(outf)
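
As a quick sanity check before training (my own hedged sketch, not part of the original setup), you can ask the freshly built agent for a single move on an empty board; this exercises the encoder, the network, and the tree search in one call. If your version of ZeroAgent requires a collector to be attached before select_move, set a ZeroExperienceCollector first.

from dlgo.goboard_fast import GameState

game = GameState.new_game(9)
print(agent.select_move(game))  # should print a legal move after ~50 simulations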

I created play_train_eval_zero.py following the example of the existing play_train_eval.py scripts:
https://pastebin.com/HUHnYWBX

Example configuration:
--agent original_zero.h5 --num-workers 6 --games-per-batch 500 --board-size 9 --games-eval 60
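
For reference, here is a minimal single-process sketch of the self-play-and-train step that such a script performs. It follows the chapter 14 pattern and assumes the dlgo APIs ZeroExperienceCollector, combine_experience, set_collector, and train(experience, learning_rate, batch_size); the learning rate and batch size below are purely illustrative, so check the exact names and values against your copy of the book's code. The actual pastebin script additionally handles multiprocessing and evaluation.

from dlgo import scoring, zero
from dlgo.goboard_fast import GameState
from dlgo.gotypes import Player


def simulate_game(board_size, black_agent, white_agent):
    # Play one game to completion and score it.
    game = GameState.new_game(board_size)
    agents = {Player.black: black_agent, Player.white: white_agent}
    while not game.is_over():
        next_move = agents[game.next_player].select_move(game)
        game = game.apply_move(next_move)
    return scoring.compute_game_result(game)


def self_play_and_train(model, encoder, board_size, num_games):
    # Both agents share one Keras model, so training it updates both sides.
    black_agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    white_agent = zero.ZeroAgent(model, encoder, rounds_per_move=50, c=2.0)
    black_collector = zero.ZeroExperienceCollector()
    white_collector = zero.ZeroExperienceCollector()
    black_agent.set_collector(black_collector)
    white_agent.set_collector(white_collector)

    for _ in range(num_games):
        black_collector.begin_episode()
        white_collector.begin_episode()
        result = simulate_game(board_size, black_agent, white_agent)
        if result.winner == Player.black:
            black_collector.complete_episode(1)
            white_collector.complete_episode(-1)
        else:
            black_collector.complete_episode(-1)
            white_collector.complete_episode(1)

    experience = zero.combine_experience(
        [black_collector, white_collector])
    black_agent.train(experience, 0.01, 2048)  # illustrative values
    return black_agent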

And what I see is that my bot degrades during training:

Reference: original_zero.h5 Learning_agent: original_zero.h5 Total games so far 0
Won 33 / 60 games (0.550)
New reference is agent_00000500.hdf5
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5 Total games so far 500
Won 25 / 60 games (0.417)
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5 Total games so far 1000
Won 10 / 60 games (0.167)
Reference: agent_00000500.hdf5 Learning_agent: agent_cur.hdf5 Total games so far 1500
Won 11 / 60 games (0.183)

I tried different configurations, and this pattern shows up in all of them; I have no idea what causes it.

I considered a few possible explanations:

  1. That I somehow copy the wrong file after training, so the reference gets stronger while the learning agent stays the same. This turned out not to be the case: I added an extra step to save the files after each round whenever the learning agent lost the evaluation, and compared the hashes of the model files. They matched for the reference agent (as they should), so this is ruled out.
  2. I also checked manually that the encoder and the experience collector work correctly, and both look fine.
    UPD. I noticed that I'm using komi 7.5 on a 9x9 board, but maybe that is OK.

My last hypothesis is that I run far too few simulations per move, but I'm reluctant to pay for a GPU server just to test it, because a bot that gets weaker with every round is such a strange pattern.
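
One cheap way to test that hypothesis before paying for GPU time is to pit two agents that share the same untrained network but use different rounds_per_move, reusing simulate_game from the sketch above and zero_model from my model file; the budgets of 10 and 200 below are just illustrative. If the higher-budget agent does not win clearly more than half of the games, the tree search is not adding strength at this scale, and rounds_per_move would be the first thing to raise.

from dlgo import zero
from dlgo.gotypes import Player


def compare_simulation_budgets(board_size=9, num_games=20):
    # Same (untrained) network, different search budgets.
    encoder = zero.ZeroEncoder(board_size)
    model = zero_model(board_size)
    weak = zero.ZeroAgent(model, encoder, rounds_per_move=10, c=2.0)
    strong = zero.ZeroAgent(model, encoder, rounds_per_move=200, c=2.0)

    strong_wins = 0
    for i in range(num_games):
        # Alternate colors so komi does not bias the comparison.
        if i % 2 == 0:
            result = simulate_game(board_size, strong, weak)
            strong_wins += int(result.winner == Player.black)
        else:
            result = simulate_game(board_size, weak, strong)
            strong_wins += int(result.winner == Player.white)
    print('strong agent won %d / %d games' % (strong_wins, num_games))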

Which parameters should I tune for real training?
As I understand it, there are four parameters to play with:

  1. model
  2. encoder
  3. games per batch
  4. rounds per move

How do I decide which of them to sacrifice?
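
For what it's worth, the decision is mostly a compute-budget question: each batch costs roughly games_per_batch × moves_per_game × rounds_per_move network evaluations. The back-of-the-envelope below assumes about 80 moves per 9x9 self-play game, which is only a rough guess.

games_per_batch = 500
moves_per_game = 80          # rough guess for 9x9 self-play
rounds_per_move = 50
print(games_per_batch * moves_per_game * rounds_per_move)  # 2,000,000 forward passes per batch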

Did you train the AlphaGo agent described in chapter 13? If you completed the training, can you share the model files?

@computer-idol It's the AlphaGo Zero agent from chapter 14.