yhyu13 / AlphaGOZero-python-tensorflow

Congratulations to DeepMind! This is a re-engineering implementation (drawing on many other git repos in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)


Project Update

yhyu13 opened this issue · comments

Hi,

My name is Yohan Yu, the developer of this repository. Right now I'm able to run a Volta instance on the cloud; let's see how it goes. I will keep posting updates over the coming week.

Perfect!! :)

HI,

The training result of 11/8/2017 is released. The training move-prediction accuracy is as expected, but the model doesn't generalize well on the validation dataset. Could someone review my code on batch normalization?

The way I use batch norm:

In Network.py

def train(...):
model.mode = 'train'
sess.run(...)
...

def test(...):
model.mode = 'test'
sess.run(...)
...

In alphagozero_resnet_model.py

def batch_norm(...):
    return tf.contrib.layers.batch_norm(..., is_training=(model.mode == 'train'))

def build_train_op():
    ....
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = opt.apply_gradients(...)
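To illustrate why the train/test mode and the update ops matter, here is a numpy sketch of batch normalization (a simplified stand-in of my own, not the repo's actual code): training normalizes with batch statistics and updates the moving averages, while inference uses the moving averages, which stay stale if the updates never run.

```python
import numpy as np

class BatchNorm:
    """Minimal batch-norm sketch (no learned scale/shift) showing the two modes."""
    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # These updates are what TF collects in GraphKeys.UPDATE_OPS;
            # if they never run, inference keeps the stale initial statistics.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

bn = BatchNorm(4)
x = np.random.RandomState(0).normal(5.0, 2.0, size=(64, 4))
out_train = bn(x, training=True)   # centered: uses batch statistics
out_test = bn(x, training=False)   # still far from centered: moving stats barely updated
```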

Hi,

Since the NVIDIA TensorFlow container (required to use AWS p3) only supports Python 2.7 at this moment, all training will be done under the "py2.7" branch.

Training acc > 70%!! It would be a wonderful result if you are able to validate that the model generalizes correctly near this value. Why don't you use a validation dataset while you are training, so you don't leave the test to the final step after training has finished? That would tell you immediately whether the model is generalizing well or not.

@Zeta36 The model is evaluated at the end of every epoch of training. It does surprise me that the model doesn't generalize well. I'm sure there are logical bugs in my code.

Hi,

I found one bug so far. The feature-extraction method is different from what DeepMind describes.

Mine used to be each player's past 8 moves and their corresponding states. However, the features should be each player's stones extracted from the board in the past 8 states.

Therefore, I will preprocess the dataset again and train another version of the supervised-learning Go agent.
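A sketch of the corrected feature extraction, following DeepMind's description (8 planes of the current player's stones over the last 8 states, 8 planes of the opponent's stones, one colour plane; the function and encoding below are illustrative, not the repo's actual preprocessing code):

```python
import numpy as np

BOARD = 19

def extract_features(history, player):
    """Build AlphaGo Zero-style input planes from board history.

    history: list of boards (most recent last), each a (19, 19) int array
             with 1 = black stone, -1 = white stone, 0 = empty.
    player:  1 if black to move, -1 if white.
    Returns a (19, 19, 17) float array.
    """
    # Pad with empty boards if fewer than 8 states exist, then keep the last 8.
    last8 = ([np.zeros((BOARD, BOARD), dtype=int)] * (8 - len(history)) + list(history))[-8:]
    planes = []
    for board in reversed(last8):           # current player's stones, most recent first
        planes.append((board == player).astype(np.float32))
    for board in reversed(last8):           # opponent's stones
        planes.append((board == -player).astype(np.float32))
    colour = 1.0 if player == 1 else 0.0    # constant plane: whose turn it is
    planes.append(np.full((BOARD, BOARD), colour, dtype=np.float32))
    return np.stack(planes, axis=-1)

board = np.zeros((BOARD, BOARD), dtype=int)
board[3, 3] = 1                             # a single black stone
feats = extract_features([board], player=1)
```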

I've figured out the coroutine implementation of MCTS. It's in the debugging phase now.

Yours,
Yohan Yu


Maybe you can try the new architecture of ResNet :)
Reference: "Identity Mappings in Deep Residual Networks", arXiv:1603.05027, 2016

@yuanfengpang

Thanks! That paper introduces a full pre-activation residual block that reduces overfitting on ResNet-110, ResNet-164, and even ResNet-1001, because (1) the skip connection is an identity function, unlike in the original design, which makes chain-rule backprop simpler, and (2) applying BN first imposes stronger regularization, which reduces overfitting.
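The full pre-activation ordering from arXiv:1603.05027 can be sketched as follows (a numpy toy with dense weights standing in for the 3x3 convolutions, and a plain normalization standing in for BN; both are simplifications):

```python
import numpy as np

rng = np.random.RandomState(0)

def norm(x):
    # Stand-in for batch norm: normalize features (no learned scale/shift).
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

def relu(x):
    return np.maximum(x, 0.0)

def preact_residual_block(x, w1, w2):
    """Pre-activation order: norm -> ReLU -> weight, twice, then identity skip.
    The shortcut adds x unchanged, so gradients flow through it untouched."""
    out = relu(norm(x)) @ w1
    out = relu(norm(out)) @ w2
    return x + out

x = rng.normal(size=(32, 64))
w1 = rng.normal(scale=0.1, size=(64, 64))
w2 = rng.normal(scale=0.1, size=(64, 64))
y = preact_residual_block(x, w1, w2)
```

With zero weights the block reduces exactly to the identity, which is the property that makes very deep stacks trainable.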

Hi

It's your developer Yohan Yu. It seems NVIDIA Docker (which the P3 instance requires) still only supports the Python 2.7 legacy. This means we can't use uvloop to write coroutines in Python 2.7. Anyway, I am asking the NVIDIA admins when they will sync their framework to Python 3. Otherwise, we will just stay with supervised learning.

Regards,
Yohan Yu

Hi,

@Zeta36
@yuanfengpang

I managed to refactor APV_MCTS_2.py by using @classmethod. I believe it now not only looks better but also runs faster.

But as I said earlier, the main reason for the speed-up comes from running into more illegal moves, so that the expensive expansion is not invoked.

When comparing APV_MCTS.py and APV_MCTS_2.py, I found that even though their hyperparameters are the same, the search results differ consistently; see this line in the second version and this line in the first version. The second version consistently expands fewer leaf nodes than the first version. My best guess is that some coroutines bypass virtual_loss_do() and manage to select the same location for both players, thus creating illegal moves in the early game.
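For reference, this is the virtual-loss mechanism in miniature (a hypothetical node class, not the repo's actual APV_MCTS structures): a pending visit is temporarily counted as a loss, so concurrent searchers see a lower Q and spread out; a coroutine that skips virtual_loss_do() sees unchanged Q values and can pick the same branch.

```python
class Node:
    def __init__(self):
        self.N = 0            # completed visit count
        self.W = 0.0          # total value from completed visits
        self.virtual_loss = 0 # visits currently in flight

    def Q(self):
        n = self.N + self.virtual_loss
        # Each in-flight visit is counted as a loss (-1), lowering Q so that
        # other coroutines are steered toward different branches.
        return (self.W - self.virtual_loss) / n if n else 0.0

    def virtual_loss_do(self):
        self.virtual_loss += 1

    def virtual_loss_undo(self):
        self.virtual_loss -= 1

    def backup(self, value):
        self.N += 1
        self.W += value

node = Node()
node.backup(1.0)          # one real visit, a win
q_before = node.Q()       # 1.0
node.virtual_loss_do()    # a second searcher is now in flight
q_during = node.Q()       # (1.0 - 1) / 2 = 0.0 -> branch looks less attractive
node.virtual_loss_undo()
node.backup(1.0)
```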

I also found that adding Dirichlet noise is expensive, because the data structure of my MCTS node offers no Pythonic way to do it; see this line.
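The noise itself is cheap when the root priors live in a single array (the cost comment above is about priors scattered across per-child node objects). A sketch of the AlphaGo Zero formulation, P(s, a) = (1 - eps) * p_a + eps * eta_a with eta ~ Dir(alpha):

```python
import numpy as np

def add_dirichlet_noise(priors, alpha=0.03, eps=0.25):
    """Mix Dirichlet noise into root priors for exploration, as in the
    AlphaGo Zero paper (alpha=0.03, eps=0.25 for 19x19 Go)."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

priors = np.full(362, 1.0 / 362)   # uniform priors over 19*19 moves + pass
noisy = add_dirichlet_noise(priors)
```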

The result is pretty good, with a Cython conversion into a C extension library:

4 cores, 1000 expansions / 1600 searches per move, without NN evaluation.

Cython version:
2017-11-12 19:02:05,842 [59823] INFO     __main__: Global epoch 0 start.
2017-11-12 19:02:08,178 [59823] DEBUG    model.APV_MCTS_2_C: Searched for 2.33579 seconds
2017-11-12 19:02:08,179 [59823] DEBUG    __main__:
None
2017-11-12 19:02:08,179 [59823] INFO     __main__: Self-Play Simulation Game #0: 2.337 seconds

@Zeta36 @yuanfengpang

Hi,

I just made an update in a new branch called selfplay. Everything new is in that branch. Thanks to reversi-alpha-zero, I am making progress. The dynamic resign threshold is implemented, and the self-play pipeline is functional.

But there is one piece missing: I can't create a TensorFlow model on the fly (like in Keras). TF either requires me to initialize two (or more) models in the same graph, or to rename all variables (which would prevent me from loading a trained model effectively).

To deal with that, I remembered a previous project where I had to restore a checkpoint from VGG. The TensorFlow .ckpt file stores a dictionary-like mapping of variable names to variable values, so if variables share part of their names, I can extract them and restore them in another model.
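The name-matching idea can be sketched with a plain dict standing in for the .ckpt variable map (the real thing would go through tf.train.NewCheckpointReader and tf.train.Saver(var_list=...); the variable names below are made up for illustration):

```python
# Hypothetical checkpoint contents: variable name -> value.
checkpoint = {
    'resnet/conv1/weights': [[0.1]],
    'resnet/conv1/bn/moving_mean': [0.0],
    'policy_head/fc/weights': [[0.5]],
}

def restorable(ckpt, model_var_names, strip_prefix='', add_prefix=''):
    """Map checkpoint entries onto a new model's variable names when the
    two share a common suffix structure under different scopes."""
    out = {}
    for name, value in ckpt.items():
        if name.startswith(strip_prefix):
            renamed = add_prefix + name[len(strip_prefix):]
        else:
            renamed = name
        if renamed in model_var_names:
            out[renamed] = value
    return out

# A new model reuses the resnet trunk under a different scope:
new_vars = {'best/conv1/weights', 'best/conv1/bn/moving_mean', 'best/value_head/fc/weights'}
subset = restorable(checkpoint, new_vars, strip_prefix='resnet/', add_prefix='best/')
```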

Neither of these sounds like a good idea. What do you guys think? Should I switch to Keras, since it stores models in HDF5 and .json?

Hi

I've fixed the bug left by MuGo and pygtp, because they require implementations that know each other's API.

For details, please take a look at 5a19859

Now, you can interact in GTP like:

python main.py --mode=gtp --policy=random

2017-11-16 02:19:45,274 [20046] DEBUG    Network: Building Model Complete...Total parameters: 1581959
2017-11-16 02:19:45,606 [20046] DEBUG    Network: Loading Model...
2017-11-16 02:19:45,615 [20046] DEBUG    Network: Loading Model Failed
2017-11-16 02:19:46,702 [20046] DEBUG    Network: Done initializing variables
GTP engine ready
clear_board
=


showboard
   A B C D E F G H J K L M N O P Q R S T
19 . . . . . . . . . . . . . . . . . . . 19
18 . . . . . . . . . . . . . . . . . . . 18
17 . . . . . . . . . . . . . . . . . . . 17
16 . . . . . . . . . . . . . . . . . . . 16
15 . . . . . . . . . . . . . . . . . . . 15
14 . . . . . . . . . . . . . . . . . . . 14
13 . . . . . . . . . . . . . . . . . . . 13
12 . . . . . . . . . . . . . . . . . . . 12
11 . . . . . . . . . . . . . . . . . . . 11
10 . . . . . . . . . . . . . . . . . . . 10
 9 . . . . . . . . . . . . . . . . . . .  9
 8 . . . . . . . . . . . . . . . . . . .  8
 7 . . . . . . . . . . . . . . . . . . .  7
 6 . . . . . . . . . . . . . . . . . . .  6
 5 . . . . . . . . . . . . . . . . . . .  5
 4 . . . . . . . . . . . . . . . . . . .  4
 3 . . . . . . . . . . . . . . . . . . .  3
 2 . . . . . . . . . . . . . . . . . . .  2
 1 . . . . . . . . . . . . . . . . . . .  1
   A B C D E F G H J K L M N O P Q R S T
Move: 0. Captures X: 0 O: 0

None
=

play Black B1
= (1, (2, 1))

The limitation of pygtp is apparent. It lacks most of the auxiliary functionality of GNU Go's GTP engine, but it is workable. I integrated the MuGo game-board visualization into pygtp.

Hi,

Check out the latest update on the selfplay branch. I added GTP support for Sabaki, a popular, modern Go GUI. It looks pretty and runs smoothly. Follow the instructions in the README.md. Enjoy!

@yhyu13
Hi, thank you for the update. I think it is too late to switch to Keras. Maybe it would keep you from loading a trained model effectively, but I think accelerating the play process is more important.

I am developing a Go bot based on MuGo. As you know, even with test acc > 50%, MuGo is still very weak and cannot beat traditional MCTS.

Recomputing the AlphaGo Zero weights would take about 1700 years on commodity hardware; see for example http://computer-go.org/pipermail/computer-go/2017-October/010307.html. So I am wondering how far I can get with commodity hardware (like a GTX 1070).

May I ask why you started this repository :)

PS: I thought GoGUI was the best, but Sabaki is even greater!!!

@yuanfengpang

AWS P3 instances offer Volta GPUs at a commodity price. The highest configuration is 8 Voltas + 64 CPUs, which costs only $24/h. So 72 hours of training costs less than $2000, which is even cheaper than a laptop. When Amazon launched P3, I thought nothing could stop me. But this project is harder than I thought.

@yhyu13
Hi, about the Nov 15th supervised-learning result: the MSE is between the actual outcome z ∈ {−1, +1} and the neural network value v, scaled by a factor of 1/4 to the range 0-1 (see Figure 3 of the AlphaGo Zero paper).
It cannot be over 1; did you calculate it wrong?

@yuanfengpang

Thanks! This explains my confusion. I didn't scale it down by a factor of 4. If I do, the MSE would be about 0.25, which still looks weird, because the figure provided by DeepMind shows an initial loss of 0.25 and a final loss of about 0.22.
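A quick check of the scaling (the z and v values below are illustrative): since |z - v| ≤ 2, the scaled MSE is bounded by 1, and an untrained network outputting v ≈ 0 sits at about 0.25, which matches the initial loss in DeepMind's figure.

```python
import numpy as np

def scaled_mse(z, v):
    """MSE between outcome z in {-1, +1} and value v in [-1, 1],
    scaled by 1/4 so it lies in [0, 1]."""
    z, v = np.asarray(z, dtype=float), np.asarray(v, dtype=float)
    return np.mean((z - v) ** 2) / 4.0

# Worst case: always predicting the wrong sign gives exactly 1.
worst = scaled_mse([1, -1], [-1, 1])
# An untrained network outputting v = 0 gives 0.25.
untrained = scaled_mse([1, -1, 1, -1], [0, 0, 0, 0])
```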


Quick update: the batch-norm variables are not loaded from the checkpoint, so the model isn't performing correctly.

EDIT: Fixed. TF's global variables initializer reinitializes all variables, so I need to call it first and then load the model checkpoint. I apologize for any inconvenience!


Hi,

I figured out how to build another model in TensorFlow: set up a separate tf.Graph() for each one. This update features a fully functional self-play pipeline. In this version, I only implement two networks, one "candidate" and one "best model", because my computer doesn't have the memory to build another 6-layer mini AlphaGo.

Let me know what else you want to see in this project.

Yours,
Hang Yu

http://www.igoshogi.net/ai_ryusei/01/en/
I am going to join a Go AI contest in Tokyo. If you are interested, I can register yours too.

By the way, at the end of the policy head, I recommend you use Global Average Pooling (GAP) instead of a fully connected layer. Fully connected layers cost a lot of resources, and GAP has by now performed better than fully connected layers in the ImageNet contest.
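The resource difference is easy to see by counting parameters (a numpy sketch; the 256-channel feature-map shape and the 362-way move output are illustrative assumptions, not this repo's exact dimensions):

```python
import numpy as np

rng = np.random.RandomState(0)
feature_maps = rng.normal(size=(2, 19, 19, 256))   # batch of trunk outputs (assumed shape)
n_classes = 362                                    # 19*19 moves + pass

# Fully connected head: flatten, then one weight per (position, channel, class).
fc_params = 19 * 19 * 256 * n_classes              # ~33.5M weights in this layer alone

# GAP head: average each channel over the board, then a small linear map.
gap = feature_maps.mean(axis=(1, 2))               # shape (batch, 256)
w = rng.normal(scale=0.01, size=(256, n_classes))
logits = gap @ w
gap_params = 256 * n_classes                       # ~93K weights
```

Averaging over the 19x19 board removes the spatial factor of 361 from the weight count, and GAP itself has no parameters to overfit.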

@yuanfengpang

Wish you top success in Japan! I'm interested, but I have to attend NIPS 2017 in Los Angeles during that exact time.

I heard GAP is part of a fully convolutional neural net. I haven't dug deep into that part, but I will take a close look at how it works better. Thanks!

@yuanfengpang

Take a look at an update of the resnet model: GAP. Is that the idea you are talking about? And which paper discusses GAP? Thanks!

Network in Network (https://arxiv.org/abs/1312.4400); this paper should help you :)

@yuanfengpang

Thanks! From the paper:

One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.

@yhyu13 The "selfplay" branch doesn't seem to exist.

Hi, @fuzzthink

This should be a closed issue by now; I've integrated the self-play pipeline into the master branch already. See this merge: Selfplay#6