yhyu13 / AlphaGOZero-python-tensorflow

Congratulations to DeepMind! This is a re-engineering implementation (drawing on many other git repos in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)


Project Update

yhyu13 opened this issue · comments

Hi,

My name is Yohan Yu, the developer of this repository. Right now I'm able to run a Volta instance on the cloud; let's see how it goes. I will keep posting updates over the coming week.

Perfect!! :)

HI,

The training result of 11/8/2017 is released. The training move-prediction accuracy is as expected, but the model doesn't generalize well on the validation dataset. Could someone review my code on batch normalization?

The way I use batch norm:

In Network.py

def train(...):
model.mode = 'train'
sess.run(...)
...

def test(...):
model.mode = 'test'
sess.run(...)
...

In alphagozero_resnet_model.py

def batch_norm(...):
    return tf.contrib.layers.batch_norm(..., is_training=(model.mode == 'train'))

def build_train_op():
    ....
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = opt.apply_gradients(...)
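To illustrate why the train/test mode and the update ops matter, here is a numpy sketch of batch normalization (a simplified stand-in of my own, not the repo's actual code): training normalizes with batch statistics and updates the moving averages, while inference uses the moving averages, which stay stale if the updates never run.

```python
import numpy as np

class BatchNorm:
    """Minimal batch-norm sketch (no learned scale/shift) showing the two modes."""
    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.moving_mean = np.zeros(dim)
        self.moving_var = np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # These updates are what TF collects in GraphKeys.UPDATE_OPS;
            # if they never run, inference keeps the stale initial statistics.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

bn = BatchNorm(4)
x = np.random.RandomState(0).normal(5.0, 2.0, size=(64, 4))
out_train = bn(x, training=True)   # centered: uses batch statistics
out_test = bn(x, training=False)   # still far from centered: moving stats barely updated
```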

Hi,

Since the NVIDIA TensorFlow container (required to use AWS p3) only supports Python 2.7 at this moment, all training will be done under the "py2.7" branch.

Training acc > 70%!! It would be a wonderful result if you are able to validate that the model generalizes correctly near this value. Why don't you use a validation dataset while you are training, so you don't leave the test to the final step after training has finished? That would tell you immediately whether the model is generalizing well or not.

@Zeta36 The model is evaluated at the end of every epoch of training. It does surprise me that the model doesn't generalize well. I'm sure there are logical bugs in my code.

Hi,

I found one bug so far. The feature-extraction method is different from what DeepMind describes.

Mine used to be each player's past 8 moves and their corresponding states. However, the features should be each player's stones extracted from the board in the past 8 states.

Therefore, I will preprocess the dataset again and train another version of the supervised-learning Go agent.
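A sketch of the corrected feature extraction, following DeepMind's description (8 planes of the current player's stones over the last 8 states, 8 planes of the opponent's stones, one colour plane; the function and encoding below are illustrative, not the repo's actual preprocessing code):

```python
import numpy as np

BOARD = 19

def extract_features(history, player):
    """Build AlphaGo Zero-style input planes from board history.

    history: list of boards (most recent last), each a (19, 19) int array
             with 1 = black stone, -1 = white stone, 0 = empty.
    player:  1 if black to move, -1 if white.
    Returns a (19, 19, 17) float array.
    """
    # Pad with empty boards if fewer than 8 states exist, then keep the last 8.
    last8 = ([np.zeros((BOARD, BOARD), dtype=int)] * (8 - len(history)) + list(history))[-8:]
    planes = []
    for board in reversed(last8):           # current player's stones, most recent first
        planes.append((board == player).astype(np.float32))
    for board in reversed(last8):           # opponent's stones
        planes.append((board == -player).astype(np.float32))
    colour = 1.0 if player == 1 else 0.0    # constant plane: whose turn it is
    planes.append(np.full((BOARD, BOARD), colour, dtype=np.float32))
    return np.stack(planes, axis=-1)

board = np.zeros((BOARD, BOARD), dtype=int)
board[3, 3] = 1                             # a single black stone
feats = extract_features([board], player=1)
```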

I've figured out the coroutine implementation of MCTS. It's in the debugging phase now.

Yours,
Yohan Yu


Maybe you can try the new architecture of ResNet :)
Reference: "Identity Mappings in Deep Residual Networks", arXiv:1603.05027, 2016

@yuanfengpang

Thanks! That paper introduces a full pre-activation residual block that reduces overfitting on ResNet-110, ResNet-164, and even ResNet-1001, because (1) the skip connection is an identity function, unlike in the original design, which makes chain-rule backprop simpler, and (2) applying BN first imposes stronger regularization, which reduces overfitting.
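The full pre-activation ordering from arXiv:1603.05027 can be sketched as follows (a numpy toy with dense weights standing in for the 3x3 convolutions, and a plain normalization standing in for BN; both are simplifications):

```python
import numpy as np

rng = np.random.RandomState(0)

def norm(x):
    # Stand-in for batch norm: normalize features (no learned scale/shift).
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)

def relu(x):
    return np.maximum(x, 0.0)

def preact_residual_block(x, w1, w2):
    """Pre-activation order: norm -> ReLU -> weight, twice, then identity skip.
    The shortcut adds x unchanged, so gradients flow through it untouched."""
    out = relu(norm(x)) @ w1
    out = relu(norm(out)) @ w2
    return x + out

x = rng.normal(size=(32, 64))
w1 = rng.normal(scale=0.1, size=(64, 64))
w2 = rng.normal(scale=0.1, size=(64, 64))
y = preact_residual_block(x, w1, w2)
```

With zero weights the block reduces exactly to the identity, which is the property that makes very deep stacks trainable.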

Hi

It's your developer Yohan Yu. It seems NVIDIA Docker (which the P3 instance requires) still only supports the Python 2.7 legacy. This means we can't use uvloop to write coroutines in Python 2.7. Anyway, I am asking the NVIDIA admins when they will sync their framework to Python 3. Otherwise, we will just stay with supervised learning.

Regards,
Yohan Yu

Hi,

@Zeta36
@yuanfengpang

I managed to refactor APV_MCTS_2.py by using @classmethod. I believe it now not only looks better but also runs faster.

But as I said earlier, the main reason for the speed-up comes from running into more illegal moves, so that the expensive expansion is not invoked.

When comparing APV_MCTS.py and APV_MCTS_2.py, I found that even though their hyperparameters are the same, the search results differ consistently; see this line in the second version and this line in the first version. The second version consistently expands fewer leaf nodes than the first version. My best guess is that some coroutines bypass virtual_loss_do() and manage to select the same location for both players, thus creating illegal moves in the early game.
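For reference, this is the virtual-loss mechanism in miniature (a hypothetical node class, not the repo's actual APV_MCTS structures): a pending visit is temporarily counted as a loss, so concurrent searchers see a lower Q and spread out; a coroutine that skips virtual_loss_do() sees unchanged Q values and can pick the same branch.

```python
class Node:
    def __init__(self):
        self.N = 0            # completed visit count
        self.W = 0.0          # total value from completed visits
        self.virtual_loss = 0 # visits currently in flight

    def Q(self):
        n = self.N + self.virtual_loss
        # Each in-flight visit is counted as a loss (-1), lowering Q so that
        # other coroutines are steered toward different branches.
        return (self.W - self.virtual_loss) / n if n else 0.0

    def virtual_loss_do(self):
        self.virtual_loss += 1

    def virtual_loss_undo(self):
        self.virtual_loss -= 1

    def backup(self, value):
        self.N += 1
        self.W += value

node = Node()
node.backup(1.0)          # one real visit, a win
q_before = node.Q()       # 1.0
node.virtual_loss_do()    # a second searcher is now in flight
q_during = node.Q()       # (1.0 - 1) / 2 = 0.0 -> branch looks less attractive
node.virtual_loss_undo()
node.backup(1.0)
```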

I also found that adding Dirichlet noise is expensive, because the data structure of my MCTS node offers no Pythonic way to do it; see this line.
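The noise itself is cheap when the root priors live in a single array (the cost comment above is about priors scattered across per-child node objects). A sketch of the AlphaGo Zero formulation, P(s, a) = (1 - eps) * p_a + eps * eta_a with eta ~ Dir(alpha):

```python
import numpy as np

def add_dirichlet_noise(priors, alpha=0.03, eps=0.25):
    """Mix Dirichlet noise into root priors for exploration, as in the
    AlphaGo Zero paper (alpha=0.03, eps=0.25 for 19x19 Go)."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

priors = np.full(362, 1.0 / 362)   # uniform priors over 19*19 moves + pass
noisy = add_dirichlet_noise(priors)
```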

The result is pretty good, with a Cython conversion into a C extension library:

4 cores, 1000 expansions / 1600 searches per move, without NN evaluation.

Cython version:
2017-11-12 19:02:05,842 [59823] INFO     __main__: Global epoch 0 start.
2017-11-12 19:02:08,178 [59823] DEBUG    model.APV_MCTS_2_C: Searched for 2.33579 seconds
2017-11-12 19:02:08,179 [59823] DEBUG    __main__:
None
2017-11-12 19:02:08,179 [59823] INFO     __main__: Self-Play Simulation Game #0: 2.337 seconds

@Zeta36 @yuanfengpang

Hi,

I just made an update in a new branch called selfplay. Everything new is in that branch. Thanks to reversi-alpha-zero, I am making progress. The dynamic resign threshold is implemented, and the self-play pipeline is functional.

But there is one piece missing: I can't create a TensorFlow model on the fly (like in Keras). TF either requires me to initialize two (or more) models in the same graph, or to rename all variables (which would prevent me from loading a trained model effectively).

To deal with that, I remembered a previous project where I had to restore a checkpoint from VGG. The TensorFlow .ckpt file stores a dictionary-like mapping of variable names to variable values, so if variables share part of their names, I can extract them and restore them in another model.
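The name-matching idea can be sketched with a plain dict standing in for the .ckpt variable map (the real thing would go through tf.train.NewCheckpointReader and tf.train.Saver(var_list=...); the variable names below are made up for illustration):

```python
# Hypothetical checkpoint contents: variable name -> value.
checkpoint = {
    'resnet/conv1/weights': [[0.1]],
    'resnet/conv1/bn/moving_mean': [0.0],
    'policy_head/fc/weights': [[0.5]],
}

def restorable(ckpt, model_var_names, strip_prefix='', add_prefix=''):
    """Map checkpoint entries onto a new model's variable names when the
    two share a common suffix structure under different scopes."""
    out = {}
    for name, value in ckpt.items():
        if name.startswith(strip_prefix):
            renamed = add_prefix + name[len(strip_prefix):]
        else:
            renamed = name
        if renamed in model_var_names:
            out[renamed] = value
    return out

# A new model reuses the resnet trunk under a different scope:
new_vars = {'best/conv1/weights', 'best/conv1/bn/moving_mean', 'best/value_head/fc/weights'}
subset = restorable(checkpoint, new_vars, strip_prefix='resnet/', add_prefix='best/')
```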

Neither of these sounds like a good idea. What do you guys think? Should I switch to Keras, since it stores models in HDF5 and .json?

Hi

I've fixed the bug left by MuGo and pygtp, because they require implementations that know each other's API.

For details, please take a look at 5a19859

Now, you can interact in GTP like:

python main.py --mode=gtp --policy=random

2017-11-16 02:19:45,274 [20046] DEBUG    Network: Building Model Complete...Total parameters: 1581959
2017-11-16 02:19:45,606 [20046] DEBUG    Network: Loading Model...
2017-11-16 02:19:45,615 [20046] DEBUG    Network: Loading Model Failed
2017-11-16 02:19:46,702 [20046] DEBUG    Network: Done initializing variables
GTP engine ready
clear_board
=


showboard
   A B C D E F G H J K L M N O P Q R S T
19 . . . . . . . . . . . . . . . . . . . 19
18 . . . . . . . . . . . . . . . . . . . 18
17 . . . . . . . . . . . . . . . . . . . 17
16 . . . . . . . . . . . . . . . . . . . 16
15 . . . . . . . . . . . . . . . . . . . 15
14 . . . . . . . . . . . . . . . . . . . 14
13 . . . . . . . . . . . . . . . . . . . 13
12 . . . . . . . . . . . . . . . . . . . 12
11 . . . . . . . . . . . . . . . . . . . 11
10 . . . . . . . . . . . . . . . . . . . 10
 9 . . . . . . . . . . . . . . . . . . .  9
 8 . . . . . . . . . . . . . . . . . . .  8
 7 . . . . . . . . . . . . . . . . . . .  7
 6 . . . . . . . . . . . . . . . . . . .  6
 5 . . . . . . . . . . . . . . . . . . .  5
 4 . . . . . . . . . . . . . . . . . . .  4
 3 . . . . . . . . . . . . . . . . . . .  3
 2 . . . . . . . . . . . . . . . . . . .  2
 1 . . . . . . . . . . . . . . . . . . .  1
   A B C D E F G H J K L M N O P Q R S T
Move: 0. Captures X: 0 O: 0

None
=

play Black B1
= (1, (2, 1))

The limitation of pygtp is apparent. It lacks most of the auxiliary functionality of GNU Go's GTP engine, but it is workable. I integrated the MuGo game-board visualization into pygtp.

Hi,

Check out the latest update on the selfplay branch. I added GTP support for Sabaki, a popular, modern Go GUI. It looks pretty and runs smoothly. Follow the instructions in the README.md. Enjoy!

@yhyu13
Hi, thank you for the update. I think it is too late to switch to Keras. Maybe it would keep you from loading a trained model effectively, but I think accelerating the play process is more important.

I am developing a Go bot based on MuGo. As you know, even with test acc > 50%, MuGo is still very weak and cannot beat traditional MCTS.

Recomputing the AlphaGo Zero weights would take about 1700 years on commodity hardware; see for example http://computer-go.org/pipermail/computer-go/2017-October/010307.html. So I am wondering how far I can get with commodity hardware (like a GTX 1070).

May I ask why you started this repository :)

PS: I thought GoGUI was the best, but Sabaki is even greater!!!

@yuanfengpang

AWS P3 instances offer Volta GPUs at a commodity price. The highest configuration is 8 Voltas + 64 CPUs, which costs only $24/h. So 72 hours of training costs less than $2000, which is even cheaper than a laptop. When Amazon launched P3, I thought nothing could stop me. But this project is harder than I thought.

@yhyu13
Hi, about the Nov 15th supervised-learning result: the MSE is between the actual outcome z ∈ {−1, +1} and the neural network value v, scaled by a factor of 1/4 to the range 0-1 (see Figure 3 of the AlphaGo Zero paper).
It cannot be over 1; did you calculate it wrong?

@yuanfengpang

Thanks! This explains my confusion. I didn't scale it down by a factor of 4. If I do, the MSE would be about 0.25, which still looks weird, because the figure provided by DeepMind shows an initial loss of 0.25 and a final loss of about 0.22.
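A quick check of the scaling (the z and v values below are illustrative): since |z - v| ≤ 2, the scaled MSE is bounded by 1, and an untrained network outputting v ≈ 0 sits at about 0.25, which matches the initial loss in DeepMind's figure.

```python
import numpy as np

def scaled_mse(z, v):
    """MSE between outcome z in {-1, +1} and value v in [-1, 1],
    scaled by 1/4 so it lies in [0, 1]."""
    z, v = np.asarray(z, dtype=float), np.asarray(v, dtype=float)
    return np.mean((z - v) ** 2) / 4.0

# Worst case: always predicting the wrong sign gives exactly 1.
worst = scaled_mse([1, -1], [-1, 1])
# An untrained network outputting v = 0 gives 0.25.
untrained = scaled_mse([1, -1, 1, -1], [0, 0, 0, 0])
```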


Quick update: the batch-norm variables are not loaded from the checkpoint, so the model isn't performing correctly.

EDIT: Fixed. TF's global variables initializer reinitializes all variables, so I need to call it first and then load the model checkpoint. I apologize for any inconvenience!


Hi,

I figured out how to build another model in TensorFlow: set up a separate tf.Graph() for each one. This update features a fully functional self-play pipeline. In this version, I only implement two networks, one "candidate" and one "best model", because my computer doesn't have the memory to build another 6-layer mini AlphaGo.

Let me know what else you want to see in this project.

Yours,
Hang Yu

http://www.igoshogi.net/ai_ryusei/01/en/
I am going to join a Go AI contest in Tokyo. If you are interested, I can register yours too.

By the way, at the end of the policy head, I recommend you use Global Average Pooling (GAP) instead of a fully connected layer. Fully connected layers cost a lot of resources, and GAP has by now performed better than fully connected layers in the ImageNet contest.
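The resource difference is easy to see by counting parameters (a numpy sketch; the 256-channel feature-map shape and the 362-way move output are illustrative assumptions, not this repo's exact dimensions):

```python
import numpy as np

rng = np.random.RandomState(0)
feature_maps = rng.normal(size=(2, 19, 19, 256))   # batch of trunk outputs (assumed shape)
n_classes = 362                                    # 19*19 moves + pass

# Fully connected head: flatten, then one weight per (position, channel, class).
fc_params = 19 * 19 * 256 * n_classes              # ~33.5M weights in this layer alone

# GAP head: average each channel over the board, then a small linear map.
gap = feature_maps.mean(axis=(1, 2))               # shape (batch, 256)
w = rng.normal(scale=0.01, size=(256, n_classes))
logits = gap @ w
gap_params = 256 * n_classes                       # ~93K weights
```

Averaging over the 19x19 board removes the spatial factor of 361 from the weight count, and GAP itself has no parameters to overfit.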

@yuanfengpang

Wish you top success in Japan! I'm interested, but I have to attend NIPS 2017 in Los Angeles during that exact time.

I heard GAP is part of a fully convolutional neural net. I haven't dug deep into that part, but I will take a close look at how it works better. Thanks!

@yuanfengpang

Take a look at an update of the resnet model: GAP. Is that the idea you are talking about? And which paper discusses GAP? Thanks!

Network in Network (https://arxiv.org/abs/1312.4400); this paper should help you :)

@yuanfengpang

Thanks! From the paper:

One advantage of global average pooling over the fully connected layers is that it is more native to the convolution structure by enforcing correspondences between feature maps and categories. Thus the feature maps can be easily interpreted as categories confidence maps. Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Furthermore, global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input.

@yhyu13 The "selfplay" branch doesn't seem to exist.

Hi, @fuzzthink

This should be a closed issue by now; I've integrated the self-play pipeline into the master branch already. See this merge: Selfplay#6