tensorflow / minigo

An open-source implementation of the AlphaGoZero algorithm

[Help] runtime error on Google Colaboratory

y-ich opened this issue · comments

Hi.

I tried to run Minigo on Google Colaboratory.

I managed to compile it, but when I ran bazel-bin/cc/gtp I got the following error.

  • command
    !bazel-bin/cc/gtp --device=$TPU_NAME --model=saved_models/000820-defence.minigo --seconds_per_move=60 --value_init_penalty=2.0

  • error
    resign_threshold:-0.999 resign_enabled:1 komi:7.5 value_init_penalty:2 policy_softmax_temp:0.98 soft_pick_enabled:0 soft_pick_cutoff:30 inject_noise:0 virtual_losses:8 num_readouts:100 seconds_per_move:60 time_limit:0 decay_factor:0.98 fastplay_frequency:0 fastplay_readouts:20 target_pruning:0 random_seed:0
    Will cache up to 704290 inferences, using roughly 1024MB.

    Initializing TPU grpc://10.110.208.194:8470
    Warming up...
    2020-02-02 14:14:27.900112: F cc/dual_net/tpu_dual_net.cc:167] Non-OK-status: session_->RunCallable(handle_, inputs_, &outputs_, nullptr) status: Invalid argument: From /job:tpu_worker/replica:0/task:0:
    Compilation failure: Matrix size-incompatible: In[0]: [1,722], In[1]: [256,128]
    [[{{node dense/MatMul}}]]
    TPU compilation failed
    [[tpu_compile_succeeded_assert/_17721758171533651890/_4]]
    *** SIGABRT received at time=1580652867 ***
    PC: @ 0x7fc747cc3e97 (unknown) (unknown)
    @ 0x5628351fecc2 64 absl::AbslFailureSignalHandler()
    @ 0x7fc748629890 631976416 (unknown)
    @ 0x56283524e57c 752 minigo::TpuDualNet::RunMany()
    @ 0x56283521253f 5504 minigo::GtpClient::Run()
    @ 0x5628351fe5ef 672 minigo::(anonymous namespace)::Gtp()
    @ 0x5628351fcc68 32 main
    @ 0x7fc747ca6b97 (unknown) (unknown)
    @ 0x82d6258d4c544155 (unknown) (unknown)

Does this mean that the weight file is wrong?
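For what it's worth, the "Matrix size-incompatible" message means the inner dimensions of the matmul at node dense/MatMul don't agree: the left operand ends in 722 while the weight matrix starts with 256. A minimal illustration of the rule TensorFlow is enforcing (the shapes are taken from the error above; the check itself is just the standard matmul compatibility rule, not Minigo code):

```python
def matmul_compatible(a_shape, b_shape):
    """A @ B requires A's last dimension to equal B's first dimension."""
    return a_shape[-1] == b_shape[0]

# Shapes reported in the error message:
assert not matmul_compatible([1, 722], [256, 128])  # 722 != 256 -> compile failure
# What a [256, 128] dense layer would accept instead:
assert matmul_compatible([1, 256], [256, 128])
```

So the runtime and the frozen graph disagree about the size of the feature vector feeding the dense layer, which is consistent with a mismatched or wrongly exported weight file.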

I generated it with the following command, using a modified freeze_graph.py:
!python freeze_graph.py --model_path=gs://minigo-pub/v17-19x19/models/000820-defence --save_path=saved_models/000820-defence --use_tpu=true --tpu_name=$TPU_NAME --num_tpu_cores=8

Since the original freeze_graph.py saves the output graph to the same path as the input, I modified it to allow specifying the output path.
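In case it helps, this is the kind of change I mean, sketched with stdlib argparse (Minigo's own scripts use absl flags, so this is illustrative only; the flag name --save_path and the fallback behaviour are my additions, not the upstream code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', required=True,
                    help='Path of the trained model to freeze.')
parser.add_argument('--save_path', default=None,
                    help='Where to write the frozen graph; defaults to model_path.')

# Example invocation mirroring the command above:
args = parser.parse_args(['--model_path', 'gs://minigo-pub/v17-19x19/models/000820-defence',
                          '--save_path', 'saved_models/000820-defence'])

# Fall back to the original behaviour (write next to the input) when
# --save_path is not given.
save_path = args.save_path or args.model_path
print(save_path)
```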

I would appreciate any advice.
Thank you!