lightvector / KataGo

GTP engine and self-play learning in Go

Home Page: https://katagotraining.org/

KataGo v1.4.2 vs LZ272

lightvector opened this issue

Just posting for the record some test results against LZ272 that I ran a while back using KataGo 1.4.2 and the last "semi-zero" nets (not the last nets in the run as a whole), which were g170-b40c256x2-s3708042240-d967973220 (40 blocks) and g170e-b20c256x2-s4384473088-d968438914 (20 blocks). I also tested against LZ-ELFv2, just to see how far we've come since ELF.

I posted these results about a month ago in the Discord chat; this is just re-posting them here.

Summary: KataGo won around 80-90% of games given comparable amounts of compute time (although this was on a V100 machine, where the GPU performance gap between KG and LZ might be smaller than on some users' hardware) and won 70%-80% of games when put at a modest visits handicap to LZ, without having to enable avoidMYTDaggerHack, although enabling it helped significantly further in some cases.


All tests used a single V100 cloud GPU (roughly comparable to an AWS "P3 2xlarge" instance, except on Google Cloud rather than AWS).

KataGo was left at mostly default settings, but with a bit of tuning:

  • 64 threads (suggested by the normal benchmark tool for the 40b; I did not attempt to tune the 20b separately).
  • NN cache bumped to 2^23
  • As a reminder, default settings also include 0.5 early temperature, decaying to 0.1 with halflife 19.
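
For illustration, the temperature schedule in the last bullet can be sketched as below. This assumes a plain exponential interpolation from the early value (0.5) toward the final value (0.1), with the halflife counted in moves. The function name and the exact functional form are assumptions made for this sketch, not code taken from KataGo.

```python
def move_temperature(move_number: int,
                     early: float = 0.5,
                     final: float = 0.1,
                     halflife: float = 19.0) -> float:
    """Move-selection temperature after `move_number` moves.

    Assumed form: the gap between `early` and `final` halves every
    `halflife` moves. This is an illustrative model only.
    """
    return final + (early - final) * 0.5 ** (move_number / halflife)


if __name__ == "__main__":
    # Print the assumed schedule at a few points in the game.
    for n in (0, 19, 38, 76):
        print(n, round(move_temperature(n), 3))
```

Under this assumed form the temperature is halfway between 0.5 and 0.1 after 19 moves and close to 0.1 by the middle game.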

LZ272 and LZ-ELF used:

  • --threads 32 --batchsize 16, since some testing indicated that this produced the best LZ performance on this GPU.
  • --randomcnt 20 --randomtemp 0.3 to increase opening diversity on LZ's side a little in lieu of having an opening panel. Higher than LZ's default of no temperature at all, but still lower and briefer overall than KataGo's default.
  • --noponder --timemanage off

Also, both sides were set to resign immediately at 5% winrate.

First test: KG was set to use a fixed 5 seconds per move, LZ used 18K playouts per move, and LZ-ELFv2 used 36K playouts per move, aiming to make them take about 5 seconds per move because they have no command-line way to fix a time per move. In actuality, they took about 5.6 s/move and 6 s/move respectively, so this calibration was a bit off, in LZ's favor.
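
As a rough illustration of how far off the calibration was, the sketch below rescales the playout counts to the 5 second target. It assumes time per move scales linearly with playouts, which is only an approximation (batching and tree reuse can break it), and the helper name is hypothetical.

```python
def playouts_for_target(playouts: int, measured_s: float, target_s: float) -> int:
    """Playout count expected to take `target_s` per move.

    Assumes time per move is proportional to the number of playouts.
    """
    return round(playouts * target_s / measured_s)


if __name__ == "__main__":
    # Measured averages from the first test: 5.6 s/move and 6 s/move.
    print("LZ272:   ", playouts_for_target(18_000, 5.6, 5.0))
    print("LZ-ELFv2:", playouts_for_target(36_000, 6.0, 5.0))
```

Put differently, under this assumption LZ272 had about 12% more time per move than KG, and LZ-ELFv2 about 20% more.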

Win/loss results:

                             LZ272(40b)     LZ-ELFv2(20b)
KG40b avoid dagger hack:     151/162 (93%)  79/81 (97%)
KG40b plain:                 135/164 (82%)  78/82 (95%)
KG20b avoid dagger hack:     143/160 (89%)  76/80 (95%)
KG20b plain:                 150/164 (91%)  79/82 (96%)
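
To put these win rates on a more familiar scale, here is a minimal sketch that converts each entry into an Elo difference and a 95% Wilson score interval on the win rate. It assumes each entry reads as KG wins / games played and that every game was decisive. The Elo figure uses the standard logistic formula and is only a point estimate from a small sample.

```python
import math


def elo_diff(wins: int, games: int) -> float:
    """Elo difference implied by a win rate, using the logistic Elo model."""
    p = wins / games
    return 400.0 * math.log10(p / (1.0 - p))


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2.0 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games + z * z / (4.0 * games * games)) / denom
    return centre - half, centre + half


# First test, KG vs LZ272 column, read as (KG wins, games).
first_test_vs_lz272 = {
    "KG40b avoid dagger hack": (151, 162),
    "KG40b plain": (135, 164),
    "KG20b avoid dagger hack": (143, 160),
    "KG20b plain": (150, 164),
}

if __name__ == "__main__":
    for name, (wins, games) in first_test_vs_lz272.items():
        lo, hi = wilson_interval(wins, games)
        print(f"{name}: {wins / games:.1%} "
              f"(95% CI {lo:.1%} to {hi:.1%}), Elo diff {elo_diff(wins, games):+.0f}")
```

With only 80 to 160 games per cell the intervals are fairly wide, so small differences between rows should be read with care.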

Second test: fixed playouts, KG set a bit lower than either LZ or ELF.

  • LZ used 10k playouts/move
  • LZ-ELFv2 used 20k playouts/move
  • KG 40b and 20b BOTH used 5k playouts/move (so the 20b moves quite fast).

Win/loss results:

                             LZ272(40b)     LZ-ELFv2(20b)
KG40b avoid dagger hack:     148/172 (86%)  78/86 (90%)
KG40b plain:                 137/168 (81%)  73/84 (86%)
KG20b avoid dagger hack:     126/170 (74%)  76/86 (88%)
KG20b plain:                 118/168 (70%)  70/84 (83%)
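
One way to check whether the gap between the "avoid dagger hack" and "plain" rows is larger than sampling noise is a two-proportion z-test. The sketch below is not part of the original testing. It assumes the games are independent, that each entry reads as KG wins / games played, and that the normal approximation is adequate at these sample sizes.

```python
import math


def two_proportion_z(w1: int, n1: int, w2: int, n2: int) -> tuple:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = w1 / n1, w2 / n2
    pooled = (w1 + w2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # KG40b vs LZ272, hack row vs plain row, in each of the two tests.
    pairs = {
        "first test (5 s/move)": ((151, 162), (135, 164)),
        "second test (fixed playouts)": ((148, 172), (137, 168)),
    }
    for label, ((w1, n1), (w2, n2)) in pairs.items():
        z, p = two_proportion_z(w1, n1, w2, n2)
        print(f"{label}: z = {z:.2f}, two-sided p = {p:.3f}")
```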

Games here:
kg142-vslz272elf5s-split.zip
kg142-vslz272elfpfixed-split.zip

Also, I'm about to upload some nets that, at least when tested against older KataGo nets, appear to be much stronger than these nets, thanks to learning rate drops at the end of the g170 run. :)

@lightvector san,

Great results!

I have a question.
How much stronger is KataGo 40b than KataGo 20b under the same condition of a fixed 5 seconds per move?

For recent nets, 40b and 20b were close to each other at time parity, but 20b had almost stopped progressing. For the coming nets, I guess 40b will be significantly stronger, as it improved a lot and I guess 20b did not improve much. But we'll know more in a few hours hopefully, patience 😜

@Friday9i san,

I can't wait for the new release!😆

Done! https://github.com/lightvector/KataGo/releases

Gained 200ish Elo for the 40 block net, and 100ish Elo for the 20 block net, based on matches against earlier networks. No idea how much this gain transfers to gains against opponents like LZ, but anyone is free of course to try them and compare. Enjoy!
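
For a sense of scale, the sketch below converts those Elo gains into the head-to-head win rate they would imply against the earlier networks under the standard logistic Elo model. As noted above, this says nothing about how the gain transfers to other opponents such as LZ.

```python
def expected_score(elo_gain: float) -> float:
    """Expected win rate against an opponent `elo_gain` Elo weaker (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gain / 400.0))


if __name__ == "__main__":
    for label, gain in (("40 block net", 200), ("20 block net", 100)):
        print(f"{label}: +{gain} Elo -> {expected_score(gain):.0%} expected win rate")
```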

Congrats!!
Any plan or new features for the next official run?