lightvector / KataGo

GTP engine and self-play learning in Go

Home Page: https://katagotraining.org/

A question in the "match" module

sbbdms opened this issue · comments

Hi.

Recently I have been testing some custom modifications via KataGo's "match" module.
There are 2 bots in my test; they use the same NN model but different parameter settings.
These settings can be written in the config file in either of the 2 forms below:

(1)
botName0 = a
botName1 = b
nnModelFile = nn.bin.gz

(2)
botName0 = a
botName1 = b
nnModelFile0 = nn0.bin.gz
nnModelFile1 = nn1.bin.gz
(nn.bin.gz, nn0.bin.gz, nn1.bin.gz are actually the same NN model file)

For quite a long time I had been using the first form in the config file. Later I accidentally found that the second form runs ~20% faster than the first (with 3x 4090, 288 games in parallel, 1 search thread per game).
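
For reference, a rough sketch of the relevant match config for this setup (assuming the parameter names from KataGo's match_example.cfg; the backend-specific GPU-to-server-thread mapping lines are omitted):

numGameThreads = 288
numSearchThreads = 1
numNNServerThreadsPerModel = 3
nnModelFile = nn.bin.gz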

Is it possible to modify the code so that KataGo automatically adjusts its model usage to the second form, even if the config file is written in the first form?

Thanks!

What are you setting nnMaxBatchSize to?

It is the default value 32.

Sorry, since I always use the default nnMaxBatchSize value from gtp_example.cfg, I neglected to adjust this value in the match config.

I noticed that the recommended nnMaxBatchSize value for GTP is (numSearchThreads / numNNServerThreadsPerModel), so I guess the recommended value is 288/3 = 96 in my settings?
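
In config terms (same assumed parameter names as in the sketch above), that would be:

numGameThreads = 288
numSearchThreads = 1
numNNServerThreadsPerModel = 3
nnMaxBatchSize = 96
(288 total search threads / 3 NN server threads = 96)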

I tried adjusting nnMaxBatchSize from 32 to 96 with the first form; however, it still seems to be ~10% slower than the second form with nnMaxBatchSize = 32.

That's interesting!

  • What happens if you use only 1x4090 instead of 3x4090?
  • What happens if you run a different number of games in parallel than 288?
  • Do you have any GPUs other than 4090 that you can test this on?
  • What happens if you run only 1 game at a time, but that game has a large number of search threads?

It wouldn't be appropriate to make this a general recommendation, and definitely not appropriate to modify the code to do it automatically, if we can't understand how general this is and what causes it. I think we probably wouldn't modify the code to do it automatically regardless, because loading the model multiple times on the GPU takes extra memory, which might not be suitable for some users. But it would still be interesting to learn more about this.