lantonov / asmFish

A continuation of the nice project asmFish by Mohammed Li. Latest version: 07.08.2019

Home Page: https://lantonov.github.io/asmFish/

Stability tests for asmfish9 needed.

tthsqe12 opened this issue

commented

How stable is this released version 9?
We have very large books and syzygy that should be tested thoroughly before releasing.
Maybe play a couple of matches with books and syzygy and make sure the engine doesn't crash.

Agreed, this would be prudent. Large books (up to 1 GB+) appear to be working perfectly. I haven't noticed any issues with syzygy TBs, but perhaps @lantonov can devise some experiments to test them.

*Will leave this as unpublished until we can verify the stability of the syzygy TBs.

@tthsqe12

One thing that struck me as odd the other day is that asmFish appears to crash when the "quit" command is given. I am unsure as to when this started exactly, but this doesn't appear to be an issue with armFish.

commented

Will look into crashing at quit.

Match asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c, 100 games, depth=14
25 26
31 32
35 36
43 44
57 58
71 72
81 82
97 98
The number of differing pairs is 8
results.txt
I guess that this difference comes from skipping Ronald's patch official-stockfish/Stockfish@fd4d800, as you mentioned in the pull request. If needed, I can bisect the recent patches to see which of them generates the difference.
The games are with 5-men syzygy; the setup line for cutechess-cli is:
cutechess-cli -tournament gauntlet -rounds 100 -concurrency 2 -repeat -engine name=asmFishW_2018-04-08_bmi2 cmd=asmFishW_2018-04-08_bmi2 -engine name=stockfish-ad5d86c cmd=stockfish-ad5d86c -each proto=uci tc=inf depth=14 option.Hash=16 option.SyzygyPath="C:\Winboard\syzygy" option.SyzygyProbeLimit=5 option.SyzygyProbeDepth=10 -tb C:\Winboard\syzygy -openings file=2moves_v1.pgn format=pgn order=random -draw movenumber=34 movecount=8 score=20 -resign movecount=3 score=400 -pgnout results.pgn -sprt elo0=-1 elo1=1 alpha=0.05 beta=0.05
I haven't encountered a crash, but if I do, I will give the offset and the UCI dialog.
Next I will test with a time control and, maybe, with a larger book (Cerebellum).
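
The thread does not say how the "differing pairs" lists were produced. A minimal sketch of one way to get such a list, assuming that with -repeat games 2k-1 and 2k share an opening with colors reversed, that the PGN stores games in playing order, and that two functionally identical engines at fixed depth would then produce the same move sequence in both games of a pair. The function names and the sample PGN are made up for illustration:

```python
import re

RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}

def split_games(pgn_text):
    """Return one list of SAN move tokens per game in the PGN text."""
    games = []
    # Assumption: each game starts with an [Event "..."] tag line.
    for chunk in re.split(r"(?m)^(?=\[Event )", pgn_text):
        if not chunk.strip():
            continue
        # Keep only movetext lines (tag pairs start with '[').
        body = " ".join(ln for ln in chunk.splitlines() if not ln.startswith("["))
        body = re.sub(r"\{[^}]*\}", " ", body)  # strip {eval/depth time} comments
        moves = [t for t in body.split()
                 if not re.fullmatch(r"\d+\.+", t) and t not in RESULTS]
        games.append(moves)
    return games

def differing_pairs(games):
    """Return (2k-1, 2k) game-number pairs whose move lists differ."""
    return [(i + 1, i + 2)
            for i in range(0, len(games) - 1, 2)
            if games[i] != games[i + 1]]

# Tiny made-up PGN: games 1 and 2 are identical, games 3 and 4 diverge.
SAMPLE = """[Event "?"]
[Round "1"]

1. e4 {+0.20/14 0.1s} e5 2. Nf3 1/2-1/2

[Event "?"]
[Round "2"]

1. e4 e5 2. Nf3 1/2-1/2

[Event "?"]
[Round "3"]

1. d4 d5 2. c4 1-0

[Event "?"]
[Round "4"]

1. d4 d5 2. Nf3 0-1
"""

pairs = differing_pairs(split_games(SAMPLE))
print(pairs)                                        # [(3, 4)]
print("The number of differing pairs is", len(pairs))
```

On a real run the text of results.pgn would be passed to split_games instead of SAMPLE.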

Score of asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c: 19 - 18 - 63 [0.505] 100 games depth=14
Elo difference: 3.47 +/- 41.56
SPRT: llr 0.0156, lbound -2.94, ubound 2.94
23 24
25 26
35 36
47 48
53 54
91 92
The number of differing pairs is 6
results.txt
The book is a huge pgn file with variations, comments and NAGs containing the entire "Encyclopaedia of chess openings".
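
For readers checking the numbers, the "Elo difference" lines in these reports follow from the score fraction through the standard logistic Elo formula. A small sketch (the function name is made up here; the error margin after "+/-" is not reproduced):

```python
import math

def elo_difference(wins, losses, draws):
    """Elo difference implied by a W-L-D record (logistic model)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Undefined for a score of exactly 0 or 1 (division by zero / log of 0).
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_difference(19, 18, 63), 2))  # 3.47
print(round(elo_difference(22, 20, 58), 2))  # 6.95
```

With 100 games the margin is around 40 Elo or more, so differences of a few Elo points say nothing about relative strength; these runs are useful mainly as crash and move-matching tests.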

Score of asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c: 22 - 20 - 58 [0.510] 100 games TC 10+0.1 sec
Elo difference: 6.95 +/- 44.30
SPRT: llr 0.0274, lbound -2.94, ubound 2.94
Book "Encyclopaedia of chess openings"
results.txt

commented

With syzygy enabled there is no hope of matching official even with Ronald's patch because official doesn't do alpha-beta, whereas cfish and asmfish do. So if looking for exact matching, syzygy should be off.
@CounterPly Was the crashing at quit on windows? Mac OS?

Just FYI: 5 minutes with a 10-second increment per game, custom book designed to exaggerate the rating difference (fewer drawish openings).
306 of 378 games completed...
Time control: 300 + 10.00 seconds
Threads: 1
Hash: 1024
Date: 04/09/18 : 13:38:52
Rank  Name                      Rating  Δ    +   -   #    Σ      Σ%    W   L   D    W%    =%    OppR
1     asmFishX_2018-04-09_popc  3108    0.0  23  23  204  105.5  51.7  29  22  153  14.2  75.0  3096
2     McBrain 9.1 64 POPCNT     3101    6.8  23  23  205  103.0  50.2  27  26  152  13.2  74.1  3100
3     Stockfish 080418 64 POPC  3091    9.9  23  23  203   97.5  48.0  21  29  153  10.3  75.4  3104

Asmfish is doing just fine ...

Sorry for the ugly font - no control when posting from iPhone.

All previous patches that I have tested for matching were with syzygy on, and they matched exactly. But ok, I will test for matching without syzygy.

@MichaelB7, No problem with the font, thanks for testing.

Score of asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c: 25 - 17 - 58 [0.540] 100 games TC 10+0.1 sec
Elo difference: 27.85 +/- 44.26
SPRT: llr 0.111, lbound -2.94, ubound 2.94
results.txt
Book Cerebellum_Light_Poly.bin

@lantonov
Here are additional large .bin books for your tests. I have also uploaded this month's beta version, which is currently over 400MB in size.

@tthsqe12
The crash at quit occurs on Windows for me. I can also check MacOS in a few hours when I get home.

Score of asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c: 39 - 3 - 58 [0.680] 100 games TC 10+0.1
Elo difference: 130.94 +/- 42.39
SPRT: llr 0.84, lbound -2.94, ubound 2.94
Medulla_Beta_April.bin book (400 MB) only for asmFish
results.txt

Score of asmFishW_2018-04-08_bmi2 vs stockfish-ad5d86c: 28 - 27 - 45 [0.505] 100 games depth=14
No syzygy
Elo difference: 3.47 +/- 50.79
SPRT: llr 0.0105, lbound -2.94, ubound 2.94
37 38
93 94
The number of differing pairs is 2
results.txt
This was with all TB options removed, cutechess-cli settings:
cutechess-cli -tournament gauntlet -rounds 100 -concurrency 2 -repeat -engine name=asmFishW_2018-04-08_bmi2 cmd=asmFishW_2018-04-08_bmi2 -engine name=stockfish-ad5d86c cmd=stockfish-ad5d86c -each proto=uci tc=inf depth=14 option.Hash=16 -openings file=2moves_v1.pgn format=pgn order=random -draw movenumber=34 movecount=8 score=20 -resign movecount=3 score=400 -pgnout results.pgn -sprt elo0=-1 elo1=1 alpha=0.05 beta=0.05
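
The "lbound -2.94, ubound 2.94" values in the SPRT lines follow from the alpha=0.05 and beta=0.05 given on the command line, via Wald's standard stopping bounds for the log-likelihood ratio. A short sketch (the function name is made up here):

```python
import math

def sprt_bounds(alpha, beta):
    """Wald's SPRT stopping bounds for the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    return lower, upper

lo, hi = sprt_bounds(0.05, 0.05)
print(round(lo, 2), round(hi, 2))  # -2.94 2.94
```

Every llr reported in this thread sits close to zero, far from both bounds, so none of these 100-game runs reaches an SPRT decision.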

After the Contempt 20 patch, asmFish matched SF exactly even with syzygy on -- #152.
If I match official-stockfish/Stockfish@0a5b03a against 9ed34be and they differ, then Ronald's patch, which lies between them, may be responsible. If not, then I have to look in another patch for the difference.

@lantonov

When you get a chance, could you please take another one of your excellent speed-measurements of asmFish 9 vs. Stockfish 9 (for the release page)? Preferable settings would be HT-OFF (via BIOS), syzygy disabled (per Moha's recommendation), along with whatever else you may deem appropriate.

The release branch is located here: https://github.com/lantonov/asmFish/tree/asmFish9

Probably best to compare speed to official release here: https://stockfishchess.org/download/

@CounterPly
I don't have access to my 64-bit machine at the moment (at work), so I'll make the speed measurements immediately after getting home.
The last speed measurement in #152 showed about 16.6% speedup though it was not comparing exactly what is required. I guess that the new measurement would be similar.

commented

@lantonov I can confirm the difference in games 37 and 38. This might take some time, but if you could bisect to find the exact point where things diverge, that would be helpful. In the meantime, I am trying to find the difference my own way.

@CounterPly I could not reproduce such a crash. If you could tell me exactly (1) what version you used, (2) what assemble options, and (3) what commands you fed the engine, that would be helpful.

@tthsqe12
I will bisect this evening as planned.

commented

@lantonov I found it. It was technically my mistake because I misapplied the stupid promotion rules for c/c++. But, I think the promotion rules are stupid anyways, and they probably were not considered when the tt size patch was committed. Therefore, I opened a PR official-stockfish/Stockfish#1540.

@tthsqe12

If you could tell me exactly (1) what version you used, (2) what assemble options, and (3) what commands you fed the engine, that would be helpful.

(1) The Windows (popcnt) executable for both the current master and asmFish9 branch.

(2) All defaults from the source are used and assembly is initiated with:
set include=x86\include\
"fasmg.exe" "x86\fish.asm" "asmFishW_9_popcnt.exe" -e 1000 -i "VERSION_OS='W'" -i "PEDANTIC = 1" -i "VERSION_POST = 'popcnt'"

(3) The "quit" command (alone or after any other commands) appears to result in a crash. Here is a short .mp4 of what I see on my end.

Build Tester: 1.4.7.0
Windows 10 (Version 10.0, Build 0, 64-bit Edition)
Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
SafeMode: No
Running In VM: No
HyperThreading Enabled: No
CPU Warmup: Yes
Command Line: bench 256 4 18 default depth
Tests per Build: 50
ANOVA: n/a
Engine 1: asmFishW_9_bmi2.exe
Engine 2: stockfish_9_x64_bmi2.exe
Engine# (NPS) Speedup Sp Conf. 99.5% S.S.
1 (5,729,040.1 ) ---> 2 (5,096,354.4 ) ---> 12.414% 82,914.3 Yes No
asmFish: best 5,985,742; worst 5,357,424; std dev 95,376.3; avg time / run 15.92
stockfish: best 5,248,761; worst 4,928,603; std dev 68,212.5; avg time / run 19.81
Only a 12.4% speedup; it seems that the SF on the site is very well optimized.
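
The 12.414% figure in the report is consistent with the ratio of the two average NPS values. A quick check (the function name is made up here):

```python
def speedup_percent(nps_new, nps_base):
    """Relative NPS gain of the new build over the baseline, in percent."""
    return (nps_new / nps_base - 1.0) * 100.0

# Average NPS values from the Build Tester report above.
print(f"{speedup_percent(5_729_040.1, 5_096_354.4):.2f}%")  # 12.41%
```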

@tthsqe12
I will still do the bisection, just in case.

Score of asmFishW_9_bmi2 vs stockfish_9_x64_bmi2: 28 - 27 - 45 [0.505] 100 games depth=14 5-men syzygy
Elo difference: 3.47 +/- 50.79
SPRT: llr 0.0105, lbound -2.94, ubound 2.94
51 52
53 54
61 62
65 66
69 70
71 72
79 80
The number of differing pairs is 7
readme.txt

@lantonov

Thanks for the speed test. A 12.4% speedup is still quite significant.

One more thing -- could you confirm if the windows executables indeed crash on quit? I am just trying to confirm that this crash I am observing isn't unique to my system for whatever reason. I wouldn't want to waste Moha's valuable time by having him chase a red herring.

@CounterPly
On my system, the command 'quit' from the asmFish window closes the application without crashing (nothing in the Event Viewer).

Score of asmFishW_7fd57bd_bmi2 vs stockfish-254d995: 25 - 25 - 50 [0.500] 100 games depth=14, 5-man syzygy
Elo difference: 0.00 +/- 48.39
SPRT: llr 2.83e-15, lbound -2.94, ubound 2.94
The number of differing pairs is 0
results.txt
This is the Contempt 20 patch. Just for confirmation.

Hopefully the guy at http://www.sp-cc.de/index.htm will update to a newer asmFish version soon.

Score of asmFishW_9ed34be_bmi2 vs stockfish-0a5b03a: 25 - 24 - 51 [0.505] 100 games depth=14, 5-man syzygy
Elo difference: 3.47 +/- 47.90
SPRT: llr 0.0117, lbound -2.94, ubound 2.94
9 10
23 24
33 34
43 44
63 64
77 78
85 86
93 94
The number of differing pairs is 8
results.txt
So the problem appears at the level of the patch "Limit the king distance factor when evaluating passed pawns" (bench 5059457). Ronald's patch lies between Contempt 20 and "Limit the king ...", so it is not excluded from suspicion.

commented

I am almost certain that the tt size patch is the culprit. Try merging the PR and running your tests with the latest asmfish and stockfish-ad5d86c. Of course syzygy MUST BE OFF to match official in fixed-depth testing. It's not just a matter of Ronald's patch: asmfish and cfish use alpha-beta pruning in the tb search while official does not.

Score of asmFishW_9ed34be_bmi2 vs stockfish-0a5b03a: 29 - 29 - 42 [0.500] 100 games depth=14, no syzygy
Elo difference: 0.00 +/- 52.18
SPRT: llr -3e-15, lbound -2.94, ubound 2.94
The number of differing pairs is 0
results.txt
This suggests that part of the problem is connected with syzygy. There may be another problem in later patches not connected with syzygy.

commented

@CounterPly crashes can be fixed, but nothing crashes on my machine. Your video unfortunately left out the most important numbers. May I have:
(1) the commit number you used to assemble
(2) the command used to assemble
(3) the module name, exception code and exception offset as in https://stackoverflow.com/questions/7143895/how-do-i-trace-an-intermittent-crash-that-occurs-only-under-the-debugger-but-is

Score of asmFishW_2018-04-10_bmi2 vs stockfish-ad5d86c: 19 - 19 - 62 [0.500] 100 games, depth=14, no syzygy
Elo difference: 0.00 +/- 42.12
SPRT: llr -9.99e-16, lbound -2.94, ubound 2.94
The number of differing pairs is 0
results.txt

Score of asmFishW_2018-04-10_bmi2 vs stockfish-ad5d86c: 17 - 16 - 67 [0.505] 100 games depth=14, 5-men syzygy
Elo difference: 3.47 +/- 39.23
SPRT: llr 0.0175, lbound -2.94, ubound 2.94
1 2
5 6
13 14
23 24
29 30
37 38
71 72
99 100
The number of differing pairs is 8
results.txt

@tthsqe12
(1) commit 38d4221
(2) "fasmg.exe" "x86\fish.asm" "asmFishW_9_bmi2.exe" -e 1000 -i "VERSION_OS='W'" -i "PEDANTIC = 1" -i "VERSION_POST = 'bmi2'"
(3) The information my video failed to capture was:
Exception thrown at 0x00007FFC6414304C (ntdll.dll) in asmFishW_9_bmi2.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. occurred

After about an hour of stepping through this in x64dbg, it suddenly stopped crashing for me and I am unable to reproduce this error. I know this sounds ridiculous, but I assembled the exact same source in an identical fashion. One time it crashed, and the next time it didn't. I have no explanation. Let's just forget about it for now until it comes back and I can figure out how to reproduce this crash reliably.

It appears armFish has recently lost functionality in certain mobile apps (i.e. DroidFish -- it would be nice if someone else could confirm this).

@tthsqe12
Were all of your modifications in Structs.arm (lines 62-76) in 758412b intentional? I suspect this commit may be what is causing these issues (despite armFish still being benchable via qemu).

commented

@CounterPly you are exactly correct. The Evaluate function subtracts sizeof.EvalInfo from sp before going to work. Problems arise when this size is not a multiple of 16. It is strange that this passed qemu though. So it seems that android phones will flag accesses from a non-16-byte-aligned sp but qemu will not, at least not by default. I think this is controlled by a bit in a register in the cpu - one would have to check the docs. Anyways, could you pad the EvalInfo struct to a multiple of 16 bytes and add an assert like there is for the State struct? If you don't get to it, I can do it tomorrow.
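
The padding-plus-assert idea above can be sketched in a few lines. This is only an illustration of the arithmetic, not the fasmg code in the repository; the size used below is a made-up value, not the real sizeof.EvalInfo, and the function names are hypothetical:

```python
STACK_ALIGN = 16  # bytes; the sp alignment discussed above

def padded_size(size, align=STACK_ALIGN):
    """Round size up to the next multiple of align (align must be a power of two)."""
    return (size + align - 1) & ~(align - 1)

def check_struct(name, size, align=STACK_ALIGN):
    """Mimic a build-time assert: fail loudly if size would misalign sp."""
    if size % align != 0:
        pad = padded_size(size, align) - size
        raise AssertionError(
            f"sizeof.{name} = {size} is not a multiple of {align}; pad by {pad} bytes")

hypothetical_size = 200                 # made-up value for illustration
print(padded_size(hypothetical_size))   # 208
check_struct("EvalInfo", padded_size(hypothetical_size))  # passes after padding
```

Subtracting a padded size from an sp that is already 16-byte aligned leaves it aligned, which is why the assert only needs to look at the struct size.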

@tthsqe12

Thanks. Fixed @ 6bd6580

@tthsqe12 @lantonov

I haven't been able to find any additional bugs. Stability-wise, is everything else in good order for the asmFish 9 release?

I have no problem with stability on my system.
My one small concern is the different evaluation with syzygy on, but I don't think it is an obstacle to the asmFish 9 release.

So based on what Moha mentioned earlier, the differences associated with alpha-beta pruning during TB-probing would imply that asmFish 9+syzygy would have identical functionality to Cfish 9+syzygy but not SF 9+syzygy, right?

Yes, that is how I understood it. However, up to and including the Contempt 20 patch, asmFish matched SF exactly even with syzygy on.