Poor nodes/s performance

Question

jongdetim opened this issue 10 months ago

With a very simple evaluation function that only takes into account piece values plus bonus scores for piece positions, I'm getting poor performance. I've also implemented some move ordering and transposition tables, both of which affect the speed (move ordering makes it slower, the transposition tables make it faster).

I get around 150,000 nodes per second on a 2017 iMac (3.4 GHz Intel Core i5). Without move sorting, it does about 210,000 nodes/s. It could just be a language limitation, as C# isn't the fastest language, but it still shouldn't be that bad. It's not uncommon to get 2 million+ nodes visited per second in C++. I'm wondering what nodes/s others are getting!

WhiteMouse1 · Answer 1 · Fri Aug 04 2023 06:07:36 GMT+0800 (China Standard Time)

1200-1500 kN/s for mate search only.
Material+basic mobility+move ordering+mate drops it to ~400 kN/s
I have a 2009 AMD Phenom II X4 955

Tim de Jong · Answer 2 · Fri Aug 04 2023 06:23:18 GMT+0800 (China Standard Time)

Perhaps I packed my piece position value tables too tightly, but it does save a lot of tokens. I'll have to check the unpack function with a profiler. I currently have it packed like this:

    // every 32 bits is a row. every 64-bit int here is 2 rows
    static ulong[] piecePositionValueTable = {
        0x00000000050A0AEC, 0x05FBF60000000014, 0x05050A190A0A141E, 0x3232323200000000, // pawns
        0xCED8E2E2D8EC0005, 0xE2050A0FE2000F14, 0xE2050F14E2000A0F, 0xD8EC0000CED8E2E2, // knights
        0xECF6F6F6F6050000, 0xF60A0A0AF6000A0A, 0xF605050AF600050A, 0xF6000000ECF6F6F6, // bishops
        0x00000005FB000000, 0xFB000000FB000000, 0xFB000000FB000000, 0x050A0A0A00000000, // rooks
        0xECF6F6FBF6000000, 0xF605050500000505, 0xFB000505F6000505, 0xF6000000ECF6F6FB, // queens
        0x141E0A0014140000, 0xF6ECECECECE2E2D8, 0xE2D8D8CEE2D8D8CE, 0xE2D8D8CEE2D8D8CE  // kings
    };

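    // Each byte is a signed score. Only columns 0-3 of a row are stored; columns 4-7 mirror them.
    // For the second row of a pair the shift count is 64 or more; C# masks a ulong shift count
    // to its low 6 bits, so it wraps around to the low 32 bits.
    int GetPositionScore(int pieceType, int index) =>
        (sbyte)((piecePositionValueTable[pieceType * 4 + index / 16] >> (8 * (7 - (index % 8 < 4 ? index % 8 : 7 - index % 8) + index % 16 / 8 * 4))) & 0xFF);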

I was also planning to stop calling this for the entire board during eval, and instead pass the running value along and just calculate the difference between the starting and target squares of the current move. If this is indeed a bottleneck, that should help.
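
The idea in this paragraph could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name `PieceSquareSketch`, the `pawnScores` cache, and the `FullScore` and `MoveDelta` helpers are hypothetical and not part of the original bot. Only the pawn rows are included to keep it short, and the delta covers quiet moves only; captures, promotions, castling, and the opponent's pieces would each need their own terms. Caching the decoded values also costs extra tokens, which may matter here.

```csharp
using System;

static class PieceSquareSketch
{
    // Pawn rows copied from the packed table above; other piece types would work the same way.
    static readonly ulong[] packedPawns =
    {
        0x00000000050A0AEC, 0x05FBF60000000014, 0x05050A190A0A141E, 0x3232323200000000
    };

    // Hypothetical cache: one decoded score per square, filled once at startup.
    static readonly int[] pawnScores = new int[64];

    static PieceSquareSketch()
    {
        for (int square = 0; square < 64; square++)
            pawnScores[square] = Unpack(square);
    }

    // Same decoding as GetPositionScore, restricted to the pawn rows,
    // with the shift written so that it never exceeds 63.
    static int Unpack(int index)
    {
        int column = index % 8;
        int mirrored = column < 4 ? column : 7 - column; // only half of each row is stored
        int rowInWord = index % 16 / 8;                   // 0 = high 32 bits, 1 = low 32 bits
        int shift = 8 * (7 - (mirrored + rowInWord * 4));
        return (sbyte)((packedPawns[index / 16] >> shift) & 0xFF);
    }

    // Full recomputation: sums the score of every occupied square.
    public static int FullScore(bool[] pawnOn)
    {
        int total = 0;
        for (int square = 0; square < 64; square++)
            if (pawnOn[square]) total += pawnScores[square];
        return total;
    }

    // Incremental update for a quiet move: only the two squares that changed matter.
    public static int MoveDelta(int from, int to) => pawnScores[to] - pawnScores[from];
}
```

With pawns on squares 8 and 12, `FullScore` gives 5 + (-20) = -15. Moving the pawn from 12 to 28 gives `MoveDelta(12, 28)` = 20 - (-20) = 40, so the running score becomes 25 without rescanning the board.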

Ryan Heath · Answer 3 · Fri Aug 04 2023 16:00:53 GMT+0800 (China Standard Time)

You might also try to run in release mode instead of the default debug mode, to see if it makes a difference.

Tim de Jong · Answer 4 · Fri Aug 04 2023 21:30:28 GMT+0800 (China Standard Time)

I'm running the project from the CLI, not Visual Studio. Will dotnet run -c Release behave differently from a regular dotnet run command?

Ryan Heath · Answer 5 · Fri Aug 04 2023 21:42:00 GMT+0800 (China Standard Time)

Yes, it defaults to Debug.
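
One way to confirm at run time which kind of build is executing is to look at the assembly's `DebuggableAttribute`. The sketch below is an assumption-laden illustration, not something from the project: the `BuildCheck` class and its methods are hypothetical names, and the check only reports whether JIT optimizations were disabled at compile time, which is typical of Debug builds but also depends on project settings.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class BuildCheck
{
    // True when the assembly was compiled with JIT optimizations disabled,
    // which is what a default Debug configuration does.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }

    // Prints which kind of build the entry assembly appears to be.
    public static void Report()
    {
        Assembly entry = Assembly.GetEntryAssembly() ?? Assembly.GetExecutingAssembly();
        Console.WriteLine(IsJitOptimizerDisabled(entry)
            ? "Debug-style build: JIT optimizations are disabled"
            : "Optimized build");
    }
}
```

Calling `BuildCheck.Report()` once at startup, before timing any search, makes it harder to compare nodes/s figures from an unoptimized build by accident.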

Sebastian Lague · Answer 6 · Sat Aug 05 2023 17:40:00 GMT+0800 (China Standard Time)

Running in release mode should make a pretty big difference (it’s about 4x for me). The core engine still has a lot of room for optimization though, so you’re never going to get the kind of nps you might expect from more serious engines.

Tim de Jong · Answer 7 · Sun Aug 06 2023 06:46:57 GMT+0800 (China Standard Time)

Thank you, running in release mode gives me a whopping ~3.5x speedup!