can the loop index variable use `size_t` type instead of ee_u32?

Question

lhtin opened this issue 2 years ago

In the source code (for example, core_matrix.c), loop index variables always use a 32-bit type (ee_u32). I think they should use the standard size_t instead, because that can avoid extra shift instructions on 64-bit machines that do not provide 32-bit registers:

void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}

Peter Torelli · Answer 1 · Tue Jun 21 2022 00:16:26 GMT+0800 (China Standard Time)

In our more recent benchmarks, we've started to move to uint_fast32_t for loops rather than size_t, so that the compiler can choose the fastest data type that is at least 32 bits wide (whereas size_t forces the largest iterable range). The benefit is that uint_fast32_t can be wider than 32 bits, depending on the architecture. However, CoreMark is so established that changing the core code would invalidate nearly a thousand published scores, so this version won't be changed.

Out of curiosity, what type of performance improvement do you see with size_t, and what platform are you running on?
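As a rough illustration of the difference between the three candidate types, the sketch below reports their storage widths on whatever target it is compiled for. The helper names (bits_exact32, bits_fast32, bits_size, print_index_type_widths) are made up for this example and are not part of CoreMark. The width of uint_fast32_t is implementation-defined, so the result depends on the compiler and C library in use.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The standard guarantees that the fast type holds at least 32 bits. */
static_assert(sizeof(uint_fast32_t) >= sizeof(uint32_t),
              "uint_fast32_t must hold at least 32 bits");

/* Storage width, in bits, of each candidate loop-index type.
 * These helper names are hypothetical, not CoreMark identifiers. */
unsigned bits_exact32(void) { return (unsigned)(sizeof(uint32_t) * CHAR_BIT); }
unsigned bits_fast32(void)  { return (unsigned)(sizeof(uint_fast32_t) * CHAR_BIT); }
unsigned bits_size(void)    { return (unsigned)(sizeof(size_t) * CHAR_BIT); }

/* Print the widths so they can be compared on a given target. */
void print_index_type_widths(void)
{
    printf("uint32_t      : %u bits\n", bits_exact32());
    printf("uint_fast32_t : %u bits\n", bits_fast32());
    printf("size_t        : %u bits\n", bits_size());
}
```

Calling print_index_type_widths() on the target of interest shows whether the toolchain actually widens uint_fast32_t beyond 32 bits; if it does not, switching to it would not be expected to change the generated loop code.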

Peter Torelli · Answer 2 · Tue Jun 21 2022 00:33:11 GMT+0800 (China Standard Time)

On a side note, you are most likely aware that ee_u32 is a user-defined type, and using the fast types might produce the results you expect rather than size_t.

Lehua Ding · Answer 3 · Tue Jun 21 2022 16:42:14 GMT+0800 (China Standard Time)

Out of curiosity, what type of performance improvement do you see with size_t, and what platform are you running on?

The platform is a 64-bit RISC-V machine.

For example, take the matrix_mul_vect function:

void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}

unsigned int version (typedef unsigned int ee_u32;, online: https://godbolt.org/z/K5TqPdKE5):

matrix_mul_vect:
        beq     a0,zero,.L1
        slli    a5,a0,32 # need to clear the upper bits
        srli    t5,a5,30
        add     t5,a1,t5
        mv      t3,a0
        li      t4,0
.L4:
        mv      a6,a3
        mv      a4,t4
        li      a7,0
.L3:
        slli    t1,a4,32 # need to clear the upper bits
        srli    a5,t1,31
        add     a5,a2,a5
        lh      t1,0(a6)
        lh      a5,0(a5)
        addiw   a4,a4,1
        addi    a6,a6,2
        mulw    a5,a5,t1
        addw    a7,a5,a7
        bne     t3,a4,.L3
        sw      a7,0(a1)
        addi    a1,a1,4
        addw    t4,a0,t4
        addw    t3,a0,t3
        bne     t5,a1,.L4
.L1:
        ret

size_t version (typedef size_t ee_u32;, online: https://godbolt.org/z/WThM8WrEq):

matrix_mul_vect:
        beq     a0,zero,.L1
        slli    t5,a0,2
        slli    t3,a0,1
        add     t5,a1,t5
        add     t3,a3,t3
        li      t4,0
.L4:
        slli    a6,t4,1
        add     a6,a2,a6
        mv      a4,a3
        li      a7,0
.L3:
        lh      a5,0(a6)
        lh      t1,0(a4)
        addi    a4,a4,2
        addi    a6,a6,2
        mulw    a5,a5,t1
        addw    a7,a5,a7
        bne     t3,a4,.L3
        sw      a7,0(a1)
        addi    a1,a1,4
        add     t4,t4,a0
        bne     t5,a1,.L4
.L1:
        ret
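One way to read the two listings: with a 32-bit unsigned index, C requires i * N + j to wrap modulo 2^32, so the compiler has to clear the upper register bits (the slli/srli pair) before using the value as an address offset. With a 64-bit size_t the arithmetic already happens at register width. The sketch below shows that semantic difference in C. The function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index computed in 32-bit unsigned arithmetic: the result wraps modulo 2^32. */
uint32_t index_u32(uint32_t i, uint32_t n, uint32_t j)
{
    return i * n + j;
}

/* Same formula in size_t arithmetic: it wraps only at the width of size_t. */
size_t index_wide(size_t i, size_t n, size_t j)
{
    return i * n + j;
}

/* Self-check of the wraparound case; 65536 * 65536 is exactly 2^32. */
void check_wraparound(void)
{
    /* The 32-bit version discards the 2^32 term and keeps only j. */
    assert(index_u32(65536u, 65536u, 5u) == 5u);

    /* Where size_t is 64 bits wide, the full value survives. */
    if (sizeof(size_t) >= 8)
    {
        assert((uint64_t)index_wide(65536u, 65536u, 5u)
               == UINT64_C(4294967301));
    }
}
```

For the matrix sizes CoreMark actually uses, both versions compute the same index. The extra instructions come from the compiler having to preserve the wraparound behavior, not from any difference in results.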

Lehua Ding · Answer 4 · Tue Jun 21 2022 16:48:25 GMT+0800 (China Standard Time)

On a side note, you are most likely aware that ee_u32 is a user-defined type, and using the fast types might produce the results you expect rather than size_t.

Because ee_u32 is used for program data and in many other places, I cannot change ee_u32 to size_t. If a separate type could be split out for loop and array indices, I think that would be useful.

Peter Torelli · Answer 5 · Tue Jun 21 2022 23:06:12 GMT+0800 (China Standard Time)

Because ee_u32 is used for program data and in many other places, I cannot change ee_u32 to size_t. If a separate type could be split out for loop and array indices, I think that would be useful.

No, I meant to use uint_fast32_t, not size_t. This should use a minimum of 32 bits for the math operations, and then whatever type is fastest for loops (64b).
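A minimal sketch of how the two suggestions in this thread could be combined: keep ee_u32 for data, and introduce a separate index type based on uint_fast32_t. The typedef name ee_index and the function name matrix_mul_vect_idx are hypothetical and do not exist in CoreMark. The MATDAT and MATRES definitions are assumptions inferred from the 16-bit loads (lh) and 32-bit stores (sw) in the listings above. As noted earlier in the thread, changing the official sources would invalidate published scores, so this is for experimentation only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t      ee_u32;   /* data type; stays 32 bits wide */
typedef int16_t       MATDAT;   /* assumption: 16-bit matrix elements */
typedef int32_t       MATRES;   /* assumption: 32-bit results */
typedef uint_fast32_t ee_index; /* hypothetical index type, not in CoreMark */

/* The index type must be able to represent every ee_u32 value. */
static_assert(sizeof(ee_index) >= sizeof(ee_u32),
              "ee_index must be at least as wide as ee_u32");

/* Same algorithm as matrix_mul_vect, but indexed with ee_index. */
void
matrix_mul_vect_idx(ee_u32 N, MATRES *C, const MATDAT *A, const MATDAT *B)
{
    const ee_index n = N; /* widen once, outside the loops */
    ee_index       i, j;
    for (i = 0; i < n; i++)
    {
        C[i] = 0;
        for (j = 0; j < n; j++)
        {
            C[i] += (MATRES)A[i * n + j] * (MATRES)B[j];
        }
    }
}
```

Whether this removes the zero-extension instructions depends on how the toolchain defines uint_fast32_t. If it is still 32 bits wide on the target, the generated code would be expected to match the unsigned int version.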

Lehua Ding · Answer 6 · Wed Aug 17 2022 10:32:46 GMT+0800 (China Standard Time)

Thank you so much for the answer.