eembc / coremark

CoreMark® is an industry-standard benchmark that measures the performance of central processing units (CPUs) and embedded microcontrollers (MCUs).

Can the loop index variable use `size_t` instead of ee_u32?

lhtin opened this issue · comments

In the source code, for example core_matrix.c, the loop index variable is always a 32-bit variable (ee_u32). I think it should use the standard size_t instead, because that avoids the extra shift instructions needed on a 64-bit machine that does not provide 32-bit registers:

void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}

In our more recent benchmarks, we've started to move to uint_fast32_t for loops, rather than size_t, so that the compiler can choose the fastest data type that is at least 32 bits wide (whereas size_t forces the largest iterable range). The benefit is that uint_fast32_t can be larger than 32 bits, depending on the architecture. However, CoreMark is so established that changing the core code would invalidate nearly a thousand published scores, so this version won't be changed.
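
As a rough illustration of the point above, the following sketch reports the storage size of each candidate index type on whatever target it is compiled for. The function names are hypothetical and not part of CoreMark; the only guarantee assumed from the C standard is that uint_fast32_t is at least 32 bits wide, and the actual width is chosen by the toolchain.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Storage size in bits of each candidate loop-index type on this target. */
unsigned bits_u32(void)    { return (unsigned)(sizeof(uint32_t) * CHAR_BIT); }
unsigned bits_fast32(void) { return (unsigned)(sizeof(uint_fast32_t) * CHAR_BIT); }
unsigned bits_size(void)   { return (unsigned)(sizeof(size_t) * CHAR_BIT); }

/* Print the sizes so the choice made by the toolchain is visible. */
void print_index_type_widths(void)
{
    printf("uint32_t:      %u bits\n", bits_u32());
    printf("uint_fast32_t: %u bits\n", bits_fast32());
    printf("size_t:        %u bits\n", bits_size());
}
```

Calling print_index_type_widths() on the target of interest shows whether the toolchain maps uint_fast32_t to a 32-bit or a wider type there.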

Out of curiosity, what type of performance improvement do you see with size_t, and what platform are you running on?

On a side note, you are most likely aware that ee_u32 is a user-defined type, and using the fast types might produce the results you expect rather than size_t.

> Out of curiosity, what type of performance improvement do you see with size_t, and what platform are you running on?

The platform is a 64-bit RISC-V machine.

For example, take the matrix_mul_vect function:

void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}

unsigned int version (typedef unsigned int ee_u32;, online: https://godbolt.org/z/K5TqPdKE5):

matrix_mul_vect:
        beq     a0,zero,.L1
        slli    a5,a0,32 # need to clear upper bits
        srli    t5,a5,30
        add     t5,a1,t5
        mv      t3,a0
        li      t4,0
.L4:
        mv      a6,a3
        mv      a4,t4
        li      a7,0
.L3:
        slli    t1,a4,32 # need to clear upper bits
        srli    a5,t1,31
        add     a5,a2,a5
        lh      t1,0(a6)
        lh      a5,0(a5)
        addiw   a4,a4,1
        addi    a6,a6,2
        mulw    a5,a5,t1
        addw    a7,a5,a7
        bne     t3,a4,.L3
        sw      a7,0(a1)
        addi    a1,a1,4
        addw    t4,a0,t4
        addw    t3,a0,t3
        bne     t5,a1,.L4
.L1:
        ret

size_t version (typedef size_t ee_u32;, online: https://godbolt.org/z/WThM8WrEq):

matrix_mul_vect:
        beq     a0,zero,.L1
        slli    t5,a0,2
        slli    t3,a0,1
        add     t5,a1,t5
        add     t3,a3,t3
        li      t4,0
.L4:
        slli    a6,t4,1
        add     a6,a2,a6
        mv      a4,a3
        li      a7,0
.L3:
        lh      a5,0(a6)
        lh      t1,0(a4)
        addi    a4,a4,2
        addi    a6,a6,2
        mulw    a5,a5,t1
        addw    a7,a5,a7
        bne     t3,a4,.L3
        sw      a7,0(a1)
        addi    a1,a1,4
        add     t4,t4,a0
        bne     t5,a1,.L4
.L1:
        ret
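
The extra slli/srli pairs in the unsigned int version follow from C semantics: an index computed in 32-bit unsigned arithmetic must wrap modulo 2^32, so on a 64-bit target the compiler has to clear the upper register bits before using the value as an address offset. The sketch below shows that wraparound with two hypothetical helper functions (not CoreMark code); uint64_t stands in for a 64-bit size_t.

```c
#include <stdint.h>

/* Index computed in 32-bit unsigned arithmetic: the result wraps modulo 2^32. */
uint64_t index_u32(uint32_t i, uint32_t n, uint32_t j)
{
    uint32_t idx = i * n + j; /* wraparound is well defined for unsigned types */
    return idx;
}

/* The same expression in 64-bit arithmetic: no wrap for these input ranges. */
uint64_t index_u64(uint64_t i, uint64_t n, uint64_t j)
{
    return i * n + j;
}
```

For small inputs both functions agree, for example index_u32(3, 4, 2) and index_u64(3, 4, 2) both give 14. With i = n = 65536 and j = 5, the 32-bit version wraps to 5 while the 64-bit version gives 4294967301, which is why the two typedefs cannot compile to identical code.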

> On a side note, you are most likely aware that ee_u32 is a user-defined type, and using the fast types might produce the results you expect rather than size_t.

Because ee_u32 is used for program data and in many other places, I cannot simply change ee_u32 to size_t. If a separate type could be extracted for loop and array indices, I think that would be useful.

> Because ee_u32 is used for program data and in many other places, I cannot simply change ee_u32 to size_t. If a separate type could be extracted for loop and array indices, I think that would be useful.

No, I meant to use uint_fast32_t rather than size_t. This should use a minimum of 32 bits for the math operations, and then whatever type is fastest for loops (64 bits here).
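
A minimal sketch of that suggestion, for local experiments only (as noted earlier in the thread, the official CoreMark source will not change): ee_u32 stays a 32-bit data type and only the loop indices become uint_fast32_t. The function name matrix_mul_vect_fast is hypothetical, and the typedefs below are assumed stand-ins for the CoreMark port definitions.

```c
#include <stdint.h>

/* Assumed stand-ins for the CoreMark typedefs; a real port defines its own. */
typedef uint32_t ee_u32;
typedef int16_t  MATDAT;
typedef int32_t  MATRES;

/* Hypothetical variant: same interface, but indices use uint_fast32_t. */
void matrix_mul_vect_fast(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    uint_fast32_t n = N; /* widen once, outside the loops */
    uint_fast32_t i, j;
    for (i = 0; i < n; i++)
    {
        C[i] = 0;
        for (j = 0; j < n; j++)
        {
            C[i] += (MATRES)A[i * n + j] * (MATRES)B[j];
        }
    }
}
```

With A = {1, 2, 3, 4} as a 2x2 matrix and B = {5, 6}, the call matrix_mul_vect_fast(2, C, A, B) leaves C = {17, 39}. Where uint_fast32_t is wider than 32 bits, the index expression no longer wraps at 2^32, so this matches the original only for sizes where i * N + j fits in 32 bits.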

Thank you so much for the answer.