Can the loop index variable use `size_t` instead of `ee_u32`?
lhtin opened this issue
In the source code (core_matrix.c, for example), loop index variables always use a 32-bit type (`ee_u32`). I think they should use the standard `size_t` instead, because it removes extra shift instructions on 64-bit machines that do not provide 32-bit registers:
```c
void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}
```
In our more recent benchmarks, we've started to move to `uint_fast32_t` for loops rather than `size_t`, so the compiler can choose the fastest data type that holds at least 32 bits (whereas `size_t` forces the largest iterable range). The benefit is that `uint_fast32_t` can be wider than 32 bits, depending on the architecture. However, CoreMark is so established that changing the core code would invalidate nearly a thousand published scores, so this version won't be changed.
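As a rough illustration of the `uint_fast32_t` approach (not a change to CoreMark itself), the loop could be written as below. The name `matrix_mul_vect_fast` is hypothetical, and `MATDAT`/`MATRES` are assumed here to be 16-bit and 32-bit signed integers.

```c
#include <stdint.h>

typedef int16_t MATDAT; /* assumed: 16-bit matrix element */
typedef int32_t MATRES; /* assumed: 32-bit accumulator */

/* Hypothetical variant of matrix_mul_vect: the loop counters use
 * uint_fast32_t, which is at least 32 bits wide but may be wider
 * (e.g. 64 bits) if the implementation considers that faster. */
void matrix_mul_vect_fast(uint_fast32_t N, MATRES *C, const MATDAT *A,
                          const MATDAT *B)
{
    for (uint_fast32_t i = 0; i < N; i++)
    {
        C[i] = 0;
        for (uint_fast32_t j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}
```

The arithmetic on the matrix data is unchanged; only the counter type differs, so the compiler is free to keep `i * N + j` in a full-width register.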
Out of curiosity, what type of performance improvement do you see with `size_t`, and what platform are you running on?
On a side note, you are most likely aware that `ee_u32` is a user-defined type, and using the `fast` types rather than `size_t` might produce the results you expect.
> Out of curiosity, what type of performance improvement do you see with `size_t`, and what platform are you running on?
The platform is a 64-bit RISC-V machine.
Take the `matrix_mul_vect` function as an example:
```c
void
matrix_mul_vect(ee_u32 N, MATRES *C, MATDAT *A, MATDAT *B)
{
    ee_u32 i, j;
    for (i = 0; i < N; i++)
    {
        C[i] = 0;
        for (j = 0; j < N; j++)
        {
            C[i] += (MATRES)A[i * N + j] * (MATRES)B[j];
        }
    }
}
```
`unsigned int` version (`typedef unsigned int ee_u32;`, online: https://godbolt.org/z/K5TqPdKE5):
```asm
matrix_mul_vect:
    beq a0,zero,.L1
    slli a5,a0,32   # need to clear the upper bits
    srli t5,a5,30
    add t5,a1,t5
    mv t3,a0
    li t4,0
.L4:
    mv a6,a3
    mv a4,t4
    li a7,0
.L3:
    slli t1,a4,32   # need to clear the upper bits
    srli a5,t1,31
    add a5,a2,a5
    lh t1,0(a6)
    lh a5,0(a5)
    addiw a4,a4,1
    addi a6,a6,2
    mulw a5,a5,t1
    addw a7,a5,a7
    bne t3,a4,.L3
    sw a7,0(a1)
    addi a1,a1,4
    addw t4,a0,t4
    addw t3,a0,t3
    bne t5,a1,.L4
.L1:
    ret
```
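The `slli`/`srli` pairs in this version are the overhead in question. RV64 keeps 32-bit results sign-extended in 64-bit registers (`addw`/`addiw` produce sign-extended values), and unsigned 32-bit index arithmetic must wrap modulo 2^32, so before an `ee_u32` index can serve as an address offset the compiler has to zero-extend it. Shifting left by 32 and then logically right by 32 - log2(element size) zero-extends and scales in two instructions: `srli ...,30` multiplies by 4 (a 4-byte `MATRES`), `srli ...,31` by 2 (a 2-byte `MATDAT`). The sketch below models that sequence in C; the helper names are hypothetical.

```c
#include <stdint.h>

/* Models "slli t,reg,32 ; srli t,t,(32 - log2_size)".
 * The left shift drops the upper 32 bits of the 64-bit register; the
 * logical right shift brings the low 32 bits back, zero-filled and
 * already multiplied by the element size. Valid for log2_size in 0..32. */
uint64_t zext_and_scale(uint64_t reg, unsigned log2_size)
{
    uint64_t t = reg << 32;         /* slli t, reg, 32 */
    return t >> (32 - log2_size);   /* srli t, t, 32 - log2_size */
}

/* What the C source asks for: a 32-bit unsigned index turned into a
 * 64-bit byte offset. */
uint64_t byte_offset(uint32_t index, uint64_t elem_size)
{
    return (uint64_t)index * elem_size;
}
```

With a 64-bit index type no such extension is needed, which is why the `size_t` version can step through the arrays with plain `addi` pointer increments.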
`size_t` version (`typedef size_t ee_u32;`, online: https://godbolt.org/z/WThM8WrEq):
```asm
matrix_mul_vect:
    beq a0,zero,.L1
    slli t5,a0,2
    slli t3,a0,1
    add t5,a1,t5
    add t3,a3,t3
    li t4,0
.L4:
    slli a6,t4,1
    add a6,a2,a6
    mv a4,a3
    li a7,0
.L3:
    lh a5,0(a6)
    lh t1,0(a4)
    addi a4,a4,2
    addi a6,a6,2
    mulw a5,a5,t1
    addw a7,a5,a7
    bne t3,a4,.L3
    sw a7,0(a1)
    addi a1,a1,4
    add t4,t4,a0
    bne t5,a1,.L4
.L1:
    ret
```
> On a side note, you are most likely aware that `ee_u32` is a user-defined type, and using the `fast` types might produce the results you expect rather than `size_t`.
Because `ee_u32` is used for program data and in many other places, I cannot simply change `ee_u32` to `size_t`. If a separate type could be extracted just for loop counters and array indices, I think that would be useful.
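A separate index type along those lines might look like the following sketch. `EE_IDX_TYPE`, `ee_idx` and `matrix_mul_vect_idx` are hypothetical names, not part of CoreMark. The default keeps `ee_u32`, so an unmodified build behaves as before, while a port could opt in to a wider index; `MATDAT`/`MATRES` are again assumed to be 16-bit and 32-bit.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ee_u32; /* stands in for the port's 32-bit data type */
typedef int16_t MATDAT;  /* assumed element type */
typedef int32_t MATRES;  /* assumed accumulator type */

/* Hypothetical knob: build with -DEE_IDX_TYPE=size_t or
 * -DEE_IDX_TYPE=uint_fast32_t to change only the index type.
 * The default keeps the original ee_u32 behaviour. */
#ifndef EE_IDX_TYPE
#define EE_IDX_TYPE ee_u32
#endif
typedef EE_IDX_TYPE ee_idx; /* used only for loop counters and indices */

void matrix_mul_vect_idx(ee_u32 N, MATRES *C, const MATDAT *A,
                         const MATDAT *B)
{
    ee_idx n = (ee_idx)N; /* widen once, outside the loops */
    for (ee_idx i = 0; i < n; i++)
    {
        C[i] = 0;
        for (ee_idx j = 0; j < n; j++)
        {
            C[i] += (MATRES)A[i * n + j] * (MATRES)B[j];
        }
    }
}
```

For example, compiling with `-DEE_IDX_TYPE=size_t` would give a 64-bit index on RV64 while `ee_u32` data stays 32 bits.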
> Because the ee_u32 is used in program data and many places, so I cannot change the ee_u32 to size_t. If it can extract an alone type for loop and array index, I think that is useful.
No, I meant to use `uint_fast32_t`, not `size_t`. It guarantees a minimum of 32 bits for the math operations, and then uses whatever type is fastest for loops (64 bits on your platform).
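The guarantee being relied on can be checked directly: `<stdint.h>` requires `uint_fast32_t` to be at least 32 bits wide but leaves the exact width to the implementation. A minimal sketch (the helper name `fast32_width_bits` is hypothetical):

```c
#include <limits.h>
#include <stdint.h>

/* Guaranteed by <stdint.h>: uint_fast32_t covers the full 32-bit range,
 * so it can hold any ee_u32 loop bound. */
_Static_assert(UINT_FAST32_MAX >= UINT32_MAX,
               "uint_fast32_t must cover the 32-bit range");

/* Width in bits this implementation actually chose for uint_fast32_t. */
unsigned fast32_width_bits(void)
{
    return (unsigned)(sizeof(uint_fast32_t) * CHAR_BIT);
}
```

With glibc on 64-bit targets such as RV64 Linux the reported width is typically 64, but other C libraries may choose 32, so the speedup is not guaranteed everywhere.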
Thank you so much for the answer.