Consolidate `_mm_prefetch`
jserv opened this issue
Jim Huang commented
The current `_mm_prefetch` does not behave as the Intel documentation states:
"Fetch the line of data from memory that contains address `p` to a location in the cache hierarchy specified by the locality hint `i`."
We shall consolidate it as follows:
- Refine the function prototype, i.e., `void _mm_prefetch(char const *p, int i)`.
- Provide the corresponding test cases. See function `test_simde_mm_prefetch` in `test/x86/sse.c`.
- Properly handle the locality hint.
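A minimal sketch of the refined prototype plus a matching test case, assuming GCC or Clang (for `__builtin_prefetch`). The names `sketch_mm_prefetch` and `test_sketch_mm_prefetch` and the `SKETCH_HINT_*` values are hypothetical, chosen so they do not clash with a real `_mm_prefetch` or with sse2neon/SIMDe definitions. Because a prefetch has no architecturally visible effect, the test can only check that every hint is accepted on valid memory and that the data is left untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint values for this sketch only; real headers define
 * their own _MM_HINT_* constants. */
enum {
    SKETCH_HINT_NTA = 0,
    SKETCH_HINT_T0 = 1,
    SKETCH_HINT_T1 = 2,
    SKETCH_HINT_T2 = 3
};

/* Refined prototype: takes `char const *` as in Intel's documentation.
 * __builtin_prefetch(addr, rw, locality) requires rw and locality to be
 * compile-time constants, so each hint gets its own call. */
void sketch_mm_prefetch(char const *p, int i)
{
    switch (i) {
    case SKETCH_HINT_NTA: /* non-temporal: no temporal locality */
        __builtin_prefetch(p, 0, 0);
        break;
    case SKETCH_HINT_T0: /* temporal: keep in all cache levels */
        __builtin_prefetch(p, 0, 3);
        break;
    case SKETCH_HINT_T1:
        __builtin_prefetch(p, 0, 2);
        break;
    case SKETCH_HINT_T2:
        __builtin_prefetch(p, 0, 1);
        break;
    default: /* unknown hint: a prefetch is only a hint, so do nothing */
        break;
    }
}

/* Hypothetical test: prefetch every 64-byte step of a buffer with every
 * hint, then confirm the contents are unchanged. Returns 1 on success. */
int test_sketch_mm_prefetch(void)
{
    static char buf[1024];
    const int hints[] = { SKETCH_HINT_NTA, SKETCH_HINT_T0,
                          SKETCH_HINT_T1, SKETCH_HINT_T2 };

    for (size_t k = 0; k < sizeof(buf); k++)
        buf[k] = (char) (k & 0x7f);

    for (size_t h = 0; h < sizeof(hints) / sizeof(hints[0]); h++)
        for (size_t off = 0; off < sizeof(buf); off += 64)
            sketch_mm_prefetch(buf + off, hints[h]);

    for (size_t k = 0; k < sizeof(buf); k++)
        assert(buf[k] == (char) (k & 0x7f));
    return 1;
}
```

The `default` branch is a design choice of this sketch: since prefetching never changes program results, silently ignoring an unrecognized hint is safe.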
The implementation from SIMDe:
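void simde_mm_prefetch (const void* p, int i) {
    /* __builtin_prefetch(addr, rw, locality): rw is 0 for read, 1 for write;
     * locality ranges from 0 (no temporal locality) to 3 (high). Both must be
     * compile-time constants, hence one call per hint. */
    switch(i) {
    case SIMDE_MM_HINT_NTA:
        __builtin_prefetch(p, 0, 0);
        break;
    case SIMDE_MM_HINT_T0:
        __builtin_prefetch(p, 0, 3);
        break;
    case SIMDE_MM_HINT_T1:
        __builtin_prefetch(p, 0, 2);
        break;
    case SIMDE_MM_HINT_T2:
        __builtin_prefetch(p, 0, 1);
        break;
    /* ET* hints: same locality as the T* hints, but with write intent. */
    case SIMDE_MM_HINT_ENTA:
        __builtin_prefetch(p, 1, 0);
        break;
    case SIMDE_MM_HINT_ET0:
        __builtin_prefetch(p, 1, 3);
        break;
    case SIMDE_MM_HINT_ET1:
        __builtin_prefetch(p, 1, 2);
        break;
    case SIMDE_MM_HINT_ET2:
        __builtin_prefetch(p, 1, 1); /* write intent, like ENTA/ET0/ET1 */
        break;
    }
}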
Reference: SIMDe Issue #897.
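The switch above encodes two independent choices per hint: a read/write flag and a locality level. The sketch below writes the intended mapping out as a table so the pattern can be checked mechanically: each `ET*` hint should equal its `T*` counterpart with only the write flag set, and a check like this catches any `ET*` case that passes `rw = 0`. The enum values and the names `hint_args` and `check_hint_table` are hypothetical, not SIMDe definitions. The table cannot feed `__builtin_prefetch` directly, because GCC and Clang require its `rw` and `locality` arguments to be compile-time constants, which is why the implementation uses a `switch`.

```c
#include <assert.h>

/* Hypothetical hint numbering for this sketch only. */
enum sketch_hint {
    H_NTA, H_T0, H_T1, H_T2,     /* read intent */
    H_ENTA, H_ET0, H_ET1, H_ET2, /* write (exclusive) intent */
    H_COUNT
};

/* rw: 0 = read, 1 = write; locality: 0 (none) .. 3 (high),
 * the same argument convention as __builtin_prefetch. */
struct prefetch_args {
    int rw;
    int locality;
};

static const struct prefetch_args hint_args[H_COUNT] = {
    [H_NTA]  = {0, 0}, [H_T0]  = {0, 3}, [H_T1]  = {0, 2}, [H_T2]  = {0, 1},
    [H_ENTA] = {1, 0}, [H_ET0] = {1, 3}, [H_ET1] = {1, 2}, [H_ET2] = {1, 1},
};

/* Every exclusive hint must mirror its read counterpart's locality and
 * differ only in the write flag. Returns 1 if the table is consistent. */
int check_hint_table(void)
{
    for (int h = H_NTA; h <= H_T2; h++) {
        const struct prefetch_args r = hint_args[h];
        const struct prefetch_args w = hint_args[h + (H_ENTA - H_NTA)];
        assert(r.rw == 0 && w.rw == 1);
        assert(r.locality == w.locality);
    }
    return 1;
}
```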
Jim Huang commented
Evan Nemerson, the original author of SIMDe, commented as follows:
ARM C Language Extensions (ACLE) has `__pld` and, in 1.1+, `__pldx`. VS is the only compiler targeting ARM that I'm aware of which doesn't support ACLE.
@howjmay, you should check whether the code generated with `__builtin_prefetch` is identical to the counterpart using `__pld` or `__pldx`.
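One way to do that comparison is to put both spellings side by side in one file, compile it for an ARM target (for example with `clang --target=aarch64-linux-gnu -O2 -S`), and compare the emitted prefetch instructions. This sketch assumes Clang's `<arm_acle.h>`, where `__pldx(access_kind, cache_level, retention_policy, addr)` takes access kind 0 = read / 1 = write, cache level 0 = L1 / 1 = L2 / 2 = L3, and retention policy 0 = keep / 1 = streaming. The function names are hypothetical, and the mapping from x86 hints to `__pldx` arguments (T0 as L1/keep, NTA as L1/streaming) is an assumption to be verified against the generated code, not a settled choice. On non-ARM targets the ACLE variants fall back to `__builtin_prefetch` so the file still compiles.

```c
#if defined(__clang__) && defined(__ARM_ACLE)
#include <arm_acle.h>
#define HAVE_ACLE_PLDX 1 /* assumption: Clang's arm_acle.h provides __pldx */
#else
#define HAVE_ACLE_PLDX 0
#endif

/* T0 via the compiler builtin: read, locality 3 (keep in all levels). */
void prefetch_t0_builtin(const char *p)
{
    __builtin_prefetch(p, 0, 3);
}

/* T0 via ACLE (assumed mapping): read, L1, keep. */
void prefetch_t0_acle(const char *p)
{
#if HAVE_ACLE_PLDX
    __pldx(0, 0, 0, p);
#else
    __builtin_prefetch(p, 0, 3); /* non-ACLE fallback */
#endif
}

/* NTA via the compiler builtin: read, locality 0 (non-temporal). */
void prefetch_nta_builtin(const char *p)
{
    __builtin_prefetch(p, 0, 0);
}

/* NTA via ACLE (assumed mapping): read, L1, streaming. */
void prefetch_nta_acle(const char *p)
{
#if HAVE_ACLE_PLDX
    __pldx(0, 0, 1, p);
#else
    __builtin_prefetch(p, 0, 0); /* non-ACLE fallback */
#endif
}
```

If each builtin/ACLE pair produces the same prefetch instruction in the assembly output, either spelling can back `_mm_prefetch` on that compiler.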