Consolidate `_mm_prefetch`

jserv opened this issue 2 years ago · comments

The current `_mm_prefetch` does not behave as the Intel documentation states:

Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by the locality hint i.

We should consolidate it as follows:

- Refine the function prototype, i.e., `void _mm_prefetch(char const *p, int i)`.
- Provide the corresponding test cases. See `test/x86/sse.c` (function `test_simde_mm_prefetch`).
- Handle the locality hint properly.

The implementation from SIMDe:

void simde_mm_prefetch (const void* p, int i) {
    switch(i) {
      case SIMDE_MM_HINT_NTA:
        __builtin_prefetch(p, 0, 0);
        break;
      case SIMDE_MM_HINT_T0:
        __builtin_prefetch(p, 0, 3);
        break;
      case SIMDE_MM_HINT_T1:
        __builtin_prefetch(p, 0, 2);
        break;
      case SIMDE_MM_HINT_T2:
        __builtin_prefetch(p, 0, 1);
        break;
      case SIMDE_MM_HINT_ENTA:
        __builtin_prefetch(p, 1, 0);
        break;
      case SIMDE_MM_HINT_ET0:
        __builtin_prefetch(p, 1, 3);
        break;
      case SIMDE_MM_HINT_ET1:
        __builtin_prefetch(p, 1, 2);
        break;
      case SIMDE_MM_HINT_ET2:
        __builtin_prefetch(p, 1, 1); /* write intent, like the other E* hints */
        break;
    }
}

Reference: SIMDe Issue #897.
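
Since a prefetch is only a hint and does not modify memory, a test case can mostly check that every hint value is accepted and that the buffer is left untouched. The following is a minimal sketch of such a test, not the actual SIMDe test. It assumes a GCC or Clang compiler for `__builtin_prefetch`; the `HINT_*` constants, their numeric values, and the names `prefetch_sketch` and `test_prefetch_sketch` are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical hint values for this sketch only; real headers define their own. */
enum {
    HINT_NTA = 0, HINT_T2 = 1, HINT_T1 = 2, HINT_T0 = 3,
    HINT_ENTA = 4, HINT_ET2 = 5, HINT_ET1 = 6, HINT_ET0 = 7
};

/* Hypothetical wrapper using the refined prototype (char const *p, int i).
 * A switch is used because __builtin_prefetch expects constant
 * rw and locality arguments. */
static inline void prefetch_sketch(char const *p, int i)
{
    switch (i) {
    case HINT_NTA:  __builtin_prefetch(p, 0, 0); break;
    case HINT_T2:   __builtin_prefetch(p, 0, 1); break;
    case HINT_T1:   __builtin_prefetch(p, 0, 2); break;
    case HINT_T0:   __builtin_prefetch(p, 0, 3); break;
    case HINT_ENTA: __builtin_prefetch(p, 1, 0); break;
    case HINT_ET2:  __builtin_prefetch(p, 1, 1); break;
    case HINT_ET1:  __builtin_prefetch(p, 1, 2); break;
    case HINT_ET0:  __builtin_prefetch(p, 1, 3); break;
    default:        break; /* unknown hint: do nothing */
    }
}

/* Returns 0 when every hint is accepted and the buffer is unchanged. */
int test_prefetch_sketch(void)
{
    char buf[64], ref[64];
    memset(buf, 0x5a, sizeof buf);
    memcpy(ref, buf, sizeof ref);

    for (int hint = HINT_NTA; hint <= HINT_ET0; hint++)
        prefetch_sketch(buf, hint);
    prefetch_sketch(buf, 42); /* an unknown hint must be a harmless no-op */

    return memcmp(buf, ref, sizeof buf) == 0 ? 0 : 1;
}
```

A test like this cannot observe which cache level was targeted; confirming that requires inspecting the generated instructions.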

Jim Huang · Answer 1 · Sun Oct 30 2022 21:49:23 GMT+0800 (China Standard Time)

Evan Nemerson, the original author of SIMDe, commented as follows:

ARM C Language Extensions (ACLE) has __pld and, in 1.1+, __pldx. VS is the only ARM compiler I'm aware of targeting ARM which doesn’t support ACLE.

@howjmay, you should check if the generated code with __builtin_prefetch is identical to the counterpart with __pld or __pldx.
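
One way to run that comparison is to put each variant in its own small function and diff the assembly the compiler emits. The sketch below assumes a GCC or Clang toolchain. The macro `SKETCH_USE_ACLE` and the function names are hypothetical, and the ACLE variants are opt-in because not every compiler version ships `__pld` and `__pldx` in `<arm_acle.h>`.

```c
#if defined(SKETCH_USE_ACLE) /* hypothetical opt-in macro for this sketch */
#include <arm_acle.h>
#endif

/* Read prefetch with the highest temporal locality via the compiler builtin. */
void prefetch_t0_builtin(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

#if defined(SKETCH_USE_ACLE)
/* ACLE counterpart: __pld is a data-prefetch hint for a future load. */
void prefetch_t0_pld(const void *p)
{
    __pld(p);
}

/* ACLE 1.1+: access_kind 0 = load, cache_level 0 = L1,
 * retention_policy 0 = keep (temporal). */
void prefetch_t0_pldx(const void *p)
{
    __pldx(0, 0, 0, p);
}
#endif
```

Compiling this file for an ARM target with `-O2 -S -DSKETCH_USE_ACLE` and comparing the bodies of the three functions shows whether the builtin and the ACLE intrinsics lower to the same instruction. The same pattern can be repeated for the other hint values.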