DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Consolidate `_mm_prefetch`

jserv opened this issue

The current _mm_prefetch does not behave as the Intel documentation states:

Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by the locality hint i.

We shall consolidate:

  1. Refine the function prototype, i.e., void _mm_prefetch(char const *p, int i)
  2. Provide the corresponding test cases. See test/x86/sse.c (Function test_simde_mm_prefetch)
  3. Handle the locality hint properly.
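
For item 2, a smoke test could look roughly like the sketch below. Since a prefetch is only a hint with no architecturally visible result, about all a test can check is that the call accepts every hint, does not fault, and leaves the data untouched. Everything named `demo_*` or `DEMO_*` is hypothetical and stands in for the real `_mm_prefetch` and `_MM_HINT_*` definitions; the hint values are chosen for illustration only, and `__builtin_prefetch` assumes GCC or Clang.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hint constants, for illustration only. */
enum { DEMO_HINT_NTA = 0, DEMO_HINT_T2 = 1, DEMO_HINT_T1 = 2, DEMO_HINT_T0 = 3 };

/* Stand-in with the prototype proposed in item 1. */
static void demo_mm_prefetch(char const *p, int i)
{
    /* The rw and locality arguments must be compile-time constants,
     * so the runtime hint is dispatched through a switch. */
    switch (i) {
    case DEMO_HINT_NTA: __builtin_prefetch(p, 0, 0); break;
    case DEMO_HINT_T2:  __builtin_prefetch(p, 0, 1); break;
    case DEMO_HINT_T1:  __builtin_prefetch(p, 0, 2); break;
    case DEMO_HINT_T0:  __builtin_prefetch(p, 0, 3); break;
    default: break; /* unknown hints are ignored */
    }
}

/* Returns the number of hints after which the buffer contents changed. */
static int demo_test_mm_prefetch(void)
{
    static const int hints[] = {DEMO_HINT_NTA, DEMO_HINT_T2, DEMO_HINT_T1,
                                DEMO_HINT_T0};
    char buf[64], ref[64];
    int failures = 0;

    for (size_t k = 0; k < sizeof buf; k++)
        buf[k] = (char) k;
    memcpy(ref, buf, sizeof buf);

    for (size_t h = 0; h < sizeof hints / sizeof hints[0]; h++) {
        demo_mm_prefetch(buf, hints[h]);
        if (memcmp(buf, ref, sizeof buf) != 0)
            failures++;
    }
    return failures;
}
```

A real test would call `_mm_prefetch` from sse2neon.h instead of the stand-in and report through the project's own test harness.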

The implementation from SIMDe:

void simde_mm_prefetch (const void* p, int i) {
    switch(i) {
      case SIMDE_MM_HINT_NTA:
        __builtin_prefetch(p, 0, 0);
        break;
      case SIMDE_MM_HINT_T0:
        __builtin_prefetch(p, 0, 3);
        break;
      case SIMDE_MM_HINT_T1:
        __builtin_prefetch(p, 0, 2);
        break;
      case SIMDE_MM_HINT_T2:
        __builtin_prefetch(p, 0, 1);
        break;
      case SIMDE_MM_HINT_ENTA:
        __builtin_prefetch(p, 1, 0);
        break;
      case SIMDE_MM_HINT_ET0:
        __builtin_prefetch(p, 1, 3);
        break;
      case SIMDE_MM_HINT_ET1:
        __builtin_prefetch(p, 1, 2);
        break;
      case SIMDE_MM_HINT_ET2:
        __builtin_prefetch(p, 1, 1);
        break;
    }
}

Reference: SIMDe Issue #897.

Evan Nemerson, the original author of SIMDe, commented as follows:

ARM C Language Extensions (ACLE) has __pld and, in 1.1+, __pldx. VS is the only ARM compiler I'm aware of targeting ARM which doesn’t support ACLE.

@howjmay, you should check if the generated code with __builtin_prefetch is identical to the counterpart with __pld or __pldx.
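
One possible way to run that comparison is to compile the same wrapper twice and diff the assembly. This is only a sketch: `demo_prefetch_t0`, `demo_sum`, and the opt-in macro `DEMO_USE_ACLE` are hypothetical names, the default path assumes GCC or Clang, and the ACLE path assumes a toolchain whose `<arm_acle.h>` actually provides `__pld`, which not every compiler version does.

```c
#include <stddef.h>

#if defined(DEMO_USE_ACLE)
#include <arm_acle.h> /* assumed to declare __pld on this toolchain */
#endif

/* Read prefetch with the highest temporal locality (a T0-style hint). */
static inline void demo_prefetch_t0(const void *p)
{
#if defined(DEMO_USE_ACLE)
    __pld(p);
#else
    __builtin_prefetch(p, 0, 3);
#endif
}

/* Small driver that gives the compiler a realistic call site:
 * prefetch 16 elements ahead while summing the array. */
long demo_sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t k = 0; k < n; k++) {
        if (k + 16 < n)
            demo_prefetch_t0(&v[k + 16]);
        total += v[k];
    }
    return total;
}
```

Building this with `-O2 -S` for an ARM target, once as-is and once with `-DDEMO_USE_ACLE`, and then diffing the two assembly files should show whether both variants emit the same prefetch instruction. The same pattern could be repeated with `__pldx` for the other hints.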