DLTcollab / sse2neon

A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation

Implement `__rdtsc`

Cuda-Chen opened this issue · comments

I am currently implementing the _rdtsc Intel intrinsic.
I have followed the instructions for adding a test case.
However, whenever I run make check on an Intel platform, I get the following compile error:

$ make check
g++ -o tests/binding.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/binding.o.d tests/binding.cpp
g++ -o tests/common.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/common.o.d tests/common.cpp
g++ -o tests/impl.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/impl.o.d tests/impl.cpp
tests/impl.cpp: In function ‘SSE2NEON::result_t SSE2NEON::test_rdtsc(const SSE2NEON::SSE2NEONTestImpl&, uint32_t)’:
tests/impl.cpp:9402:22: error: ‘_rdtsc’ was not declared in this scope; did you mean ‘it_rdtsc’?
 9402 |     uint64_t start = _rdtsc();
      |                      ^~~~~~
      |                      it_rdtsc

For your convenience, here are the changes I have made:

  • sse2neon.h
// _rdtsc declaration
FORCE_INLINE uint64_t _rdtsc(void);
...
// _rdtsc definition
#if defined(__x86_64__) || defined(__i386__)
FORCE_INLINE uint64_t _rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t) lo) | (((uint64_t) hi) << 32); 
}

#elif defined(__aarch64__)
FORCE_INLINE uint64_t _rdtsc(void)
{
    uint64_t val; 

    /* According to ARM DDI 0487F.c, from Armv8.0 to Armv8.5 inclusive, the
     * system counter is at least 56 bits wide; from Armv8.6, the counter
     * must be 64 bits wide.  So the system counter could be less than 64
     * bits wide, in which case the flag 'cap_user_time_short' is set to
     * true.
     */
    asm volatile("mrs %0, cntvct_el0" : "=r"(val));

    return val; 
}
#endif
  • tests/impl.h
#define INTRIN_FOREACH(TYPE)         \
...
    TYPE(rdtsc)                      \
  • tests/impl.cpp
result_t test_rdtsc(const SSE2NEONTestImpl &impl, uint32_t iter)
{
#if defined(__arm__) && __ARM_ARCH == 7
    return TEST_UNIMPL;
#endif

    uint64_t start = _rdtsc();
    for (int i = 0; i < 100000; i++) 
        ;
    uint64_t end = _rdtsc();
    return end > start ? TEST_SUCCESS : TEST_FAIL;
}

Finally, thank you for your help!

You don't have to implement _rdtsc for x86/x86-64, since the intrinsic should already be available there by including <x86intrin.h>. Instead, the sse2neon.h header should provide only the Arm/Aarch64 counterpart.
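
The split described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the wrapper name demo_read_counter is hypothetical, the x86 branch assumes a GCC- or Clang-style <x86intrin.h> that declares __rdtsc(), and the AArch64 branch reuses the cntvct_el0 read from the attached patch. Other targets are deliberately left out here.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* On x86 the compiler's own header supplies the intrinsic, so nothing
 * needs to be redefined (assumes a GCC/Clang-style toolchain). */
#include <x86intrin.h>
#endif

/* Hypothetical wrapper name, used only for this illustration. */
static inline uint64_t demo_read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Native time-stamp counter read provided by <x86intrin.h>. */
    return __rdtsc();
#elif defined(__aarch64__)
    /* Arm counterpart: read the virtual counter register. */
    uint64_t val;
    __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(val));
    return val;
#else
    /* Other targets are not covered by this sketch. */
    return 0;
#endif
}
```

With this layout, only the Arm branch would live in sse2neon.h, and the test code would pick up the x86 version from the compiler's header.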

For an ARMv7-A implementation of _rdtsc, you can check gperftools/src/base/cycleclock.h. Quoted:

V7 is the earliest arch that has a standard cyclecount

Related discussions: https://stackoverflow.com/questions/40454157/is-there-an-equivalent-instruction-to-rdtsc-in-arm

Thanks for the help, @jserv! I will work on the ARMv7-A part.

I am currently porting the _rdtsc x86 intrinsic to ARMv7.
On ARMv7, the cycle count is usually read from PMCCNTR. Accessing this register requires the program to run at PL1 or higher, or in user mode with PMUSERENR.EN == 1. However, PMUSERENR is zero in the test suite's QEMU environment, and I cannot change it because that environment runs in user mode.

So I have come up with the following two solutions, and I would like to know which one is acceptable for this project:

  1. Run the test suite's QEMU environment in privileged mode.
  2. Fall back to a syscall such as gettimeofday() when PMUSERENR cannot be set (the kernel provides the time without requiring user-mode access to PMCCNTR).

For Armv7-A targets, we can start by providing the OS-assisted fallback. Further exploration can follow later.
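
An OS-assisted fallback of this kind might look like the sketch below, loosely modeled on the approach in the gperftools cycleclock.h file mentioned earlier. Several things here are assumptions rather than settled design: the helper name demo_armv7_rdtsc is hypothetical, PMUSERENR is assumed to be readable from user mode, and the shift by 6 assumes PMCCNTR is configured to tick once every 64 cycles. On non-Arm hosts only the gettimeofday() path is compiled.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper name, for illustration only. */
static inline uint64_t demo_armv7_rdtsc(void)
{
#if defined(__arm__)
    uint32_t pmuseren, pmcntenset, pmccntr;

    /* PMUSERENR: bit 0 (EN) set means user mode may read PMU counters.
     * Assumption: this register is readable from user mode. */
    __asm__ __volatile__("mrc p15, 0, %0, c9, c14, 0" : "=r"(pmuseren));
    if (pmuseren & 1) {
        /* PMCNTENSET: bit 31 set means the cycle counter is enabled. */
        __asm__ __volatile__("mrc p15, 0, %0, c9, c12, 1" : "=r"(pmcntenset));
        if (pmcntenset & 0x80000000UL) {
            __asm__ __volatile__("mrc p15, 0, %0, c9, c13, 0" : "=r"(pmccntr));
            /* Assumption: the counter ticks once every 64 cycles. */
            return (uint64_t) pmccntr << 6;
        }
    }
#endif
    /* OS-assisted fallback: wall-clock microseconds, not CPU cycles. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t) tv.tv_sec * 1000000 + (uint64_t) tv.tv_usec;
}
```

Note that the fallback returns microseconds rather than cycles, so callers can rely only on the value increasing over time, not on its unit. A test such as test_rdtsc would also need enough work between the two reads for the microsecond clock to advance.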