Implement `__rdtsc`
Cuda-Chen opened this issue
I am currently implementing the `_rdtsc` Intel intrinsic.
I have followed the instructions for adding a test case.
However, whenever I run `make check` on an Intel platform, I get the following compile error:
$ make check
g++ -o tests/binding.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/binding.o.d tests/binding.cpp
g++ -o tests/common.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/common.o.d tests/common.cpp
g++ -o tests/impl.o -Wall -Wcast-qual -I. -maes -mpclmul -mssse3 -msse4.2 -std=gnu++14 -c -MMD -MF tests/impl.o.d tests/impl.cpp
tests/impl.cpp: In function ‘SSE2NEON::result_t SSE2NEON::test_rdtsc(const SSE2NEON::SSE2NEONTestImpl&, uint32_t)’:
tests/impl.cpp:9402:22: error: ‘_rdtsc’ was not declared in this scope; did you mean ‘it_rdtsc’?
 9402 |     uint64_t start = _rdtsc();
      |                      ^~~~~~
      |                      it_rdtsc
For your convenience, here are the changes I have made:
- sse2neon.h
// _rdtsc declaration
FORCE_INLINE uint64_t _rdtsc(void);
...
// _rdtsc definition
#if defined(__x86_64__) || defined(__i386__)
FORCE_INLINE uint64_t _rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t) lo) | (((uint64_t) hi) << 32);
}
#elif defined(__aarch64__)
FORCE_INLINE uint64_t _rdtsc(void)
{
    uint64_t val;
    /* According to ARM DDI 0487F.c, from Armv8.0 to Armv8.5 inclusive, the
     * system counter is at least 56 bits wide; from Armv8.6, the counter
     * must be 64 bits wide. So the system counter may be narrower than
     * 64 bits, in which case the perf flag 'cap_user_time_short' is set
     * to true.
     */
    __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(val));
    return val;
}
#endif
- tests/impl.h
#define INTRIN_FOREACH(TYPE) \
...
TYPE(rdtsc) \
- tests/impl.cpp
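result_t test_rdtsc(const SSE2NEONTestImpl &impl, uint32_t iter)
{
#if defined(__arm__) && __ARM_ARCH == 7
    return TEST_UNIMPL;
#else
    uint64_t start = _rdtsc();
    // volatile keeps the compiler from deleting the busy loop, so the
    // counter has time to advance between the two reads
    for (volatile int i = 0; i < 100000; i++)
        ;
    uint64_t end = _rdtsc();
    return end > start ? TEST_SUCCESS : TEST_FAIL;
#endif
}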
Finally, thanks for your help!
You don't have to implement `_rdtsc` for x86/x86-64, since the intrinsic should already be available by including `<x86intrin.h>`. Instead, sse2neon.h should provide the Arm/AArch64 counterpart.
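A minimal sketch of the layout this suggests, assuming GCC or Clang (MSVC declares `__rdtsc` in `<intrin.h>` instead): x86 builds take `__rdtsc()` from `<x86intrin.h>`, AArch64 reads `cntvct_el0` as in the patch above, and any other target falls back to `clock_gettime(CLOCK_MONOTONIC)`. The name `read_cycle_counter` is hypothetical and not part of sse2neon.h.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* provides __rdtsc() on GCC and Clang */
#endif

/* Hypothetical helper name; not part of sse2neon.h. */
static inline uint64_t read_cycle_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* x86: use the compiler's own intrinsic instead of hand-written asm. */
    return __rdtsc();
#elif defined(__aarch64__)
    /* AArch64: virtual count register, readable from EL0 on Linux. */
    uint64_t val;
    __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(val));
    return val;
#else
    /* Other targets: fall back to a monotonic OS clock (nanoseconds). */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
#endif
}
```

Note that only the x86 branch counts something close to CPU cycles: `cntvct_el0` ticks at the fixed system-counter frequency and the fallback returns nanoseconds, so only the difference between two reads on the same target is meaningful.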
For an ARMv7-A implementation of `_rdtsc`, you can check gperftools/src/base/cycleclock.h, which says:
V7 is the earliest arch that has a standard cyclecount
Related discussions: https://stackoverflow.com/questions/40454157/is-there-an-equivalent-instruction-to-rdtsc-in-arm
I am currently porting the `_rdtsc` x86 intrinsic to ARMv7.
On ARMv7, the cycle count is usually read from the PMCCNTR register. To access it, the program must run at PL1 or higher, or in user mode (PL0) with PMUSERENR.EN == 1. However, PMUSERENR is zero in the test suite's QEMU environment, and I can't change it because that environment runs in user mode.
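Following the gperftools cycleclock.h approach mentioned above, a user-mode ARMv7-A reader can check PMUSERENR.EN (readable at PL0) and the cycle-counter enable bit before touching PMCCNTR, and report failure instead of trapping. This is a sketch: `try_read_pmccntr` is a hypothetical name, and the multiply by 64 mirrors gperftools' assumption that the counter's clock divider is enabled. In the QEMU setup described here, PMUSERENR.EN is 0, so it would return 0 and the caller must fall back.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 and stores the cycle count in *out when
 * PMCCNTR is readable from user mode, 0 otherwise. */
static int try_read_pmccntr(uint64_t *out)
{
#if defined(__arm__) && defined(__ARM_ARCH_7A__)
    uint32_t pmuserenr, pmcntenset, pmccntr;
    /* PMUSERENR is always readable at PL0; bit 0 (EN) grants user access. */
    __asm__ __volatile__("mrc p15, 0, %0, c9, c14, 0" : "=r"(pmuserenr));
    if (!(pmuserenr & 1))
        return 0;
    /* PMCNTENSET bit 31 reports whether the cycle counter is enabled. */
    __asm__ __volatile__("mrc p15, 0, %0, c9, c12, 1" : "=r"(pmcntenset));
    if (!(pmcntenset & 0x80000000u))
        return 0;
    __asm__ __volatile__("mrc p15, 0, %0, c9, c13, 0" : "=r"(pmccntr));
    /* Assumption taken from gperftools: the counter ticks once every
     * 64 cycles, so scale it back up. */
    *out = (uint64_t) pmccntr * 64;
    return 1;
#else
    (void) out;
    return 0; /* not an ARMv7-A target: PMCCNTR is unavailable here */
#endif
}
```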
As such, I came up with the following two solutions and would like to know which one is acceptable for this project:
- Run the test suite's QEMU environment in privileged mode.
- Fall back to a syscall such as gettimeofday() when PMUSERENR cannot be set (such syscalls execute in the Linux kernel at PL1, where PMCCNTR is accessible).
For Armv7-A targets, we can provide the OS-assisted fallback for now and explore further afterwards.
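A minimal sketch of such an OS-assisted fallback. It swaps `gettimeofday()` for `clock_gettime(CLOCK_MONOTONIC)`, which does not jump when the wall clock is set. The helper name `os_fallback_counter` is hypothetical, and it returns nanoseconds rather than cycles, so its values should not be mixed with PMCCNTR readings.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical OS-assisted fallback for targets where PMCCNTR cannot be
 * read from user mode. Returns nanoseconds, not CPU cycles. */
static uint64_t os_fallback_counter(void)
{
    struct timespec ts;
    /* CLOCK_MONOTONIC does not jump when the wall clock is set,
     * unlike the time returned by gettimeofday(). */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t) ts.tv_sec * 1000000000ull + (uint64_t) ts.tv_nsec;
}
```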