Use fast performance counter to measure clock cycles
achimnol opened this issue
Joongi Kim commented
I performed a small experiment, running each timing function 1M times on a Sandy Bridge server (total elapsed time for the 1M calls):
clock_gettime(CLOCK_MONOTONIC) 0.031977 sec
clock_gettime(CLOCK_MONOTONIC_RAW) 0.518126 sec
clock_gettime(CLOCK_MONOTONIC_COARSE) 0.007629 sec
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) 0.648347 sec
clock_gettime(CLOCK_THREAD_CPUTIME_ID) 0.580868 sec
gettimeofday() 0.032326 sec
rdpmc() 0.019933 sec
rdpmc() + memfence() 0.029345 sec
rdtsc() 0.010432 sec
rdtsc() + memfence() 0.018088 sec
rdtsc() + cpuid() 0.052463 sec
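The harness behind these numbers is not shown in this issue. As a rough sketch of how such a measurement could be set up, the snippet below calls a timing function in a tight loop and reports the total elapsed seconds. The names time_calls and probe_clock are hypothetical and do not come from lib/common.hh.

```cpp
#include <chrono>
#include <cstdint>
#include <ctime>

// Hypothetical helper: invokes fn() `iterations` times and returns the
// total wall-clock time of the loop in seconds.
template <typename Fn>
double time_calls(Fn fn, std::uint64_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Hypothetical probe: one clock_gettime() call on the given clock.
// The result is stored through a volatile so the call cannot be elided.
inline void probe_clock(clockid_t id) {
    timespec ts;
    clock_gettime(id, &ts);
    volatile long sink = ts.tv_nsec;
    (void)sink;
}
```

For example, time_calls([] { probe_clock(CLOCK_MONOTONIC); }, 1000000) would give a figure comparable to the first row above, although absolute values depend on the CPU, the kernel, and whether the clock is served through the vDSO.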
The results suggest some potential improvements to our timing functions in lib/common.hh:
- The current get_usec() uses CLOCK_MONOTONIC_RAW, but we should change it to CLOCK_MONOTONIC, which is much faster (about 16x in this test). CLOCK_MONOTONIC_COARSE is the fastest, but its resolution is only about 4 ms.
- For measuring the PPC values, we have used the rdtsc() + cpuid() combination to prevent out-of-order execution, but we should avoid cpuid() because it is costly (0.052 sec versus 0.018 sec with a memory fence in the test above). Instead, we should use a lighter synchronization mechanism such as a memory fence.
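The two proposed changes might look roughly like the sketch below. It assumes an x86 target with GCC or Clang, and it assumes get_usec() returns microseconds as a 64-bit integer; the actual signature in lib/common.hh may differ. The name rdtsc_fenced is hypothetical, and using lfence as the "memory fence" is an assumption, since the issue does not say which fence instruction was benchmarked.

```cpp
#include <cstdint>
#include <ctime>
#include <x86intrin.h>  // __rdtsc, _mm_lfence (GCC/Clang, x86 only)

// Sketch of get_usec() on top of CLOCK_MONOTONIC instead of
// CLOCK_MONOTONIC_RAW. The return type is an assumption.
inline std::uint64_t get_usec() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000ULL +
           static_cast<std::uint64_t>(ts.tv_nsec) / 1000ULL;
}

// Hypothetical helper: reads the time-stamp counter with lfence on both
// sides instead of cpuid. lfence waits for earlier instructions to
// complete before later ones start, which keeps the rdtsc read from
// being reordered around the code being measured.
inline std::uint64_t rdtsc_fenced() {
    _mm_lfence();
    const std::uint64_t tsc = __rdtsc();
    _mm_lfence();
    return tsc;
}
```

A cycle measurement would then take the difference of two rdtsc_fenced() readings around the code under test. Whether lfence alone is sufficient on every target CPU is worth verifying against the vendor documentation before relying on it.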