ANLAB-KAIST / NBA

Network Balancing Act: A High-performance packet processing framework for heterogeneous processors

Use fast performance counter to measure clock cycles

achimnol opened this issue

I performed a small experiment, running each timing function 1M times on a Sandy Bridge server:

clock_gettime(CLOCK_MONOTONIC)          0.031977 sec
clock_gettime(CLOCK_MONOTONIC_RAW)      0.518126 sec
clock_gettime(CLOCK_MONOTONIC_COARSE)   0.007629 sec
clock_gettime(CLOCK_PROCESS_CPUTIME_ID) 0.648347 sec
clock_gettime(CLOCK_THREAD_CPUTIME_ID)  0.580868 sec
gettimeofday()                          0.032326 sec
rdpmc()                                 0.019933 sec
rdpmc() + memfence()                    0.029345 sec
rdtsc()                                 0.010432 sec
rdtsc() + memfence()                    0.018088 sec
rdtsc() + cpuid()                       0.052463 sec
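
The harness behind these numbers is not shown in this issue. As a rough sketch of how the clock_gettime() rows might be measured on Linux, the following times a loop of repeated calls. The names now_sec, bench_clock, and report are hypothetical and are not taken from the NBA code base.

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>

// Wall-clock seconds from CLOCK_MONOTONIC, used only to time the loop itself.
double now_sec() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<double>(ts.tv_sec) + ts.tv_nsec * 1e-9;
}

// Calls clock_gettime(clk) `iters` times and returns the elapsed seconds.
// Each result is folded into a volatile sink so the loop body stays observable.
double bench_clock(clockid_t clk, long iters) {
    volatile uint64_t sink = 0;
    struct timespec ts;
    double start = now_sec();
    for (long i = 0; i < iters; i++) {
        clock_gettime(clk, &ts);
        sink = sink + static_cast<uint64_t>(ts.tv_nsec);
    }
    return now_sec() - start;
}

// Prints one row in a layout similar to the listing above.
void report(const char *name, clockid_t clk, long iters) {
    std::printf("%-40s %f sec\n", name, bench_clock(clk, iters));
}
```

For example, report("clock_gettime(CLOCK_MONOTONIC)", CLOCK_MONOTONIC, 1000000) would print one row. Absolute numbers will differ across machines and kernels.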

The results suggest some potential improvements to our timing functions in lib/common.hh.

  • The current get_usec() uses CLOCK_MONOTONIC_RAW, but we should change it to CLOCK_MONOTONIC, which is much faster (about 16 times faster in this experiment).
  • CLOCK_MONOTONIC_COARSE is the fastest, but its resolution is only about 4 ms.
  • For measuring the PPC values, we have been using the rdtsc() + cpuid() combination, with cpuid() serving to prevent out-of-order execution. We should stop using cpuid() for this purpose, since it is the slowest rdtsc() variant measured above, and use a lighter synchronization mechanism such as a memory fence instead.
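
A minimal sketch of what the two proposed changes could look like, assuming GCC or Clang on x86. The function names get_usec_monotonic and rdtsc_fenced are hypothetical and do not reflect the actual contents of lib/common.hh. The issue does not say which fence memfence() uses, so the choice of LFENCE here is an assumption.

```cpp
#include <cstdint>
#include <ctime>
#include <x86intrin.h>  // __rdtsc() and _mm_lfence(); GCC/Clang on x86 only

// Microseconds since an unspecified starting point, read from
// CLOCK_MONOTONIC rather than CLOCK_MONOTONIC_RAW.
uint64_t get_usec_monotonic() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000ULL
         + static_cast<uint64_t>(ts.tv_nsec) / 1000ULL;
}

// Reads the time-stamp counter with a fence on each side instead of cpuid().
// Assumption: LFENCE is the fence; the first one keeps rdtsc from executing
// before earlier instructions have completed, the second keeps later
// instructions from starting before the counter has been read.
uint64_t rdtsc_fenced() {
    _mm_lfence();
    uint64_t tsc = __rdtsc();
    _mm_lfence();
    return tsc;
}
```

A cycle measurement would then take the difference of two rdtsc_fenced() readings around the code under test. Whether a single fence is enough for the PPC measurements would need to be checked against the accuracy required.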