ssvb / tinymembench

Simple benchmark for memory throughput and latency

Minimize the use of floating point arithmetic

Alexandre-M opened this issue

Hi,

We're currently working on FreeBSD support for the new ARMADA380 architecture.
While checking that the L2 cache is enabled, we have been using your tool, which is very useful.

However, FreeBSD 10.3 lacks hard-float support, so every memory bandwidth computation goes through soft-float emulation. This resulted in poor values (and some stress :) ) and hours spent searching for why the memory was so slow.

Could you look into minimizing the use of floating point arithmetic as much as possible during the benchmark phases?

Thank you

Alexandre Martins

But the use of floating point calculations should already be pretty minimal. It is also mostly done outside of the critical loops, except in the gettime() function. Why do you think it is a problem?

Could you please share your current logs? It is quite possible that the reported values are actually normal. Also, as a test, you could hack the gettime() function to do twice (or even 10x) as much work, then check whether this affects the reported values.
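The experiment suggested above can be sketched roughly as follows. This is a minimal illustration, not tinymembench's actual gettime(): the names gettime_baseline() and gettime_scaled() are hypothetical, and the sketch assumes a POSIX clock_gettime() with CLOCK_MONOTONIC. Swapping the benchmark's timer for gettime_scaled(1) and then gettime_scaled(10) shows whether the timer's overhead (including any soft-float conversion) changes the reported bandwidth.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for a benchmark timer: seconds as a double.
 * Without hard-float support, the int-to-double conversions and the
 * division below are carried out by soft-float library routines. */
static double gettime_baseline(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical "hacked" timer: repeats the same work extra_work times
 * and returns the last reading. If the reported bandwidth barely changes
 * between extra_work = 1 and extra_work = 10, the timer's cost is not
 * what limits the measurement. */
static double gettime_scaled(int extra_work)
{
    /* volatile forces every reading, including its float conversion,
     * to be computed instead of keeping only the last one */
    volatile double t = 0.0;
    assert(extra_work >= 1);
    for (int i = 0; i < extra_work; i++)
        t = gettime_baseline();
    return t;
}
```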

Hi

I made a small patch to check whether soft-float really has a performance cost.

patch.txt

We gained a few MB/s, but not much.

Feel free to integrate that (or not :) )

How big was the difference? I just don't like using integers for this kind of calculation because they tend to overflow, and when that happens, we get really bogus results.
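To illustrate the overflow concern, here is a small sketch with made-up numbers (a 32 MiB buffer copied 200 times in 0.6 s). The helper names are hypothetical and are not taken from tinymembench or from patch.txt. Accumulating the byte count in 32 bits silently wraps once the total passes 4 GiB, and the bandwidth computed from it is wrong without any warning; widening to 64 bits before multiplying avoids that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: total bytes copied, accumulated in 32 bits.
 * Unsigned arithmetic wraps modulo 2^32, so anything above 4 GiB is lost. */
static uint32_t total_bytes_u32(uint32_t block_size, uint32_t iterations)
{
    return block_size * iterations;
}

/* Same computation, widened to 64 bits before multiplying. */
static uint64_t total_bytes_u64(uint32_t block_size, uint32_t iterations)
{
    return (uint64_t)block_size * iterations;
}

/* Bandwidth in MB/s (1 MB = 10^6 bytes), i.e. bytes per microsecond. */
static uint64_t mb_per_s(uint64_t bytes, uint64_t elapsed_usec)
{
    assert(elapsed_usec > 0);
    return bytes / elapsed_usec;
}
```

With a 32 MiB block copied 200 times in 600000 microseconds, total_bytes_u64() gives 6710886400 bytes and mb_per_s() reports 11184 MB/s, while the 32-bit total wraps to 2415919104 bytes and the same formula reports 4026 MB/s. In both versions the division runs once per measurement, so its cost is negligible either way.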

Edit: BTW, you are using ualarm() in your patch and it is not exactly accurate. See https://linux.die.net/man/3/ualarm

The ualarm() function causes the signal SIGALRM to be sent to
the invoking process after (not less than) usecs microseconds.
The delay may be lengthened slightly by any system activity or
by the time spent processing the call or by the granularity of
system timers. 

The difference was about 3% (from 380 MB/s to 390 MB/s).

The fact that ualarm() is not accurate is not an issue. In my case, the loop ran for about 0.6 seconds (well beyond the 0.5 seconds requested) in both cases. The timestamps t1 and t2 give us the true start and stop times, so the value is still accurate.
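The point about t1 and t2 can be shown with a small sketch of a ualarm()-bounded copy loop. It is not the code from patch.txt: the function name measure_copy_mb_per_s() is hypothetical, it assumes a system that still provides ualarm() (POSIX.1-2008 removed it, but glibc and FreeBSD keep it), and it uses clock_gettime(CLOCK_MONOTONIC) for t1 and t2. Because the bandwidth is divided by the measured t2 - t1, a run that overshoots the requested 0.5 s to 0.6 s still reports the right value; dividing by the requested 0.5 s instead would inflate the result by a factor of 0.6 / 0.5 = 1.2, i.e. 20%.

```c
#define _DEFAULT_SOURCE  /* glibc: expose ualarm(), which POSIX.1-2008 dropped */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t time_is_up;

static void on_alarm(int sig)
{
    (void)sig;
    time_is_up = 1;
}

/* Monotonic timestamp in microseconds, kept in 64-bit integers. */
static uint64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical measurement loop: copy `size` bytes until SIGALRM fires,
 * then compute MB/s (1 MB = 10^6 bytes) from the measured t1/t2 interval
 * rather than from requested_usec, which ualarm() may overshoot. */
static uint64_t measure_copy_mb_per_s(void *dst, const void *src, size_t size,
                                      unsigned int requested_usec,
                                      uint64_t *elapsed_usec)
{
    struct sigaction sa, old_sa;
    uint64_t bytes = 0, t1, t2;

    /* ualarm() only guarantees support for intervals below one second */
    assert(requested_usec > 0 && requested_usec < 1000000u);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old_sa);

    time_is_up = 0;
    t1 = now_usec();               /* true start, taken before arming */
    ualarm(requested_usec, 0);
    while (!time_is_up) {
        memcpy(dst, src, size);
        bytes += size;
    }
    t2 = now_usec();               /* true stop */

    sigaction(SIGALRM, &old_sa, NULL);  /* restore the previous handler */
    *elapsed_usec = t2 - t1;
    return *elapsed_usec ? bytes / *elapsed_usec : 0;
}
```

All arithmetic in this sketch is integer, so nothing in the timed loop depends on soft-float; the single 64-bit division happens once, after the loop.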