dual vs simple random read tests

Question


Blimpyway opened this issue

Hi,

I do not understand the following short note:

Note 2: Dual random read means that we are simultaneously performing two independent memory accesses at a time. In the case if the memory subsystem can't handle multiple outstanding requests, dual random read has the same timings as two single reads performed one after another.

Why doesn't the memory subsystem optimize two consecutive "simple" reads in random_read_test(), while a dual random read call gives roughly the same latency for two reads together as a single read does for one?

Is it just because the zerobuffer[] reads at the v1 and v2 indexes are placed right next to each other (lines 342 and 343 in main.c)?

I'm asking because on an i5 laptop, the single random read latency is only slightly (15%) better than the dual random read latency.

Thanks

Siarhei Siamashka · Answer 1 · Thu Nov 08 2018 12:50:09 GMT+0800 (China Standard Time)

The memory subsystem can't optimize two consecutive "simple" reads because the address used by the second read is calculated from the value obtained from the first read. So the second read can't start before the first read is completed.

And the latency difference between these two methods is exactly what the test is trying to measure. Here is an example of a primitive processor which can't handle multiple outstanding requests: https://github.com/ssvb/tinymembench/wiki/Samsung-N220-(Intel-Atom-N450)
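A minimal sketch in C of the two access patterns, assuming an array of indices that forms one random cycle. This is not the tinymembench code from main.c: build_cycle, single_chase and dual_chase are hypothetical names, and Sattolo's algorithm is just one way to build such a cycle. Timing is left out so that only the dependency structure is visible.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Build one random cycle over n slots (Sattolo's algorithm), so that
 * following next[i] from any start visits every slot before returning. */
void build_cycle(uint32_t *next, size_t n, unsigned seed)
{
    if (n == 0)
        return;
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = (uint32_t)i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i; /* j in [0, i): Sattolo, not Fisher-Yates */
        uint32_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* "Single" random read: each load's address is the value returned by the
 * previous load, so load k+1 cannot be issued until load k has completed. */
uint32_t single_chase(const uint32_t *next, uint32_t v, size_t steps)
{
    while (steps--)
        v = next[v];
    return v;
}

/* "Dual" random read: two independent chains advanced in the same loop.
 * a and b never depend on each other, so a CPU that supports multiple
 * outstanding cache misses can have both loads in flight at once. */
void dual_chase(const uint32_t *next, uint32_t *v1, uint32_t *v2, size_t steps)
{
    uint32_t a = *v1, b = *v2;
    while (steps--) {
        a = next[a];
        b = next[b];
    }
    *v1 = a;
    *v2 = b;
}
```

In single_chase every iteration waits for the previous load, so with a large array each step costs roughly one full memory latency. In dual_chase the two loads in each iteration are independent, which is what lets an out-of-order CPU overlap them.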

Your i5 processor is doing just fine.
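One rough way to read the two numbers, assuming the dual figure is the time for one pair of reads (consistent with the note quoted in the question, where a CPU without multiple outstanding requests shows a dual time of about twice the single time): divide the time two serialized reads would take by the measured dual time. overlap_factor is a hypothetical helper, not part of tinymembench.

```c
/* Hypothetical helper: estimate how well two independent random reads
 * overlap, from measured single and dual random read times (in ns).
 * Returns about 1.0 when the reads are fully serialized (dual = 2 * single)
 * and about 2.0 when they are fully overlapped (dual = single). */
double overlap_factor(double single_ns, double dual_ns)
{
    return 2.0 * single_ns / dual_ns;
}
```

With the roughly 15% gap reported in the question, the factor comes out around 1.7, meaning most of the second miss is hidden behind the first. A factor near 1.0 would match the Atom N450 case linked above.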

Blimpyway · Answer 2 · Thu Nov 08 2018 19:30:20 GMT+0800 (China Standard Time)

OK, that makes sense. I changed the second array index to depend on the value returned by the first read, and now it can't be optimized anymore. Thanks.
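A hypothetical version of the change described above, assuming the dual loop is rewritten so that the second index mixes in the value just loaded by the first read. The exact edit made in the thread is not shown; dependent_pair_chase and the XOR mixing are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of a dual pointer-chasing loop: the second chain's
 * index is combined with the value the first load just returned, so load b
 * cannot start until load a completes and the two reads are serialized.
 * Assumes n > 0 and every next[i] < n. */
void dependent_pair_chase(const uint32_t *next, size_t n,
                          uint32_t *v1, uint32_t *v2, size_t steps)
{
    uint32_t a = *v1, b = *v2;
    while (steps--) {
        a = next[a];
        b = next[(b ^ a) % n]; /* address now depends on a's loaded value */
    }
    *v1 = a;
    *v2 = b;
}
```

A real benchmark would also need to make sure the mixed index still lands on a cache-missing location; this sketch only shows the data dependency that prevents the overlap.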
