parallel101 / course

High-Performance Parallel Programming and Optimization - Course Materials

Home Page:https://space.bilibili.com/263032155

Question about the results of running 07/03_prefetch/06

rickif opened this issue

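Hi, 小彭老师 (Teacher Peng). I have some questions about the results of running the 07/03_prefetch/06 example and would appreciate any corrections.
My platform is an Intel i5-13500, Ubuntu 24.04, gcc version 13.2.0.
When I run the 07/03_prefetch/06 example, I only get results similar to those shown in the course after removing the #pragma omp parallel for from the example. I am not sure whether #pragma omp parallel for applies any optimization other than parallelization.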
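For readers without the repository at hand, the comparison in question can be sketched roughly as follows. This is not the course's code: the function names, the `int` element type, and the `stream_store` helper are assumptions made for illustration. Only `_mm_stream_si32` (a real SSE2 non-temporal store intrinsic) and the `#pragma omp parallel for` directive come from the setup described here.

```cpp
#include <cstddef>
#include <vector>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_stream_si32
#endif

// Hypothetical helper: store v to *p with a non-temporal (streaming) store
// where SSE2 is available, otherwise fall back to an ordinary store.
inline void stream_store(int *p, int v) {
#if defined(__SSE2__)
    _mm_stream_si32(p, v);
#else
    *p = v;  // portable fallback, goes through the cache as usual
#endif
}

// Hypothetical stand-in for BM_write_streamed: streaming stores only.
void write_streamed(std::vector<int> &a, int v) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    int *data = a.data();
#pragma omp parallel for  // ignored when compiled without -fopenmp
    for (std::ptrdiff_t i = 0; i < n; i++) {
        stream_store(data + i, v);
    }
}

// Hypothetical stand-in for BM_write_stream_then_read: every streaming store
// is followed by a load of the element that was just written.
long long write_stream_then_read(std::vector<int> &a, int v) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    int *data = a.data();
    long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; i++) {
        stream_store(data + i, v);
        // The volatile access keeps the compiler from replacing the load with v.
        sum += *static_cast<volatile int *>(data + i);
    }
    return sum;
}
```

Building this once with `-fopenmp` and once without gives the two configurations compared in this issue. Timing would still need a harness such as the benchmark framework the example already uses.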

Results of the original version

The results show that BM_write_stream_then_read and BM_write_streamed take about the same time, so the read does not seem to affect the stream instruction.

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_read                        25228152 ns     18180668 ns           38
BM_write                       32696238 ns     25309548 ns           33
BM_write_streamed              19530899 ns     17132181 ns           36
BM_write_stream_then_read      19586335 ns     17525509 ns           43
BM_write_streamed_ps           19550735 ns     14485110 ns           39
BM_write_streamed_ps_skipped   37094026 ns     26238143 ns           26
BM_read_and_write              36829027 ns     33520956 ns           22

Results with #pragma omp parallel for removed

The results show that BM_write_stream_then_read takes significantly longer than BM_write_streamed.

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_read                        38213301 ns     38207623 ns           19
BM_write                       52209723 ns     52203705 ns           13
BM_write_streamed              34738316 ns     34735390 ns           20
BM_write_stream_then_read      40930259 ns     40927256 ns           17
BM_write_streamed_ps           17725541 ns     17724305 ns           36
BM_write_streamed_ps_skipped   36891533 ns     36889477 ns           19
BM_read_and_write              44972351 ns     44969916 ns           12

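  1. In the original version, BM_write_stream_then_read and BM_write_streamed take about the same time, so the read does not seem to affect the stream instruction.
  2. With #pragma omp parallel for removed, BM_write_stream_then_read takes significantly longer than BM_write_streamed.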
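To put numbers on these two observations, the gap can be computed directly from the Time column. The following is a minimal sketch: the helper `slowdown_percent` and the constant names are made up for illustration, and the values are copied from the two tables in this issue.

```cpp
#include <cassert>

// Relative slowdown of t_ns versus base_ns, in percent.
// For example, 10.0 means t_ns is 10% longer than base_ns.
double slowdown_percent(double base_ns, double t_ns) {
    assert(base_ns > 0.0);
    return (t_ns / base_ns - 1.0) * 100.0;
}

// Wall-clock times in ns (Time column), original version with OpenMP.
constexpr double kOmpWriteStreamed = 19530899.0;
constexpr double kOmpWriteStreamThenRead = 19586335.0;

// Wall-clock times in ns (Time column), #pragma omp parallel for removed.
constexpr double kSerialWriteStreamed = 34738316.0;
constexpr double kSerialWriteStreamThenRead = 40930259.0;
```

With these values, `slowdown_percent(kOmpWriteStreamed, kOmpWriteStreamThenRead)` comes out at about 0.3%, while `slowdown_percent(kSerialWriteStreamed, kSerialWriteStreamThenRead)` comes out at about 17.8%.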