Question about the results of running 07/03_prefetch/06
rickif opened this issue · comments
ricky commented
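Hi, 小彭老师. I have some questions about the results of running the 07/03_prefetch/06 example and would appreciate your corrections.
My platform is Intel i5-13500, Ubuntu 24.04, gcc version 13.2.0.
When I run the 07/03_prefetch/06 example,
I only get results similar to those in the course after removing #pragma omp parallel for from the example. Does #pragma omp parallel for apply any optimization other than parallelization?
Results of the original version
The results show that BM_write_stream_then_read and BM_write_streamed take about the same time, so the read seems to have no effect on the stream instruction.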
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_read 25228152 ns 18180668 ns 38
BM_write 32696238 ns 25309548 ns 33
BM_write_streamed 19530899 ns 17132181 ns 36
BM_write_stream_then_read 19586335 ns 17525509 ns 43
BM_write_streamed_ps 19550735 ns 14485110 ns 39
BM_write_streamed_ps_skipped 37094026 ns 26238143 ns 26
BM_read_and_write 36829027 ns 33520956 ns 22
Results of the version without #pragma omp parallel for
The results show that BM_write_stream_then_read takes significantly longer than BM_write_streamed.
-----------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------
BM_read 38213301 ns 38207623 ns 19
BM_write 52209723 ns 52203705 ns 13
BM_write_streamed 34738316 ns 34735390 ns 20
BM_write_stream_then_read 40930259 ns 40927256 ns 17
BM_write_streamed_ps 17725541 ns 17724305 ns 36
BM_write_streamed_ps_skipped 36891533 ns 36889477 ns 19
BM_read_and_write 44972351 ns 44969916 ns 12
彭于斌 commented
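Similar in what way? Similarity in absolute numbers is meaningless; it is the ratios that matter. My CPU is a 9700, which is much slower than yours, so it is perfectly normal that you have to turn off parallelism to get results like mine.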
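To make the point about ratios concrete, here is a small sketch that computes the ratio of BM_write_stream_then_read to BM_write_streamed from the wall-clock Time column of the two tables posted in this issue. The struct and function names (Run, slowdown, print_slowdowns) are made up for illustration and are not part of the course code.

```cpp
#include <cassert>
#include <cstdio>

// Wall-clock times in ns, copied from the "Time" column of the two tables above.
struct Run {
    const char *label;
    double write_streamed_ns;
    double write_stream_then_read_ns;
};

const Run kRuns[] = {
    {"with #pragma omp parallel for", 19530899.0, 19586335.0},
    {"without #pragma omp parallel for", 34738316.0, 40930259.0},
};

// A ratio above 1 means the extra read made the streamed write slower.
double slowdown(const Run &r) {
    assert(r.write_streamed_ns > 0.0);
    return r.write_stream_then_read_ns / r.write_streamed_ns;
}

// Prints one line per run with the ratio rounded to two decimals.
void print_slowdowns() {
    for (const Run &r : kRuns) {
        std::printf("%-34s stream_then_read / streamed = %.2f\n",
                    r.label, slowdown(r));
    }
}
```

Calling print_slowdowns() from main gives a ratio of about 1.00 for the parallel run and about 1.18 for the serial run, which is the difference the question is about.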
ricky commented
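The results show that BM_write_stream_then_read and BM_write_streamed take about the same time, so the read seems to have no effect on the stream instruction.
- In the results of the original version, BM_write_stream_then_read and BM_write_streamed take about the same time, so the read seems to have no effect on the stream instruction.
- In the results of the version with #pragma omp parallel for removed, BM_write_stream_then_read takes significantly longer than BM_write_streamed.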
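For readers without the course source at hand, the following is a minimal sketch of the kind of kernels the two benchmark names suggest. It is an assumption, not the actual code of 07/03_prefetch/06: the function names are hypothetical, it uses an int array for simplicity, and it is x86-only because it relies on the _mm_stream_si32 and _mm_sfence intrinsics. The OpenMP pragma is guarded so that the same file builds with or without -fopenmp, which is the switch being compared in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence (x86 only)

// Hypothetical stand-in for BM_write_streamed: write-only loop that uses
// non-temporal (streaming) stores.
void write_streamed(std::vector<int> &a, int value) {
    const std::size_t n = a.size();
    // The pragma below only takes effect when compiled with -fopenmp.
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::size_t i = 0; i < n; i++) {
        _mm_stream_si32(&a[i], value);
    }
    _mm_sfence();  // make the streaming stores globally visible before returning
}

// Hypothetical stand-in for BM_write_stream_then_read: same store, but each
// element is read back immediately afterwards.
long long write_stream_then_read(std::vector<int> &a, int value) {
    const std::size_t n = a.size();
    long long sum = 0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : sum)
#endif
    for (std::size_t i = 0; i < n; i++) {
        _mm_stream_si32(&a[i], value);
        // volatile read so the compiler has to emit a real load here
        sum += *static_cast<volatile int *>(&a[i]);
    }
    _mm_sfence();
    return sum;
}
```

Timing both functions on an array much larger than the last-level cache, once built with -fopenmp and once without, should reproduce the comparison above. As the earlier reply notes, compare the ratio between the two functions rather than the absolute times.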