DefTruth / CUDA-Learn-Notes

🎉CUDA notes / hand-written CUDA kernels for LLMs / C++ notes, updated whenever time allows: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist etc.

Home Page: https://github.com/DefTruth/cuda-learn-notes

Hello, a question about the reduce logic in the code

Ss-shuang123 opened this issue

  1. sum = warp_reduce_sum<NUM_WARPS>(sum);
  2. if(warp==0) sum = warp_reduce_sum<NUM_WARPS>(sum);

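0x03 warp/block reduce sum/max and 0x09 softmax, softmax + vec4
use the first form when computing the final sum,
whereas 0x04 block all reduce + vec4
uses the second form.
My understanding is that the final sum should use the second form, since in the end all the partial sums are gathered in the first warp. Is that right?
Thanks!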
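The sketch below shows which threads end up holding the block total under each form. It is a host-side C++ simulation of the two-stage block reduce, not the repository's code, and it rests on three assumptions:

- `warp_reduce_sum<kWarpSize>` is the usual `__shfl_xor_sync` butterfly with offsets kWarpSize/2 down to 1.
- `NUM_THREADS` is a multiple of 32.
- The warp count is a power of two.

The `final_in_warp0_only` flag and the function signatures are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cassert>
#include <vector>

// CPU simulation of one CUDA thread block (illustration only).
// Index t of each vector plays the role of threadIdx.x; a warp is
// WARP_SIZE consecutive threads.
constexpr int WARP_SIZE = 32;

// Emulates the butterfly assumed for warp_reduce_sum<kWarpSize>:
//   for (int mask = kWarpSize >> 1; mask >= 1; mask >>= 1)
//     val += __shfl_xor_sync(0xffffffff, val, mask);
// Only threads with active[t] set execute it (models `if (warp == 0)`).
// Afterwards each active lane holds the sum over its aligned group of
// kWarpSize lanes.
template <int kWarpSize>
void warp_reduce_sum(std::vector<float>& val, const std::vector<bool>& active) {
  const int n = static_cast<int>(val.size());
  for (int mask = kWarpSize >> 1; mask >= 1; mask >>= 1) {
    std::vector<float> next = val;  // every lane reads before any lane writes
    for (int t = 0; t < n; ++t) {
      if (!active[t]) continue;
      const int warp_base = (t / WARP_SIZE) * WARP_SIZE;
      const int partner = warp_base + ((t % WARP_SIZE) ^ mask);  // same warp
      next[t] = val[t] + val[partner];
    }
    val = next;
  }
}

// Two-stage block reduce. final_in_warp0_only == false models form 1
// (every warp runs the final reduce); true models form 2 (`if (warp == 0)`).
// Returns every thread's `sum` register after the final step.
template <int NUM_THREADS>
std::vector<float> block_reduce_sum(const std::vector<float>& input,
                                    bool final_in_warp0_only) {
  static_assert(NUM_THREADS % WARP_SIZE == 0, "simulate whole warps only");
  constexpr int NUM_WARPS = NUM_THREADS / WARP_SIZE;
  static_assert((NUM_WARPS & (NUM_WARPS - 1)) == 0, "butterfly needs 2^k warps");
  assert(static_cast<int>(input.size()) == NUM_THREADS);

  std::vector<float> sum(input);
  std::vector<bool> everyone(NUM_THREADS, true);

  // Stage 1: reduce inside every warp; lane 0 stores the warp's partial sum.
  warp_reduce_sum<WARP_SIZE>(sum, everyone);
  std::array<float, NUM_WARPS> smem{};  // stands in for __shared__ memory
  for (int w = 0; w < NUM_WARPS; ++w) smem[w] = sum[w * WARP_SIZE];
  // (__syncthreads() would sit here on the GPU.)

  // Stage 2: lanes < NUM_WARPS of every warp reload the partials, then the
  // selected warps reduce them.
  std::vector<bool> active(NUM_THREADS);
  for (int t = 0; t < NUM_THREADS; ++t) {
    const int lane = t % WARP_SIZE;
    sum[t] = lane < NUM_WARPS ? smem[lane] : 0.0f;
    active[t] = !final_in_warp0_only || t / WARP_SIZE == 0;
  }
  warp_reduce_sum<NUM_WARPS>(sum, active);
  return sum;
}
```

Under these assumptions, both forms leave the correct total in thread 0. Either form is therefore enough when only `tid == 0` consumes the result, for example through an `atomicAdd` into global memory.

Form 1 additionally repeats the final reduce in every warp. With 128 threads and inputs 0..127, thread 32 holds 8128 under form 1 but only warp 0's partial sum, 496, under form 2. Even under form 1, only lanes 0 to NUM_WARPS-1 of each warp hold the total, and the remaining lanes hold 0. If every thread needs the sum, a broadcast is still required, such as `__shfl_sync(0xffffffff, val, 0)` or a shared-memory write by thread 0 followed by `__syncthreads()`.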

This issue is stale because it has been open for 30 days with no activity.

This issue was closed because it has been inactive for 7 days since being marked as stale.