NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Documentation of warp-wide collectives refers to `__syncthreads` instead of `__syncwarp`

fkallen opened this issue

For example, the documentation at https://nvlabs.github.io/cub/class_warp_exchange.html#a078092b662bf8cdc67c2322d71f0a776 states:

> A subsequent __syncthreads() threadblock barrier should be invoked after calling this method if the collective's temporary storage (e.g., temp_storage) is to be reused or repurposed.
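To illustrate the point of this issue, here is a minimal sketch of reusing a warp-wide collective's temporary storage with a warp-level barrier between the two calls. It assumes a single warp of 32 threads, `cub::WarpExchange` from `<cub/warp/warp_exchange.cuh>`, and that `__syncwarp()` is the barrier the documentation should name, as the issue title suggests. The kernel and helper names (`RoundTripKernel`, `RoundTripOk`) and the constants are hypothetical and not taken from the CUB documentation.

```cuda
#include <cub/warp/warp_exchange.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpThreads    = 32;  // one full warp (assumption for this sketch)
constexpr int kItemsPerThread = 4;
constexpr int kItems          = kWarpThreads * kItemsPerThread;

// Hypothetical kernel: blocked -> striped -> blocked, reusing one TempStorage.
__global__ void RoundTripKernel(const int* in, int* striped_out, int* blocked_out)
{
    using WarpExchangeT = cub::WarpExchange<int, kItemsPerThread, kWarpThreads>;
    __shared__ typename WarpExchangeT::TempStorage temp_storage;

    // Load a blocked arrangement: each thread owns consecutive items.
    int blocked[kItemsPerThread];
    for (int i = 0; i < kItemsPerThread; ++i)
        blocked[i] = in[threadIdx.x * kItemsPerThread + i];

    int striped[kItemsPerThread];
    WarpExchangeT(temp_storage).BlockedToStriped(blocked, striped);

    // Warp-level barrier before temp_storage is reused by the next call.
    __syncwarp();

    int restored[kItemsPerThread];
    WarpExchangeT(temp_storage).StripedToBlocked(striped, restored);

    for (int i = 0; i < kItemsPerThread; ++i) {
        striped_out[threadIdx.x * kItemsPerThread + i] = striped[i];
        blocked_out[threadIdx.x * kItemsPerThread + i] = restored[i];
    }
}

// Hypothetical host helper: runs the kernel on one warp and checks both layouts.
bool RoundTripOk()
{
    const size_t bytes = kItems * sizeof(int);
    std::vector<int> h_in(kItems), h_striped(kItems), h_blocked(kItems);
    for (int i = 0; i < kItems; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_striped = nullptr, *d_blocked = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_striped, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_blocked, bytes) != cudaSuccess) return false;
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    RoundTripKernel<<<1, kWarpThreads>>>(d_in, d_striped, d_blocked);
    const bool ran = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(h_striped.data(), d_striped, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_blocked.data(), d_blocked, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_striped); cudaFree(d_blocked);
    if (!ran) return false;

    for (int t = 0; t < kWarpThreads; ++t) {
        for (int i = 0; i < kItemsPerThread; ++i) {
            // Striped: item i of thread t is element i * kWarpThreads + t.
            if (h_striped[t * kItemsPerThread + i] != i * kWarpThreads + t) return false;
            // The round trip should restore the original blocked input.
            if (h_blocked[t * kItemsPerThread + i] != h_in[t * kItemsPerThread + i]) return false;
        }
    }
    return true;
}
```

Calling `RoundTripOk()` from a `main` function on a machine with a CUDA-capable GPU should return `true`. Since only the threads of one warp share `temp_storage` here, a block-wide `__syncthreads()` would be stronger than this sketch needs.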

@fkallen thank you for reporting this! We should fix this after the following PR is merged.