[FEA] Consider custom reduction if using non-contiguous stride



cliffburdick opened this issue 7 months ago

CUB cannot optimize a reduction over data with a non-contiguous stride; it relies on iterators to handle that case instead. We still have custom reduction code that may be faster than CUB here, so we are investigating switching to that code path when we detect this case.
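As a rough illustration of the "detect and switch" idea, here is a minimal host-side C++ sketch. `StridedView`, `reduce_contiguous`, `reduce_strided`, and `reduce_sum` are hypothetical names, not MatX or CUB APIs. `std::accumulate` only stands in for a library reduction such as CUB's device reduction, and on a GPU the custom path would be a kernel rather than a serial loop.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>  // used by the usage example below

// Hypothetical 1-D view over possibly strided memory (not a MatX or CUB type).
struct StridedView {
  const float* data;
  std::size_t size;       // number of logical elements
  std::ptrdiff_t stride;  // distance, in elements, between consecutive logical elements
};

// Contiguous input: hand the raw pointer range to a library reduction.
// std::accumulate is a stand-in for a GPU library call here.
float reduce_contiguous(const StridedView& v) {
  return std::accumulate(v.data, v.data + v.size, 0.0f);
}

// Non-contiguous input: a hand-written loop that walks the stride directly
// instead of wrapping the data in an iterator adaptor.
float reduce_strided(const StridedView& v) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < v.size; ++i) {
    acc += v.data[static_cast<std::ptrdiff_t>(i) * v.stride];
  }
  return acc;
}

// Dispatch: detect the non-contiguous case and route it to the custom path.
float reduce_sum(const StridedView& v) {
  assert(v.data != nullptr || v.size == 0);
  return v.stride == 1 ? reduce_contiguous(v) : reduce_strided(v);
}
```

For a 3x4 row-major matrix stored in a `std::vector<float>`, a row is a view with stride 1 and takes the library path, while a column is a view with stride 4 and takes the custom path; both paths should produce the same sum for the same elements.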

Cliff Burdick · Answer 1 · Sat Jul 27 2024 03:27:31 GMT+0800 (China Standard Time)

Duplicate of #482