[FEA] Consider custom reduction if using non-contiguous stride
cliffburdick opened this issue · comments
Cliff Burdick commented
CUB cannot optimize a reduction over a non-contiguous stride on its own; it relies on iterators to handle that case. We still have custom reduction code that may be faster than CUB here. We should investigate switching to that code whenever we detect a non-contiguous stride.
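As a rough illustration of the detection step, here is a minimal host-side sketch. `is_contiguous`, `ReducePath`, and `choose_reduce_path` are hypothetical names invented for this example, not existing MatX or CUB APIs, and the sketch assumes a row-major layout with strides measured in elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a MatX or CUB API): returns true when the strides
// describe a dense row-major layout, i.e. the innermost stride is 1 and each
// outer stride equals the product of all inner extents.
inline bool is_contiguous(const std::vector<std::size_t>& shape,
                          const std::vector<std::size_t>& strides) {
  assert(shape.size() == strides.size());
  std::size_t expected = 1;
  // Walk from the innermost dimension outward.
  for (std::size_t i = shape.size(); i-- > 0;) {
    // A dimension of extent 1 is never stepped over, so its stride is ignored.
    if (shape[i] != 1 && strides[i] != expected) {
      return false;
    }
    expected *= shape[i];
  }
  return true;
}

// Hypothetical tag for which reduction implementation to dispatch to.
enum class ReducePath { Cub, Custom };

// Assumption for illustration: contiguous inputs go to CUB, everything else
// goes to the custom reduction kernel.
inline ReducePath choose_reduce_path(const std::vector<std::size_t>& shape,
                                     const std::vector<std::size_t>& strides) {
  return is_contiguous(shape, strides) ? ReducePath::Cub : ReducePath::Custom;
}
```

Whether the custom path is actually faster for a given stride pattern would still need to be benchmarked; the sketch only shows where such a check could sit.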
Cliff Burdick commented
Duplicate of #482