NVIDIA / MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Home Page: https://nvidia.github.io/MatX

[FEA] Consider custom reduction if using non-contiguous stride

cliffburdick opened this issue

CUB cannot optimize a reduction over data with a non-contiguous stride. Instead, it reads such data through iterators. MatX still has its own custom reduction code, which may be faster than CUB in this case. We are investigating switching to that code path when a non-contiguous stride is detected.
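The detection step could look roughly like the host-side C++ sketch below. It is not MatX's actual API. `TensorDesc`, `is_contiguous`, `choose_reduce_path`, `ReducePath`, and `strided_sum_2d` are hypothetical names chosen for illustration, and the sketch assumes a row-major layout with sizes and strides counted in elements. A dense view can be handed to CUB as a flat pointer. Any other view (a sliced, strided, or transposed one) would be routed to the custom reduction. `strided_sum_2d` is a CPU reference for the kind of strided indexing such a kernel performs, not a GPU implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical tensor descriptor (not a MatX type): sizes and strides in elements.
template <std::size_t Rank>
struct TensorDesc {
  std::array<std::int64_t, Rank> sizes;
  std::array<std::int64_t, Rank> strides;
};

// True when the view is dense and row-major: the innermost stride is 1 and each
// outer stride equals the product of the inner sizes. Size-1 dimensions are
// skipped because their stride is never used to address memory.
template <std::size_t Rank>
bool is_contiguous(const TensorDesc<Rank>& t) {
  std::int64_t expected = 1;
  for (std::size_t i = Rank; i-- > 0;) {
    if (t.sizes[i] != 1 && t.strides[i] != expected) {
      return false;
    }
    expected *= t.sizes[i];
  }
  return true;
}

// Hypothetical dispatch: CUB for dense data, custom reduction otherwise.
enum class ReducePath { Cub, Custom };

template <std::size_t Rank>
ReducePath choose_reduce_path(const TensorDesc<Rank>& t) {
  return is_contiguous(t) ? ReducePath::Cub : ReducePath::Custom;
}

// CPU reference for a full sum over a strided 2D view: every element reachable
// from `base` through sizes/strides is visited exactly once.
inline double strided_sum_2d(const double* base, const TensorDesc<2>& t) {
  double acc = 0.0;
  for (std::int64_t r = 0; r < t.sizes[0]; ++r) {
    for (std::int64_t c = 0; c < t.sizes[1]; ++c) {
      acc += base[r * t.strides[0] + c * t.strides[1]];
    }
  }
  return acc;
}
```

For a 4x6 row-major buffer, the full view `{sizes {4, 6}, strides {6, 1}}` takes the CUB path. A view of every other column, `{sizes {4, 3}, strides {6, 2}}`, or a transposed view, `{sizes {6, 4}, strides {1, 6}}`, takes the custom path.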