[FEA] Allow stride 0 batched GEMMs
cliffburdick opened this issue
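cuBLAS strided batched GEMM (for example `cublasSgemmStridedBatched`) allows a batch stride of 0 on A or B. An operand that is the same for every batch therefore does not need to be replicated in memory. Use this feature whenever one or both inputs are shared across the batch.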
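The sketch below shows what a stride of 0 means. It is a CPU reference, not cuBLAS and not MatX code. The function name `gemm_strided_batched_ref` and its parameter list are hypothetical. They mirror the pointer-plus-batch-stride, column-major addressing of cuBLAS strided batched GEMM, with alpha = 1 and beta = 0 assumed. Batch `b` reads A at `A + b * strideA`, so `strideA = 0` makes every batch reuse the same A.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference for C[b] = A[b] * B[b] (alpha = 1, beta = 0),
// column-major storage, addressed the way strided batched GEMM addresses
// batches: base pointer + b * stride. A stride of 0 reuses one matrix.
void gemm_strided_batched_ref(int m, int n, int k,
                              const float* A, int lda, long long strideA,
                              const float* B, int ldb, long long strideB,
                              float* C, int ldc, long long strideC,
                              int batch_count) {
  assert(lda >= m && ldb >= k && ldc >= m);  // column-major leading dims
  for (int b = 0; b < batch_count; ++b) {
    const float* Ab = A + b * strideA;  // strideA == 0 -> same A for every b
    const float* Bb = B + b * strideB;  // strideB == 0 -> same B for every b
    float* Cb = C + b * strideC;        // each batch writes its own output
    for (int j = 0; j < n; ++j) {
      for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p) {
          acc += Ab[i + p * lda] * Bb[p + j * ldb];
        }
        Cb[i + j * ldc] = acc;
      }
    }
  }
}

// Usage: one 2x2 A shared by a batch of two B matrices (strideA = 0).
std::vector<float> shared_a_example() {
  std::vector<float> A = {1, 3, 2, 4};              // [[1, 2], [3, 4]], column-major
  std::vector<float> B = {1, 0, 0, 1, 2, 0, 0, 2};  // batch 0: I, batch 1: 2I
  std::vector<float> C(8, 0.0f);
  gemm_strided_batched_ref(2, 2, 2, A.data(), 2, 0, B.data(), 2, 4,
                           C.data(), 2, 4, 2);
  return C;  // batch 0 holds A, batch 1 holds 2A
}
```

Only one copy of A (4 floats) is stored instead of one per batch. For a large batch of small B matrices multiplied by a single shared A, the memory saving grows linearly with the batch count.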
An efficient C++17 GPU numerical computing library with Python-like syntax