Tensor Cores
lostmsu opened this issue
We need a way to use Tensor Cores. The API should draw on the family of VectorXXX
intrinsics in .NET and/or the Vulkan Cooperative Matrix extension proposed by NVIDIA.
Related CUDA documentation: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions
This is also mentioned in #923, but the latter is more about support for shorter floats in general.
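For context, here is a rough sketch of how the warp-level matrix instructions linked above are exposed in CUDA C++ through the `nvcuda::wmma` API. It is only meant to illustrate the kind of operation a new API would have to cover (one warp multiplying a 16x16x16 tile, half inputs with float accumulation), not a proposed design. The kernel name `wmma_tile`, the all-ones test data, and the tile layouts are my own choices for illustration. It assumes a GPU with compute capability 7.0 or newer and a build along the lines of `nvcc -arch=sm_70`.

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Hypothetical example kernel: a single warp computes one 16x16 tile C = A * B.
// A is row-major, B is column-major, both in half precision; C accumulates in float.
__global__ void wmma_tile(const half* a, const half* b, float* c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);            // start from a zero accumulator
    wmma::load_matrix_sync(a_frag, a, 16);     // leading dimension is 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc, a_frag, b_frag, acc);  // acc = a_frag * b_frag + acc
    wmma::store_matrix_sync(c, acc, 16, wmma::mem_row_major);
}

int main() {
    const int n = 16 * 16;
    half* a = nullptr;
    half* b = nullptr;
    float* c = nullptr;
    cudaMallocManaged(&a, n * sizeof(half));
    cudaMallocManaged(&b, n * sizeof(half));
    cudaMallocManaged(&c, n * sizeof(float));

    // All-ones inputs, so each output element is a sum of 16 products of 1.0.
    for (int i = 0; i < n; ++i) {
        a[i] = __float2half(1.0f);
        b[i] = __float2half(1.0f);
    }

    // The WMMA operations are warp-wide, so launch exactly one full warp.
    wmma_tile<<<1, 32>>>(a, b, c);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("c[0] = %f, c[%d] = %f\n", c[0], n - 1, c[n - 1]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

The point for this request is that the fragment types are opaque and every call is collective across the warp. That is closer to the Cooperative Matrix model than to per-thread vector intrinsics.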
Thanks a lot for your feature request. Given the performance improvements that can be achieved using Tensor Cores on NVIDIA hardware, it definitely makes sense to add support for Tensor Cores in 2.0 (which is going to be the next big release after v1.5).