update vkpeak 20230812
nihui opened this issue
https://github.com/nihui/vkpeak/releases/tag/20230812
An fp16-matrix value has been added for all devices that support VK_KHR_cooperative_matrix,
such as NVIDIA RTX 20 series and newer, and AMD RDNA3.
It reflects the computing power of the Tensor Cores or a similar AI engine on the device.
At the moment, all NVIDIA Turing and newer devices are known to work.
RDNA3 devices work with the latest Windows driver (130+ TFLOPS measured on my 7900 XTX graphics card).
In the future, the Linux Mesa driver will follow and bring this extension to Intel and other devices.
Sample output on an NVIDIA T4:
[action@VM-116-181-centos build]$ ./vkpeak 0
device = GRID T4-8C
fp32-scalar = 3823.95 GFLOPS
fp32-vec4 = 3796.63 GFLOPS
fp16-scalar = 3599.11 GFLOPS
fp16-vec4 = 7203.46 GFLOPS
fp16-matrix = 29188.25 GFLOPS
fp64-scalar = 127.15 GFLOPS
fp64-vec4 = 127.13 GFLOPS
int32-scalar = 3667.11 GIOPS
int32-vec4 = 3741.25 GIOPS
int16-scalar = 3707.29 GIOPS
int16-vec4 = 3797.13 GIOPS
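
To put the new fp16-matrix number in context, here is a minimal sketch that parses output in the format shown above and compares fp16-matrix with fp16-vec4. The names `parse_vkpeak`, `LINE_RE`, and `SAMPLE` are hypothetical helpers written for this illustration and are not part of vkpeak; the sample text is copied from the T4 output above.

```python
import re

# A few lines copied from the T4 output above.
SAMPLE = """\
device = GRID T4-8C
fp32-scalar = 3823.95 GFLOPS
fp16-scalar = 3599.11 GFLOPS
fp16-vec4 = 7203.46 GFLOPS
fp16-matrix = 29188.25 GFLOPS
"""

# Matches "name = value UNIT" lines; the "device = ..." line has no
# numeric value followed by a unit, so it is skipped.
LINE_RE = re.compile(r"^\s*([\w-]+)\s*=\s*([0-9.]+)\s*(GFLOPS|GIOPS)\s*$")


def parse_vkpeak(text):
    """Return a dict mapping each metric name to its value as a float."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group(1)] = float(m.group(2))
    return results


results = parse_vkpeak(SAMPLE)
speedup = results["fp16-matrix"] / results["fp16-vec4"]
print(f"fp16-matrix is {speedup:.2f}x fp16-vec4")  # prints: fp16-matrix is 4.05x fp16-vec4
```

On this T4 sample, the cooperative-matrix path measures roughly four times the fp16-vec4 throughput.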