NVIDIA / warp

A Python framework for high performance GPU simulation and graphics

Home Page: https://nvidia.github.io/warp/


Adjoint of Matmul

moradza opened this issue · comments

The current matmul and batched_matmul implementations break the gradient flow. For example, in the line

ctypes.c_void_p(a.ptr),

a.ptr should be replaced with adj_a.ptr (i.e. ctypes.c_void_p(adj_a.ptr)), and beta should be set to 1. In the current implementation, the old gradient of a is overwritten by the matmul call; the same issue holds for b, c, and d.
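For context, the adjoint of a GEMM-style operation d = alpha * (a @ b) + beta * c should accumulate into the existing gradient buffers rather than replace them. Below is a minimal NumPy sketch of that intended behavior; it is illustrative only and not Warp's internal code, and the names adj_a, adj_b, adj_c, adj_d simply mirror the convention used above.

```python
import numpy as np

def matmul_adjoint(a, b, adj_a, adj_b, adj_c, adj_d, alpha=1.0, beta=1.0):
    # Forward op: d = alpha * (a @ b) + beta * c.
    # Backward pass: accumulate into the existing gradients (+=),
    # which corresponds to calling the GEMM with beta = 1 and passing
    # the gradient buffers (adj_a, adj_b, adj_c) as the outputs.
    adj_a += alpha * adj_d @ b.T   # dL/da
    adj_b += alpha * a.T @ adj_d   # dL/db
    adj_c += beta * adj_d          # dL/dc
```

Overwriting adj_a instead (the effect of beta = 0, or of writing into a.ptr rather than adj_a.ptr) discards gradient contributions from any other operations that also consume a.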

Hi @moradza, thanks for the report - @daedalus5 can you take a look?