NVIDIA / warp

A Python framework for high performance GPU simulation and graphics

Home Page: https://nvidia.github.io/warp/


Calculating the Jacobian for n inputs and n outputs

nithinsomu opened this issue · comments

Below is sample code for a Jacobian calculation I am trying to perform. My Jacobian is around 36000 × 36000 (I have 36000 inputs and 36000 outputs).

import warp as wp
import numpy as np
from tqdm import tqdm

wp.init()


@wp.kernel
def calc_residue(R: wp.array(dtype=float),
                 U: wp.array(dtype=float)):
    i = wp.tid()
    if i == 0:
        wp.atomic_add(R, i, 0.01 * U[i] ** 2.0)
    else:
        wp.atomic_add(R, i, 0.01 * U[i] ** 2.0 + U[i - 1] ** 2.0)


R = wp.zeros(36000, dtype=float, requires_grad=True)
U = wp.array(np.random.random(36000), dtype=float, requires_grad=True)

tape = wp.Tape()

with tape:
    wp.launch(calc_residue, dim=36000, inputs=[R, U])

jacobian = np.zeros((36000, 36000))

for output_index in tqdm(range(36000)):
    select_index = np.zeros(36000)
    select_index[output_index] = 1.0
    e = wp.array(select_index, dtype=wp.float32)
    # seed the output gradient of R with the selection vector
    tape.backward(grads={R: e})
    q_grad_i = U.grad
    jacobian[output_index, :] = q_grad_i.numpy()
    tape.zero()

Is there a way to vectorize the for loop for constructing the Jacobian? In a more complex case (a larger computational graph), it takes me around 5 minutes to assemble the Jacobian. Is there a way to cut this time down?

Hi @nithinsomu, if the kernel is fairly simple, as in your example, it would probably make sense to construct the Jacobian explicitly, which would be very fast in this case (see the sketch below).

@eric-heiden, do you have any other suggestions?
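
For reference, a minimal sketch of what such an explicit construction could look like for this particular residual, assuming R[i] = 0.01 * U[i]**2 + U[i-1]**2 as in the kernel above and reusing the imports and wp.init() from that snippet. Only the main diagonal and the first subdiagonal of the Jacobian are nonzero; the kernel name and the two diagonal buffers are illustrative choices, with the derivatives written out by hand:

@wp.kernel
def calc_jacobian_diagonals(U: wp.array(dtype=float),
                            main_diag: wp.array(dtype=float),  # dR[i]/dU[i]
                            sub_diag: wp.array(dtype=float)):  # dR[i]/dU[i-1]
    i = wp.tid()
    # d/dU[i] of 0.01 * U[i]**2
    main_diag[i] = 0.02 * U[i]
    if i > 0:
        # d/dU[i-1] of U[i-1]**2
        sub_diag[i - 1] = 2.0 * U[i - 1]

n = 36000
U = wp.array(np.random.random(n), dtype=float)
main_diag = wp.zeros(n, dtype=float)
sub_diag = wp.zeros(n - 1, dtype=float)
wp.launch(calc_jacobian_diagonals, dim=n, inputs=[U, main_diag, sub_diag])

Since the full Jacobian is determined by these 2n - 1 values, a single kernel launch replaces the 36000 backward passes, at the cost of deriving the entries by hand.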

Hi @nithinsomu,

The Jacobian can be computed with just 2 tape.backward() calls: each output R[i] depends only on U[i] and U[i-1], so outputs whose indices differ by 2 touch disjoint inputs, and selecting every second output in a single backward pass produces non-overlapping contributions in U.grad. See these modifications, where I create an output selection array with a stride of 2. I printed the true Jacobian for a smaller number of dimensions to figure out the assignment indexing:

np.set_printoptions(precision=3, suppress=True, linewidth=400)

dim = 3600
R = wp.zeros(dim, dtype=float, requires_grad=True)
U = wp.array(np.random.random(dim), dtype=float, requires_grad=True)

tape = wp.Tape()

with tape:
    wp.launch(calc_residue, dim=dim, inputs=[R, U])

jacobian = np.zeros((dim, dim))

for output_index in tqdm(range(dim)):
    select_index = np.zeros(dim)
    select_index[output_index] = 1.0
    e = wp.array(select_index, dtype=wp.float32)
    # seed the output gradient of R with the selection vector
    tape.backward(grads={R: e})
    q_grad_i = U.grad
    jacobian[output_index, :] = q_grad_i.numpy()
    tape.zero()

print(jacobian)
print("\n\n")

new_jacobian = np.zeros((dim, dim))
for output_index in range(2):
    select_index = np.zeros(dim)
    selector = np.arange(output_index, dim, 2)
    select_index[selector] = 1.0
    e = wp.array(select_index, dtype=wp.float32)
    # select every second output row in a single backward pass
    tape.backward(grads={R: e})
    q_grad_i = U.grad
    # diagonal entries dR[i]/dU[i] of the selected rows
    new_jacobian[selector, selector] = q_grad_i.numpy()[selector]
    # subdiagonal entries dR[s+2]/dU[s+1] for each selected output s (modulo keeps the last index in bounds)
    new_jacobian[(selector + 2) % dim, (selector + 1) % dim] = q_grad_i.numpy()[(selector + 1) % dim]
    tape.zero()

print(new_jacobian)

assert np.allclose(jacobian, new_jacobian)

Got it. Thank you very much for the reply.
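
As a usage note on the structure exploited above: because only the main diagonal and the first subdiagonal of this Jacobian are nonzero, the same two backward passes can fill two 1-D buffers directly instead of a dense dim x dim matrix (for the original 36000 x 36000 case, a dense float64 matrix alone is roughly 10 GB). A minimal sketch, assuming the tape, R, U, dim, and new_jacobian defined in the snippet above:

main_diag = np.zeros(dim)       # J[i, i] = dR[i]/dU[i]
sub_diag = np.zeros(dim - 1)    # J[i+1, i] = dR[i+1]/dU[i]

for output_index in range(2):
    select_index = np.zeros(dim)
    selector = np.arange(output_index, dim, 2)
    select_index[selector] = 1.0
    e = wp.array(select_index, dtype=wp.float32)
    tape.backward(grads={R: e})
    g = U.grad.numpy()
    # diagonal entries of the selected rows
    main_diag[selector] = g[selector]
    # subdiagonal entries of the selected rows (row 0 has none)
    rows = selector[selector >= 1]
    sub_diag[rows - 1] = g[rows - 1]
    tape.zero()

# the two diagonals match the dense reconstruction above
assert np.allclose(main_diag, np.diag(new_jacobian))
assert np.allclose(sub_diag, np.diag(new_jacobian, k=-1))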