facebookresearch / theseus

A library for differentiable nonlinear optimization

Segmentation fault when setting nn.Parameter

abcamiletto opened this issue · comments

πŸ› Bug

A weird segmentation fault can arise when optimizing a function that goes through an nn.Parameter.

The problem may be in theseus itself or in torch.vmap, but it is quite hard to debug.

I found a workaround, but figured I would post a reproducible example anyway.

import theseus as th
import torch
import torch.nn as nn


class Q(nn.Module):
    def __init__(self, q):
        super().__init__()
        self._q = nn.Parameter(q)

    @property
    def q(self):
        return self._q

    def set_q(self, q):
        self._q.data = q


q_tensor = torch.ones(1, 3) * 0.5
verbose = True

qobj = Q(q_tensor)

# Define Theseus variables
q = th.Vector(3, name="q")

def quad_error_fn(optim_vars, aux_vars):
    # Unpack variables
    q = optim_vars[0]
    qobj.set_q(q.tensor)  # writes q into the nn.Parameter's .data inside the cost function
    return qobj.q


optim_vars = (q,)
aux_vars = ()
dimension = 3
cost_function = th.AutoDiffCostFunction(
    optim_vars, quad_error_fn, dimension, aux_vars=aux_vars, name="cost_function"
)
objective = th.Objective()

objective.add(cost_function)
optimizer = th.LevenbergMarquardt(objective)
theseus_optim = th.TheseusLayer(optimizer)

theseus_inputs = {
    "q": q_tensor,
}

updated_inputs, info = theseus_optim.forward(
    theseus_inputs, optimizer_kwargs={"track_best_solution": verbose, "verbose": verbose}
)

Thanks in advance for looking into this and for the fantastic library!

Hi @abcamiletto, finally got some time to look into this. This seems to be a problem with vmap, but I'm actually a bit confused about what you are trying to do in this example. Why the need to modify the parameter's .data value directly? The typical use case is to keep this constant during torch's forward pass, and let torch update the parameters' values after the backward pass.

I do agree this example does not really make sense.

But imagine a more complex component where the variable we want to optimize is an nn.Parameter of an nn.Module.
This can happen in many ways, with either custom modules or off-the-shelf ones (e.g., nn.Conv2d).

The first solution one (or at least I) would think of is to plug this parameter into and out of the component at the beginning and end of the objective function (here, the .data assignment), and then use the component as one normally would.

This of course cannot work, due to theseus's reliance on torch.vmap. The workaround is very simple in most cases (use the functional F.conv2d instead of the nn.Module), but it is not obvious from the errors raised on the first attempt.
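
For concreteness, here is a minimal sketch of the workaround applied to the repro above (my own reading of it, not an official pattern): keep the nn.Module out of the error function and compute the residual directly from the optimization variable's tensor, so nothing mutates an nn.Parameter inside the vmapped call.

def quad_error_fn(optim_vars, aux_vars):
    q = optim_vars[0]
    # No qobj.set_q(...): the residual is computed functionally from q,
    # which is equivalent to the original qobj.q but vmap-safe.
    return q.tensor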

That was of course my experience, and I had to go through the theseus internals a bit to understand what was happening.

Thanks for your response @abcamiletto. To be sure I understand, is your claim that using an nn.Parameter as an optimization variable results in an error when using vmap?

BTW, using vmap is not mandatory. You can set autograd_mode="dense" in the AutoDiffCostFunction to avoid using vmap (but it will probably be much slower).

That being said, I'm still curious about your use case.

> Thanks for your response @abcamiletto. To be sure I understand, is your claim that using an nn.Parameter as an optimization variable results in an error when using vmap?

Yes, in fact the example that I posted crashes with some obscure error.

> BTW, using vmap is not mandatory. You can set autograd_mode="dense" in the AutoDiffCostFunction to avoid using vmap (but it will probably be much slower).

> That being said, I'm still curious about your use case.

That is interesting. Does autograd_mode="dense" just fall back to a simple for loop? Where does the speed loss come from? @luisenp

I am developing a robotics library to do kinematic and dynamic computations differentiably. For the sake of a simpler API, the joints are nn.Modules and keep their value as an nn.Parameter.

This allows me to hide the complexity of joint parametrization from the external world (Euler angles? quaternion? 6D rotation? it doesn't matter), but it also broke down when I first tried to use theseus with it. I had to redesign it so that the joint value could also be given from the "outside", as sketched below.
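
Here is a hedged sketch of what that redesign looks like (the Joint class and names are illustrative, not the actual library code): the module keeps its nn.Parameter for regular torch training, but its forward optionally accepts an external value, so a theseus cost function can pass the optimization variable's tensor in without touching .data.

import torch
import torch.nn as nn


class Joint(nn.Module):
    def __init__(self, q_init: torch.Tensor):
        super().__init__()
        # Kept as a parameter for the usual torch training path.
        self._q = nn.Parameter(q_init)

    def forward(self, q=None):
        # Use the externally supplied value if given (e.g. from a theseus
        # cost function); otherwise fall back to the stored parameter.
        # No in-place .data assignment, so the call stays vmap-friendly.
        q = self._q if q is None else q
        return q  # placeholder for the real kinematics computation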

As a last note: I really am a big fan of this library. I think it has great potential and plenty of applications. Thank you so much for developing it.

Thanks for your kind words, glad you like the library!

Setting autograd_mode="dense" relies on the original torch.autograd.functional.jacobian implementation, which unnecessarily computes gradients across the batch dimension that we then have to filter out (while vmap directly computes per-sample gradients); this is why we call it "dense". We also have a version that loops over the batch, which you can use with autograd_mode="loop_batch". Both are much slower than "vmap".
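
For reference, the back-end is selected when the cost function is constructed; a sketch based on the repro above (the three string values are the ones mentioned in this thread):

cost_function = th.AutoDiffCostFunction(
    optim_vars, quad_error_fn, dimension, aux_vars=aux_vars,
    name="cost_function",
    autograd_mode="vmap",         # fast per-sample Jacobians (the default path discussed here)
    # autograd_mode="dense",      # torch.autograd.functional.jacobian, cross-batch terms filtered out
    # autograd_mode="loop_batch", # explicit loop over the batch dimension
)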

One part of the nn.Parameter use case I'm confused about is that it looks like you are optimizing the same set of tensors at both optimization levels (outer level with torch, inner level with theseus), which means the two optimizers are competing with each other, and this seems a bit strange. This is not a case we have ever considered. In all of our examples, nn.Parameters optimized by torch (e.g., via Adam) are used either as auxiliary variables or as secondary quantities in the computation of cost functions and cost weights. For this use case, we have tested extensively with very complicated nn.Module objects as part of the cost function computation without major issues.
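
To illustrate that supported pattern, here is a hedged sketch (the linear model, names, and losses are made up for illustration; only the theseus calls follow the repro above): the nn.Module's parameters are trained by the outer torch optimizer, while theseus optimizes q in the inner loop, and the module is only read inside the cost function, never assigned to.

import torch
import torch.nn as nn
import theseus as th

# Outer level: an illustrative model whose parameters torch optimizes.
target_model = nn.Linear(3, 3)
outer_opt = torch.optim.Adam(target_model.parameters(), lr=1e-3)

# Inner level: theseus optimizes q.
q = th.Vector(3, name="q")

def error_fn(optim_vars, aux_vars):
    q = optim_vars[0]
    # Read-only use of the module; no nn.Parameter is mutated here.
    return q.tensor - target_model(q.tensor)

objective = th.Objective()
objective.add(th.AutoDiffCostFunction((q,), error_fn, 3, name="learned_cost"))
layer = th.TheseusLayer(th.LevenbergMarquardt(objective))

solution, info = layer.forward({"q": torch.ones(1, 3) * 0.5})
outer_loss = solution["q"].square().sum()  # some outer loss on the inner solution
outer_loss.backward()                      # gradients reach target_model's parameters
outer_opt.step()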

I see; now I get the point. Thanks for the answer.

In my application, I am not optimizing over the same set of tensors at the two levels. It just so happens that, due to how the library was designed, the quantity I want to optimize is stored as an nn.Parameter. This made things a bit impractical (and raised very confusing errors).