mitsuba-renderer / mitsuba-tutorials

Tutorials and other resources for the Mitsuba 3 library


Potentially wrong backward in radiance_field_reconstruction.ipynb

lkskstlr opened this issue · comments

Thank you for the nice library and examples :)

I was using the above example to implement NeRF and noticed that the backward pass is probably slightly wrong: I compared the gradients produced by the mi.Loop against gradients from an unrolled Python loop, and they do not match. The mismatch is especially clear when using very few samples per ray. The difference is small enough that the optimization still works; with the changed loop the final loss improves slightly, but by much less than 1%.

Please also check if the change I made is actually correct, as I am never fully certain with mi.Loop :)

Current Code

```python
while loop(active):
    p = ray(t)
    with dr.resume_grad(when=not primal):
        sigmat = self.sigmat.eval(p)[0]
        if self.use_relu:
            sigmat = dr.maximum(sigmat, 0.0)
        tr = dr.exp(-sigmat * step_size)
        # Evaluate the directionally varying emission (weighted by transmittance)
        Le = β * (1.0 - tr) * self.eval_emission(p, ray.d)
        if not primal:
            dr.backward_from(δL * (L * tr / dr.detach(tr) + Le))
    β *= tr
    L = L + Le if primal else L - Le
    t += step_size
    active &= (t < maxt) & dr.any(dr.neq(β, 0.0))
```

My Code

```python
while loop(active):
    p = ray(t)
    with dr.resume_grad(when=not primal):
        sigmat = self.sigmat.eval(p)[0]
        if self.use_relu:
            sigmat = dr.maximum(sigmat, 0.0)
        tr = dr.exp(-sigmat * step_size)
        # Evaluate the directionally varying emission (weighted by transmittance)
        Le = β * (1.0 - tr) * self.eval_emission(p, ray.d)

    β *= tr
    L = L + Le if primal else L - Le

    with dr.resume_grad(when=not primal):
        if not primal:
            dr.backward_from(δL * (L * tr / dr.detach(tr) + Le))

    t += step_size
    active &= (t < maxt) & dr.any(dr.neq(β, 0.0))
```
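Why the ordering matters can be sketched with a toy, pure-Python version of the detached backward replay (a hedged sketch, not Mitsuba code: the model is a simple scalar emission-absorption ray march, and all function names here are made up for illustration). In the backward pass, `L` should hold only the radiance accumulated *after* the current segment when `backward_from` is evaluated, so `Le` must be subtracted first:

```python
import math

def primal(sigmas, emissions, step):
    # Primal emission-absorption ray march:
    # L = sum_i beta_i * (1 - tr_i) * e_i, with beta_i = prod_{j<i} tr_j.
    L, beta = 0.0, 1.0
    for s, e in zip(sigmas, emissions):
        tr = math.exp(-s * step)
        L += beta * (1.0 - tr) * e
        beta *= tr
    return L

def replay_grads(sigmas, emissions, step, subtract_first):
    # Detached backward replay (mimicking the structure of the mi.Loop above):
    # returns d(primal)/d(sigma_i) for every segment along the ray.
    L = primal(sigmas, emissions, step)
    beta, grads = 1.0, []
    for s, e in zip(sigmas, emissions):
        tr = math.exp(-s * step)
        dtr = -step * tr  # d tr / d sigma
        Le = beta * (1.0 - tr) * e
        if subtract_first:
            L -= Le  # the fixed ordering: remove this segment's emission first
        # Derivative of (L * tr / detach(tr) + Le) w.r.t. sigma, with beta
        # and L treated as detached loop state:
        g = L * dtr / tr - beta * dtr * e
        if not subtract_first:
            L -= Le  # the original ordering: L still contained Le above
        beta *= tr
        grads.append(g)
    return grads
```

With `subtract_first=True` the replayed gradients agree with central finite differences of `primal`; with `subtract_first=False` each gradient is off by `step * Le`, which is exactly the discrepancy that becomes visible with few samples per ray.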

Hi @lkskstlr

I'm a bit confused by your message:

is probably slightly wrong by comparing gradients of the mi.Loop vs gradients from an unrolled python loop.

How exactly did you test this? Recording shouldn't change the result of a computation. It's just a matter of performance.

This is especially clear if using very few samples per ray. The difference is very slight and hence the optimization works, when I change the loop the final loss gets slightly better, but much less than 1%.
I might be misunderstanding: you are saying there is a measurable difference between a recorded and an unrolled loop that only becomes clearly visible with few samples per ray?

I do agree with your suggested change, at first sight. But I am curious about your validation, and would like to know more before I make this update to the tutorial.

I implemented an unrolled Python for loop for internal tests of my mi.Loop and saw that the gradients didn't match. Because the unrolled loop relies entirely on AD, I am fairly confident its gradients are correct. I then also computed the gradients by hand for a single point, which gave the same result as the Python loop (just as a check; this is of course expected). When I change the mi.Loop as above, the gradients match the Python loop. It would also be nice to have finite-difference checking similar to: https://pytorch.org/docs/stable/generated/torch.autograd.gradgradcheck.html.
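A minimal finite-difference gradient check of the kind mentioned above can be written in a few lines of plain Python (a hedged sketch; `finite_diff_check` is a made-up helper name, not part of any library):

```python
def finite_diff_check(f, grad_f, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against central finite differences.

    f: callable mapping a list of floats to a scalar.
    grad_f: callable returning the gradient of f at x as a list of floats.
    Returns True if every coordinate agrees within tol.
    """
    g = grad_f(x)
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += eps
        dn[i] -= eps
        fd = (f(up) - f(dn)) / (2.0 * eps)
        if abs(fd - g[i]) > tol:
            return False
    return True
```

For example, the correct gradient of a quadratic passes the check while a deliberately wrong one fails; the same pattern could be applied to the primal rendering function with perturbed density parameters.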

Caveat: I did all of this for a similar, but not identical, mi.Loop that I was using internally. I used this tutorial as the reference for my first implementation, which is why I opened the issue. It would be good if you could verify that the new version is actually correct because, as mentioned, I am not 100% certain and I am short on time.

If you can wait a bit, I can probably write the exact test for this code within the next two weeks or so; then you won't have to do it.

Cheers Lukas

I just saw this now. I think you are right: Le indeed needs to be subtracted before computing the gradients.

Thanks, I pushed the fix!

🚀