lliuz / ARFlow

The official PyTorch implementation of the paper "Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation".

Illegal memory access during back propagation unit test

5had3z opened this issue · comments

Hi, I am having issues running correlation_native.py during the backward phase:
RuntimeError: CUDA error: an illegal memory access was encountered
I first modified your implementation to update it to PyTorch 1.6.0 and ran into this issue.
I then tried to use your Dockerfile; however, jonathonf has removed his Python 3.6 repository for Ubuntu 16.04, so I made the following changes to the Dockerfile:

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
RUN pip3 install https://download.pytorch.org/whl/cu100/torch-1.1.0-cp36-cp36m-linux_x86_64.whl

This still resulted in errors during the backpropagation stage, specifically in:
correlation_backward_input1
correlation_backward_input2

I tried printing the dims to make sure the tensor shapes were correct in some of the functions:
(Pytorch backward)
Grad Output torch.Size([4, 81, 120, 120])
Input Dims torch.Size([4, 256, 128, 128])

(correlation_backward_cuda)
Input batch: 4 ch: 256 h: 128 w: 128

(correlation_backward_cuda_kernel, after channels_first calls)
rInput batch: 4 ch: 128 h: 128 w: 256
gradInput1 batch: 4 ch: 256 h: 128 w: 128
gradOutput batch: 4 ch: 81 h: 120 w: 120

Any idea where the issue is arising from? Is there a subtle difference introduced by changing CUDA 9 -> 10 in the Docker image?

If I change the max_displacement to 1 or 2 and C=H=W=64, it works fine.
But with C=H=W=128 it doesn't work (with max_displacement 1 or 2).
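
As an aside, the printed shapes already point at an undersized pad_size, assuming the FlowNet-style correlation conventions (output channels = (2*max_displacement/stride2 + 1)^2; with kernel_size=1 and stride1=1, output size = input size + 2*pad_size - 2*max_displacement). A quick sanity check of that arithmetic:

# Shape arithmetic for a FlowNet-style correlation layer (an assumption about
# how correlation_cuda computes its output size; kernel_size=1, stride 1).
max_displacement = 4   # gives (2*4 + 1)**2 = 81 output channels
pad_size = 0           # hypothetical value that reproduces the printed 120x120
in_size = 128

out_channels = (2 * max_displacement + 1) ** 2             # 81
out_size = in_size + 2 * pad_size - 2 * max_displacement   # 120
print(out_channels, out_size)  # matches gradOutput torch.Size([4, 81, 120, 120])

With pad_size >= max_displacement the output stays 128x128 and the displaced reads stay inside the padded buffer, which lines up with the pad_size finding further down in the thread.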

This is an issue with the correlation_cuda package. I am not very familiar with CUDA programming, so I may not be able to help you solve this problem.

If you have trouble with this package during training, you can alternatively use my PyTorch implementation (it is correct, although somewhat slower).

Since the correlation_cuda package is widely used in other projects, such as ClementPinard/Pytorch-Correlation-extension and NVIDIA/flownet2-pytorch, you can refer to those repos for help.
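
For readers who want the pure-PyTorch route, here is a minimal sketch of what such a correlation layer can look like (illustrative only, not necessarily the exact code in this repo; kernel_size=1 and stride 1 assumed):

import torch
import torch.nn.functional as F

def correlation_native(x1, x2, max_displacement=4):
    # All-pairs cost volume with (2*d+1)**2 output channels and the same H/W
    # as the inputs; the per-pixel dot product is averaged over channels.
    b, c, h, w = x1.shape
    d = max_displacement
    x2_pad = F.pad(x2, (d, d, d, d))  # zero-pad so every shift stays in bounds
    out = []
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            x2_shift = x2_pad[:, :, dy:dy + h, dx:dx + w]
            out.append((x1 * x2_shift).mean(dim=1, keepdim=True))
    return torch.cat(out, dim=1)

x1 = torch.randn(4, 256, 128, 128, requires_grad=True)
x2 = torch.randn(4, 256, 128, 128, requires_grad=True)
cost = correlation_native(x1, x2)   # [4, 81, 128, 128]
cost.sum().backward()               # backward works through plain autograd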

From some testing that I did, there are access requests for index -1 during some operations:
in the correlation forward kernel during the element-wise product sum (prod_sum += rInput1*rInput2),
and in correlation_backward_input1 when reading from rInput2.

In my own code I skip these operations with a boundary check and consequently no longer have this issue.

Thanks for sharing and I'm glad you could find a workaround in the end!

Hi, I have met the same issue as you. May I ask how you use a boundary check to skip the operations you mentioned above? @5had3z

@sun0215 I've got the checks in my re-implementation, but they're commented out because it turns out this issue is due to insufficient padding (the pad_size variable). You won't access out of bounds if it is large enough; just search for the smallest value that works for you.

The bounds checks are commented out because, as far as I know, CUDA cores have no branch prediction, so you pay the full cost of those checks; they're still in the code for future reference.
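
To make the pad_size point concrete, a hedged example of the fix on the caller side (this assumes the FlowNet-style Correlation constructor used by the correlation_package; the exact import path and argument names may differ in your copy):

# The import path below is an assumption; adjust it to wherever the
# correlation_package lives in your checkout.
from models.correlation_package.correlation import Correlation

# Displaced reads can leave the padded buffer when pad_size < max_displacement,
# which is what shows up as the illegal memory access reported above.
# Keeping pad_size at least as large as max_displacement avoids it.
corr = Correlation(pad_size=4, kernel_size=1, max_displacement=4,
                   stride1=1, stride2=1, corr_multiply=1)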