pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Home Page: https://pytorch.org

Grad strides do not match bucket view strides.

xingxinggui opened this issue · comments

[W reducer.cpp:313] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
grad.sizes() = [1024, 1024, 1, 1], strides() = [1024, 1, 1024, 1024]
bucket_view.sizes() = [1024, 1024, 1, 1], strides() = [1024, 1, 1, 1] (function operator())

This problem impairs performance. What can I do?

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @VitalyFedyunin @jamesr66a @ppwwyyxx

My model is a CNN; it contains this layer:
nn.Conv2d(1024, 1024, 1)

I am using DDP (DistributedDataParallel).
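
For reference, a minimal sketch of that setup (illustrative only: the batch size, spatial size, and torchrun-based launch are assumptions, not details from this report):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes launch via torchrun, which sets RANK / WORLD_SIZE / LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Conv2d(1024, 1024, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(2, 1024, 8, 8, device=f"cuda:{local_rank}")  # batch size > 1
    model(x).sum().backward()  # the reducer compares grad strides to its bucket views here
    dist.destroy_process_group()

if __name__ == "__main__":
    main()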

cc @mcarilli. In this case the warning seems to be spurious: the strides are nominally different, but the physical layout is the same.
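
To see why the two layouts above coincide (a quick sketch, not from this thread): a dimension of size 1 contributes nothing to the address computation, so its stride value is irrelevant.

import torch

g = torch.randn(1024, 1024, 1, 1)                   # default strides: (1024, 1, 1, 1)
h = g.as_strided(g.size(), (1024, 1, 1024, 1024))   # the strides reported for the grad
print(g.stride(), h.stride())                        # nominally different
print(g.data_ptr() == h.data_ptr(), torch.equal(g, h))  # same storage, same elements
print(h.is_contiguous())                             # True: size-1 dims are skipped in the check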

I ran into this problem in distributed training with batch size > 1; with batch size = 1, or on a single GPU, it does not occur.
I suspected the problem was caused by a transpose or permute. When I removed the transpose/permute, or added .contiguous() right after it, the warning went away.
So I suspect the transpose leaves the gradient with the wrong strides; the tensor needs to be made contiguous.

Before:

 vit_ll = eval('self.vit'+str(i)).forward_wo_cls(x).transpose(1,2).view([x.shape[0],-1,*x.shape[2:]])

After (fixed):

vit_ll = eval('self.vit'+str(i)).forward_wo_cls(x).transpose(1,2).contiguous().view([x.shape[0],-1,*x.shape[2:]])
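
As a side note, .reshape() should behave the same as .contiguous().view() here, since it falls back to a contiguous copy whenever the transposed result cannot be expressed as a view:

vit_ll = eval('self.vit'+str(i)).forward_wo_cls(x).transpose(1,2).reshape([x.shape[0],-1,*x.shape[2:]])  # copies only when needed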

Agree with @starhiking: tensors should be made contiguous once their views have been changed. I solved my problem in a similar way. Looking through the tensor views doc at https://pytorch.org/docs/stable/tensor_view.html might be very helpful.
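
A tiny illustration of that point (a sketch): a transposed view is not contiguous until .contiguous() copies it into fresh memory.

import torch

t = torch.randn(2, 3, 4)
v = t.transpose(1, 2)                   # a view with permuted strides
print(v.is_contiguous())                # False
print(v.contiguous().is_contiguous())   # True: backed by a newly copied buffer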

@starhiking, it also happens when I use a 1x1 convolution kernel. But why?

@MRI000000, I am also hitting this issue. Have you resolved it?

It may be caused by distributed training.

I get this issue when using channels_last training, and the optimizer was defined before I switched the model over to channels_last.
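
A sketch of the ordering that avoids that (assumptions: a single Conv2d stands in for the real model, and the DDP wrapping is left out so the snippet runs without a process group): convert to channels_last first, then construct the optimizer (and wrap in DDP), so the parameter layout never changes afterwards.

import torch
import torch.nn as nn

model = nn.Conv2d(1024, 1024, 1)
model = model.to(memory_format=torch.channels_last)       # convert the parameters first
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # then build the optimizer
x = torch.randn(8, 1024, 16, 16).to(memory_format=torch.channels_last)
model(x).sum().backward()
optimizer.step()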

I get this issue when using a U-Net, trying to use BatchNorm2d together with TransposeConv.

Is there an update on this @rohan-varma (tagged you since you removed the triaged tag)? I am facing the same issue, but only with DDP; otherwise the code runs through without any issues.

I'm seeing this warning too, though the model seems to be running/converging okay.