pytorch / glow

Compiler for Neural Network hardware accelerators

Conv - Relu - BatchNorm WRONG FUSION

LuisTbx opened this issue

In the BatchNorm paper, the order of the layers is Conv + BN + ReLU. When this order is respected, the fusion of Conv and BN can be performed without any problem. But when the order is Conv + ReLU + BN, the fusion should not be performed, as it leads to numerically different results.
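
As a concrete illustration (a minimal scalar sketch with made-up numbers, not Glow code), folding the BN scale/shift into the conv weights and then applying the ReLU does not reproduce Conv -> ReLU -> BN:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
  // Hypothetical parameters; the "conv" is a single multiply-add for illustration.
  float w = 1.0f, b = -2.0f;          // conv weight / bias
  float gamma = 0.5f, beta = 1.0f;    // BN scale / shift
  float mean = 0.0f, var = 1.0f, eps = 1e-5f;
  float x = 1.0f;
  float invStd = 1.0f / std::sqrt(var + eps);

  // Correct order: Conv -> ReLU -> BN.
  float conv = w * x + b;                                 // -1.0
  float relu = std::max(0.0f, conv);                      //  0.0
  float correct = gamma * (relu - mean) * invStd + beta;  // ~1.0

  // Invalid fusion across the ReLU: fold BN into the Conv, then apply the ReLU.
  float wF = gamma * invStd * w;
  float bF = gamma * (b - mean) * invStd + beta;
  float fused = std::max(0.0f, wF * x + bF);              // ~0.5

  std::printf("Conv->ReLU->BN: %f, fused Conv->ReLU: %f\n", correct, fused);
  return 0;
}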

As part of the optimizations done in Glow, the Conv always gets fused with the BN, regardless of the ReLU in between, producing errors in the graph output.

This issue needs to be addressed. In the meantime, I'm wondering: can we pass a compilation flag to skip this fusion? I know there are flags; I just could not find the specific one to disable the Conv+BN fusion.

It is the same issue as in #5868.

Conv and BN seem to be fused as if the ReLU weren't there, which causes incorrect results.

Please address the issue.

Can you describe in more detail how you think this issue manifests in Glow's optimizations, or show some Glow dot graphs before/after optimizations?

AFAICT when we have Conv -> ReLU -> BN, there shouldn't be any fusion of the Conv and BN, because the ReLU blocks it. The optimization only applies when the BN input is a Conv:

// Fold only when the BN's input is the Conv itself; any node in between
// (e.g. a Relu) makes this cast fail, and the fold is skipped.
auto *CV = dyn_cast<ConvolutionNode>(BN->getInput());
if (!CV) {
  continue;
}

I think it has something to do with folding and lowering.

If we apply folding first, the ReLU becomes max(0, conv_2d), so my wild guess is that, because we no longer see the ReLU layer, the BN+conv_2d fusion is applied regardless of the max operation.

I believe there should be a check for operations applied to the layer outputs before doing the BN+conv_2d fusion.
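
Something along these lines, as a rough sketch only; the getResult()/getNumUsers() calls reflect my understanding of Glow's node API and are assumptions, not verified code:

auto *CV = dyn_cast<ConvolutionNode>(BN->getInput());
if (!CV) {
  continue; // something (e.g. a Relu, or a Max after lowering) sits between the Conv and the BN
}
// Assumed API: only fold when the conv output feeds the BN and nothing else,
// so the folded weights cannot change other consumers of the conv result.
if (CV->getResult().getNumUsers() != 1) {
  continue;
}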

Alternatively, we could have a way to indicate that the lowering/folding optimizations and the layer fusion should not be performed together.

For example, the code from #5868 is the best way to observe this behavior.