Compilation issue with Visual Studio v16.7.1

Question

albanD opened this issue 4 years ago

Moving from Visual Studio v16.6.4 to v16.7.1, we now see the following error when compiling oneDNN:

[path]src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.

The full log of a failed compilation can be found here.

Unfortunately, there isn't much information online about this error.
Any idea what could be causing this?

Longer log in case the link above cannot be accessed:

[500/3080] C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe   /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj 
C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe   /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental  /MP    /wd4800 /wd4068 /wd4305 /wd4551 /wd4244  /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
 To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com 
Please choose the Technical Support command on the Visual C++ 
 Help menu, or open the Technical Support help file for more information
  cl!RaiseException()+0x69
  cl!RaiseException()+0x69
  cl!CloseTypeServerPDB()+0x22e6b
  cl!CloseTypeServerPDB()+0xcd30a

Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Evarist · Answer 1 · Tue Aug 18 2020 07:11:14 GMT+0800 (China Standard Time)

I cannot access the logs (they require authorization and granted permissions).
But the description looks very familiar: see #805.
Could you please check the latest master or manually backport the fix?

Nikita Shulga · Answer 2 · Tue Aug 18 2020 12:31:49 GMT+0800 (China Standard Time)

Unfortunately, PyTorch uses oneDNN via https://github.com/intel/ideep, so resolving the problem requires updating the oneDNN version in ideep; see intel/ideep#44

albanD · Answer 3 · Tue Aug 18 2020 21:59:04 GMT+0800 (China Standard Time)

Hi,

Thanks for the reference (and the quick fix!)
Comparing against the full logs, it does look very similar.

We have downgraded VS until we can update the submodule that contains oneDNN.
cc @pinzhenx is there some documentation explaining how to do this upgrade?

Pinzhen Xu · Answer 4 · Tue Aug 18 2020 23:29:49 GMT+0800 (China Standard Time)

@albanD Upgrading oneDNN doesn't take much effort; manually switching the branch inside ideep/mkl-dnn is the way to go.

Once this fix as well as this fix are ported to the release branch, we can promote the upgrade to PyTorch, provided no other regressions are found in internal tests.

Niels Dekker · Answer 5 · Wed Aug 19 2020 23:06:40 GMT+0800 (China Standard Time)

Please consider voting or leaving a comment on the Visual C++ compiler bug report that I submitted. The compiler bug it describes appears to be the cause of this issue: Visual Studio problem 1145942 - VS2019 Internal compiler error using __restrict keyword in Release build
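
For readers who hit the same error in their own code, here is a minimal sketch of one possible workaround style: leaving out `__restrict` only on the affected toolset. It is not taken from pull request #805 and may differ from what oneDNN actually did. The macro name `EXAMPLE_RESTRICT` and the functions `scale_copy` and `self_check` are hypothetical, and the version check assumes that only MSVC 19.27 (`_MSC_VER == 1927`, i.e. Visual Studio 16.7.x) is affected, which this thread confirms only for 16.7.1 and 16.7.2.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: expands to nothing on MSVC 19.27 (Visual Studio 16.7.x),
// where __restrict is reported to trigger C1001 in Release builds, and to
// __restrict on other MSVC, GCC, and Clang toolsets.
#if defined(_MSC_VER) && _MSC_VER == 1927
#define EXAMPLE_RESTRICT
#elif defined(_MSC_VER) || defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_RESTRICT __restrict
#else
#define EXAMPLE_RESTRICT
#endif

// Illustrative kernel: dst[i] = alpha * src[i]. The pointers must not alias
// when the qualifier is active; without it the code is still correct, only
// potentially less optimized.
static void scale_copy(float *EXAMPLE_RESTRICT dst,
        const float *EXAMPLE_RESTRICT src, std::size_t n, float alpha) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = alpha * src[i];
}

// Small sanity check that the kernel behaves the same with or without
// the qualifier.
static void self_check() {
    const float src[2] = {1.5f, -2.0f};
    float dst[2] = {0.0f, 0.0f};
    scale_copy(dst, src, 2, 2.0f);
    assert(dst[0] == 3.0f);
    assert(dst[1] == -4.0f);
}
```

Dropping the qualifier does not change results, since `__restrict` is only an optimization hint, so the cost of this approach is limited to possibly slower code on the affected compiler version.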

Niels Dekker · Answer 6 · Thu Aug 20 2020 00:53:43 GMT+0800 (China Standard Time)

FYI, the compilation issue is still present in Visual Studio 2019 version 16.7.2, released on August 18, 2020.

Niels Dekker · Answer 7 · Sat Sep 05 2020 00:11:59 GMT+0800 (China Standard Time)

FYI Vadim Pirogov (@vpirogov) has just released oneDNN v1.6.2, including the workaround from pull request #805 for this issue. 😃