pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Home Page: https://pytorch.org

Fails to compile with GCC 12.1.0

otioss opened this issue · comments

🐛 Describe the bug

I followed the instructions to compile PyTorch from source within a conda environment on Arch Linux. The compilation fails with the following error:

/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
188 | __m512 __Y = __Y;
| ^~~
In function ‘__m512i _mm512_cvtps_epi32(__m512)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:331:47:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14044:52: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14044 | return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
14045 | (__v16si)
| ~~~~~~~~~
14046 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
14047 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14048 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m512i _mm512_permutexvar_epi32(__m512i, __m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:353:45:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:7027:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
7027 | return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
7028 | (__v16si) __X,
| ~~~~~~~~~~~~~~
7029 | (__v16si)
| ~~~~~~~~~
7030 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
7031 | (__mmask16) -1);
| ~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m128i _mm512_extracti32x4_epi32(__m512i, int)’,
inlined from ‘__m128i _mm512_castsi512_si128(__m512i)’ at /usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:15829:10,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:373:25:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:6045:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
6045 | return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
6046 | __imm,
| ~~~~~~
6047 | (__v4si)
| ~~~~~~~~
6048 | _mm_undefined_si128 (),
| ~~~~~~~~~~~~~~~~~~~~~~~
6049 | (__mmask8) -1);
| ~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h:788:11: note: ‘__Y’ was declared here
788 | __m128i __Y = __Y;
| ^~~
In function ‘__m512 _mm512_cvtepi32_ps(__m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:268:34:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14148:10: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14148 | return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14149 | (__v16sf)
| ~~~~~~~~~
14150 | _mm512_undefined_ps (),
| ~~~~~~~~~~~~~~~~~~~~~~~
14151 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14152 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
188 | __m512 __Y = __Y;
| ^~~
In function ‘__m512i _mm512_cvtps_epi32(__m512)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:331:47:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14044:52: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14044 | return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
14045 | (__v16si)
| ~~~~~~~~~
14046 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
14047 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14048 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m512i _mm512_permutexvar_epi32(__m512i, __m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:353:45:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:7027:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
7027 | return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
7028 | (__v16si) __X,
| ~~~~~~~~~~~~~~
7029 | (__v16si)
| ~~~~~~~~~
7030 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
7031 | (__mmask16) -1);
| ~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m128i _mm512_extracti32x4_epi32(__m512i, int)’,
inlined from ‘__m128i _mm512_castsi512_si128(__m512i)’ at /usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:15829:10,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:369:25:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:6045:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
6045 | return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
6046 | __imm,
| ~~~~~~
6047 | (__v4si)
| ~~~~~~~~
6048 | _mm_undefined_si128 (),
| ~~~~~~~~~~~~~~~~~~~~~~~
6049 | (__mmask8) -1);
| ~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h:788:11: note: ‘__Y’ was declared here
788 | __m128i __Y = __Y;
| ^~~
cc1plus: all warnings being treated as errors
ninja: build stopped: subcommand failed.

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.1.0
Clang version: 13.0.1
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.17.7-arch1-2-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 11.7.64
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 515.43.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[conda] cudatoolkit 11.3.1 h2bc3f7f_2 anaconda
[conda] magma-cuda110 2.5.2 1 pytorch
[conda] mkl 2022.0.1 h06a4308_117
[conda] mkl-include 2022.0.1 h06a4308_117
[conda] numpy 1.22.3 py39h7a5d4dd_0
[conda] numpy-base 1.22.3 py39hb8be1f0_0

cc @malfet @seemethere

Related to pytorch/FBGEMM#1094. Looks like a GCC 12 regression which particularly hits AMD CPUs.

Hi otioss, thanks for the report! If you have an idea of how to fix this, we would accept a patch.

So I had similar problems building PyTorch on an AMD CPU / Arch Linux 5.17 / GCC 12.x.
My workaround was to install GCC 11.3 (sudo pacman -Sy gcc-11) and then:

export CC=gcc-11
export CXX=g++-11

Then you can run python setup.py install in the usual way to build PyTorch.
To install into conda I did:

python setup.py bdist_wheel
pip install .

I used the community/gcc11 package instead of aur/gcc-11. I also needed to do a fresh checkout to clear out all the generated CMake files, etc.
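For reference, a minimal sketch combining the two workarounds above might look like this on Arch (the package and binary names, the fresh clone, and installing the built wheel from dist/ are assumptions based on the comments above, not a verified recipe):

# install GCC 11 alongside the system GCC 12 (community/gcc11; the AUR gcc-11 package also works)
sudo pacman -Sy gcc11

# start from a fresh checkout so no GCC 12 CMake caches are reused
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# point the build at the older toolchain
export CC=gcc-11
export CXX=g++-11

# build a wheel and install it into the active conda environment
python setup.py bdist_wheel
pip install dist/*.whl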

This also affects GCC 13.0.0.

Regarding "AMD CPUs": this also affects "Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz" with "gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0".

Has anyone submitted a bug report to GCC, or does anyone know the status of potential fixes for GCC?

Apparently a fix just made it to GCC trunk today: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

From my brief reading of the bug, it sounds like the warning is a false positive, so if we can tell the compiler to continue we should be fine?

So I passed the compiler flags -Wno-maybe-uninitialized and -Wno-uninitialized:

CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop

This got further, but failed on error: ‘void operator delete(void*)’ called on pointer ‘<unknown>’ with nonzero offset:

FAILED: third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o 
/usr/bin/c++ -DDNNL_GRAPH_CPU_RUNTIME=2 -DIDEEP_USE_MKL -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I/home/birch/git/pytorch/cmake/../third_party/benchmark/include -I/home/birch/git/pytorch/third_party/onnx -I/home/birch/git/pytorch/build/third_party/onnx -I/home/birch/git/pytorch/third_party/foxi -I/home/birch/git/pytorch/build/third_party/foxi -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/include -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/src -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -I/home/birch/git/pytorch/build/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/birch/git/pytorch/build/third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/protobuf/src -isystem /home/birch/anaconda3/envs/p310-cu121/include -isystem /home/birch/git/pytorch/third_party/gemmlowp -isystem /home/birch/git/pytorch/third_party/neon2sse -isystem /home/birch/git/pytorch/third_party/XNNPACK/include -isystem /home/birch/git/pytorch/third_party/ittapi/include -isystem /home/birch/git/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.1/include -Wno-maybe-uninitialized -Wno-uninitialized -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -std=c++11 -fopenmp -fvisibility-inlines-hidden  -Wall -Werror -Wno-unknown-pragmas -fvisibility=internal -fPIC -Wformat -Wformat-security -fstack-protector-strong   -Wmissing-field-initializers  -Wno-strict-overflow -O3 -DNDEBUG -DNDEBUG -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -DCAFFE2_USE_GLOO -MD -MT third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o -MF third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o.d -o third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o -c /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/dnnl_backend.cpp
In file included from /usr/include/x86_64-linux-gnu/c++/12/bits/c++allocator.h:33,
                 from /usr/include/c++/12/bits/allocator.h:46,
                 from /usr/include/c++/12/memory:64,
                 from /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/utils/compatible.hpp:23,
                 from /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/dnnl_backend.cpp:19:
In member function ‘void std::__new_allocator<_Tp>::deallocate(_Tp*, size_type) [with _Tp = long int]’,
    inlined from ‘static void std::allocator_traits<std::allocator<_Tp1> >::deallocate(allocator_type&, pointer, size_type) [with _Tp = long int]’ at /usr/include/c++/12/bits/alloc_traits.h:496:23,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:387:19,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:366:15,
    inlined from ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:733:7,
    inlined from ‘virtual void dnnl::graph::impl::dnnl_impl::bn_folding_t::execute(const dnnl::stream&, const std::unordered_map<int, dnnl::memory>&) const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/op_executable.hpp:1060:63:
/usr/include/c++/12/bits/new_allocator.h:158:33: error: ‘void operator delete(void*)’ called on pointer ‘<unknown>’ with nonzero offset [1, 9223372036854775800] [-Werror=free-nonheap-object]
  158 |         _GLIBCXX_OPERATOR_DELETE(_GLIBCXX_SIZED_DEALLOC(__p, __n));
      |                                 ^
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = long int]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = long int]’ at /usr/include/c++/12/bits/alloc_traits.h:464:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:378:33,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_initialize(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const long int*; _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:1687:25,
    inlined from ‘std::vector<_Tp, _Alloc>::vector(_InputIterator, _InputIterator, const allocator_type&) [with _InputIterator = const long int*; <template-parameter-2-2> = void; _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:706:23,
    inlined from ‘dnnl::memory::dims dnnl::memory::desc::dims() const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include/oneapi/dnnl/dnnl.hpp:2677:66,
    inlined from ‘virtual void dnnl::graph::impl::dnnl_impl::bn_folding_t::execute(const dnnl::stream&, const std::unordered_map<int, dnnl::memory>&) const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/op_executable.hpp:1060:63:
/usr/include/c++/12/bits/new_allocator.h:137:55: note: returned from ‘void* operator new(std::size_t)’
  137 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^
cc1plus: all warnings being treated as errors

well, maybe I'm wrong that it's an ignorable warning. 🙃

It gets a bit further if I also ignore free-nonheap-object warnings:

CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object' 

The next failure is:

[6128/7025] Building CXX object test_api/CMakeFiles/test_api.dir/dataloader.cpp.o
FAILED: test_api/CMakeFiles/test_api.dir/dataloader.cpp.o 
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -I/home/birch/git/pytorch/build/aten/src -I/home/birch/git/pytorch/aten/src -I/home/birch/git/pytorch/build -I/home/birch/git/pytorch -I/home/birch/git/pytorch/cmake/../third_party/benchmark/include -I/home/birch/git/pytorch/third_party/onnx -I/home/birch/git/pytorch/build/third_party/onnx -I/home/birch/git/pytorch/third_party/foxi -I/home/birch/git/pytorch/build/third_party/foxi -I/home/birch/git/pytorch/build/caffe2/../aten/src -I/home/birch/git/pytorch/torch/csrc/api -I/home/birch/git/pytorch/torch/csrc/api/include -I/home/birch/git/pytorch/c10/.. -I/home/birch/git/pytorch/c10/cuda/../.. -isystem /home/birch/git/pytorch/build/third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/protobuf/src -isystem /home/birch/anaconda3/envs/p310-cu121/include -isystem /home/birch/git/pytorch/third_party/gemmlowp -isystem /home/birch/git/pytorch/third_party/neon2sse -isystem /home/birch/git/pytorch/third_party/XNNPACK/include -isystem /home/birch/git/pytorch/third_party/ittapi/include -isystem /home/birch/git/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.1/include -isystem /home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/birch/git/pytorch/third_party/ideep/include -isystem /home/birch/git/pytorch/third_party/ideep/mkl-dnn/include -isystem /home/birch/git/pytorch/third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/googletest/googletest -Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIE -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wno-unused-variable -Wno-missing-braces -Wno-maybe-uninitialized -Wno-unused-but-set-parameter -MD -MT test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -MF test_api/CMakeFiles/test_api.dir/dataloader.cpp.o.d -o test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -c /home/birch/git/pytorch/test/cpp/api/dataloader.cpp
In file included from /usr/include/c++/12/memory:63,
                 from /home/birch/git/pytorch/third_party/googletest/googletest/include/gtest/gtest.h:57,
                 from /home/birch/git/pytorch/test/cpp/api/dataloader.cpp:1:
In static member function ‘static _Tp* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(const _Tp*, const _Tp*, _Tp*) [with _Tp = long unsigned int; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:495:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:522:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:529:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:620:7,
    inlined from ‘std::vector<_Tp, _Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&) [with _Tp = long unsigned int; _Alloc = std::allocator<long unsigned int>]’ at /usr/include/c++/12/bits/vector.tcc:244:21:
/usr/include/c++/12/bits/stl_algobase.h:431:30: error: argument 1 null where non-null expected [-Werror=nonnull]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/stl_algobase.h:431:30: note: in a call to built-in function ‘void* __builtin_memmove(void*, const void*, long unsigned int)’
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-aligned-allocation-unavailable’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-unused-private-field’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-invalid-partial-specialization’ may have been intended to silence earlier diagnostics
cc1plus: some warnings being treated as errors

On the basis that the file is part of test_api, I figured I could live with ignoring the warnings.

I told PyTorch not to treat nonnull warnings as errors.

I successfully compiled PyTorch 2.1.0 (commit b8d7a28) with CUDA 12.1.1 and GCC 12.2.0 on Ubuntu 22.10 with a Ryzen 7700X.

The final command I used was:

CUDA_DIR=/usr/local/cuda-12.1
CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop

Full instructions for how I ran this are here:
https://gist.github.com/Birch-san/211f31f8d901dadd1025398fa1a603b8
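
As a quick sanity check after a from-source build like this (plain PyTorch API calls, nothing specific to the gist above), something along these lines should report the freshly built version, the CUDA version it was built against, and whether the GPU is usable:

# run from outside the source tree so the installed package is imported
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"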