pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Home Page: https://pytorch.org

Fails to compile with GCC 12.1.0

otioss opened this issue · comments

🐛 Describe the bug

I followed the instructions to compile PyTorch from source within a conda environment on Arch Linux. The compilation fails with the following error:

/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
188 | __m512 __Y = __Y;
| ^~~
In function ‘__m512i _mm512_cvtps_epi32(__m512)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:331:47:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14044:52: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14044 | return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
14045 | (__v16si)
| ~~~~~~~~~
14046 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
14047 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14048 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m512i _mm512_permutexvar_epi32(__m512i, __m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:353:45:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:7027:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
7027 | return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
7028 | (__v16si) __X,
| ~~~~~~~~~~~~~~
7029 | (__v16si)
| ~~~~~~~~~
7030 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
7031 | (__mmask16) -1);
| ~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m128i _mm512_extracti32x4_epi32(__m512i, int)’,
inlined from ‘__m128i _mm512_castsi512_si128(__m512i)’ at /usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:15829:10,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:373:25:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:6045:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
6045 | return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
6046 | __imm,
| ~~~~~~
6047 | (__v4si)
| ~~~~~~~~
6048 | _mm_undefined_si128 (),
| ~~~~~~~~~~~~~~~~~~~~~~~
6049 | (__mmask8) -1);
| ~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 8; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h:788:11: note: ‘__Y’ was declared here
788 | __m128i __Y = __Y;
| ^~~
In function ‘__m512 _mm512_cvtepi32_ps(__m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:268:34:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14148:10: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14148 | return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
14149 | (__v16sf)
| ~~~~~~~~~
14150 | _mm512_undefined_ps (),
| ~~~~~~~~~~~~~~~~~~~~~~~
14151 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14152 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
188 | __m512 __Y = __Y;
| ^~~
In function ‘__m512i _mm512_cvtps_epi32(__m512)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:331:47:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:14044:52: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
14044 | return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
14045 | (__v16si)
| ~~~~~~~~~
14046 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
14047 | (__mmask16) -1,
| ~~~~~~~~~~~~~~~
14048 | _MM_FROUND_CUR_DIRECTION);
| ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m512i _mm512_permutexvar_epi32(__m512i, __m512i)’,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:353:45:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:7027:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
7027 | return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
7028 | (__v16si) __X,
| ~~~~~~~~~~~~~~
7029 | (__v16si)
| ~~~~~~~~~
7030 | _mm512_undefined_epi32 (),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
7031 | (__mmask16) -1);
| ~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:206:11: note: ‘__Y’ was declared here
206 | __m512i __Y = __Y;
| ^~~
In function ‘__m128i _mm512_extracti32x4_epi32(__m512i, int)’,
inlined from ‘__m128i _mm512_castsi512_si128(__m512i)’ at /usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:15829:10,
inlined from ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’ at /home/elf/brego/src/pytorch/third_party/fbgemm/src/QuantUtilsAvx512.cc:369:25:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/avx512fintrin.h:6045:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
6045 | return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
6046 | __imm,
| ~~~~~~
6047 | (__v4si)
| ~~~~~~~~
6048 | _mm_undefined_si128 (),
| ~~~~~~~~~~~~~~~~~~~~~~~
6049 | (__mmask8) -1);
| ~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h: In function ‘void fbgemm::requantizeOutputProcessingGConvAvx512(uint8_t*, const int32_t*, const block_type_t&, int, int, const requantizationParams_t<BIAS_TYPE>&) [with bool A_SYMMETRIC = false; bool B_SYMMETRIC = false; QuantizationGranularity Q_GRAN = fbgemm::QuantizationGranularity::OUT_CHANNEL; bool HAS_BIAS = false; bool FUSE_RELU = false; int C_PER_G = 16; BIAS_TYPE = int]’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.1.0/include/emmintrin.h:788:11: note: ‘__Y’ was declared here
788 | __m128i __Y = __Y;
| ^~~
cc1plus: all warnings being treated as errors
ninja: build stopped: subcommand failed.

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.1.0
Clang version: 13.0.1
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.17.7-arch1-2-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: 11.7.64
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 515.43.04
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[conda] cudatoolkit 11.3.1 h2bc3f7f_2 anaconda
[conda] magma-cuda110 2.5.2 1 pytorch
[conda] mkl 2022.0.1 h06a4308_117
[conda] mkl-include 2022.0.1 h06a4308_117
[conda] numpy 1.22.3 py39h7a5d4dd_0
[conda] numpy-base 1.22.3 py39hb8be1f0_0

cc @malfet @seemethere

Related to pytorch/FBGEMM#1094. Looks like a GCC 12 regression which particularly hits AMD CPUs.

Hi otioss, thanks for the report! If you have an idea of how to fix this, we would accept a patch.

So I had similar problems building PyTorch on an AMD CPU / Arch Linux 5.17 / GCC 12.x.
My workaround was to install GCC 11.3 (sudo pacman -Sy gcc-11) and then:

export CC=gcc-11
export CXX=g++-11

Then you can run python setup.py install in the usual way to build PyTorch.
To install into conda I did:

python setup.py bdist_wheel
pip install .

I used the community/gcc11 package instead of aur/gcc-11. I also needed to do a fresh checkout to clear out all the generated CMake files, etc.
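For reference, a minimal sketch combining the two workarounds above might look like this on Arch (the package and binary names, the fresh clone, and installing the built wheel from dist/ are assumptions based on the comments above, not a verified recipe):

# install GCC 11 alongside the system GCC 12 (community/gcc11; the AUR gcc-11 package also works)
sudo pacman -Sy gcc11

# start from a fresh checkout so no GCC 12 CMake caches are reused
git clone --recursive https://github.com/pytorch/pytorch.git
cd pytorch

# point the build at the older toolchain
export CC=gcc-11
export CXX=g++-11

# build a wheel and install it into the active conda environment
python setup.py bdist_wheel
pip install dist/*.whl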

This also affects GCC 13.0.0.

Regarding "AMD CPUs": this also affects "Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz" with "gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0".

Has anyone submitted a bug report to GCC, or does anyone know the status of potential fixes for GCC?

Apparently a fix just made it to GCC trunk today: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593

From my brief reading of the bug, it sounds like the warning is a false positive, so if we can tell the compiler to continue we should be fine?

So I passed the compiler flags -Wno-maybe-uninitialized and -Wno-uninitialized:

CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop

This got further, but failed on error: ‘void operator delete(void*)’ called on pointer ‘<unknown>’ with nonzero offset:

FAILED: third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o 
/usr/bin/c++ -DDNNL_GRAPH_CPU_RUNTIME=2 -DIDEEP_USE_MKL -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I/home/birch/git/pytorch/cmake/../third_party/benchmark/include -I/home/birch/git/pytorch/third_party/onnx -I/home/birch/git/pytorch/build/third_party/onnx -I/home/birch/git/pytorch/third_party/foxi -I/home/birch/git/pytorch/build/third_party/foxi -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/include -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/src -I/home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -I/home/birch/git/pytorch/build/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/birch/git/pytorch/build/third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/protobuf/src -isystem /home/birch/anaconda3/envs/p310-cu121/include -isystem /home/birch/git/pytorch/third_party/gemmlowp -isystem /home/birch/git/pytorch/third_party/neon2sse -isystem /home/birch/git/pytorch/third_party/XNNPACK/include -isystem /home/birch/git/pytorch/third_party/ittapi/include -isystem /home/birch/git/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.1/include -Wno-maybe-uninitialized -Wno-uninitialized -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -std=c++11 -fopenmp -fvisibility-inlines-hidden  -Wall -Werror -Wno-unknown-pragmas -fvisibility=internal -fPIC -Wformat -Wformat-security -fstack-protector-strong   -Wmissing-field-initializers  -Wno-strict-overflow -O3 -DNDEBUG -DNDEBUG -D_FORTIFY_SOURCE=2 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -DCAFFE2_USE_GLOO -MD -MT third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o -MF third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o.d -o third_party/ideep/mkl-dnn/src/backend/dnnl/CMakeFiles/dnnl_graph_backend_dnnl.dir/dnnl_backend.cpp.o -c /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/dnnl_backend.cpp
In file included from /usr/include/x86_64-linux-gnu/c++/12/bits/c++allocator.h:33,
                 from /usr/include/c++/12/bits/allocator.h:46,
                 from /usr/include/c++/12/memory:64,
                 from /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/utils/compatible.hpp:23,
                 from /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/dnnl_backend.cpp:19:
In member function ‘void std::__new_allocator<_Tp>::deallocate(_Tp*, size_type) [with _Tp = long int]’,
    inlined from ‘static void std::allocator_traits<std::allocator<_Tp1> >::deallocate(allocator_type&, pointer, size_type) [with _Tp = long int]’ at /usr/include/c++/12/bits/alloc_traits.h:496:23,
    inlined from ‘void std::_Vector_base<_Tp, _Alloc>::_M_deallocate(pointer, std::size_t) [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:387:19,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:366:15,
    inlined from ‘std::vector<_Tp, _Alloc>::~vector() [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:733:7,
    inlined from ‘virtual void dnnl::graph::impl::dnnl_impl::bn_folding_t::execute(const dnnl::stream&, const std::unordered_map<int, dnnl::memory>&) const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/op_executable.hpp:1060:63:
/usr/include/c++/12/bits/new_allocator.h:158:33: error: ‘void operator delete(void*)’ called on pointer ‘<unknown>’ with nonzero offset [1, 9223372036854775800] [-Werror=free-nonheap-object]
  158 |         _GLIBCXX_OPERATOR_DELETE(_GLIBCXX_SIZED_DEALLOC(__p, __n));
      |                                 ^
In member function ‘_Tp* std::__new_allocator<_Tp>::allocate(size_type, const void*) [with _Tp = long int]’,
    inlined from ‘static _Tp* std::allocator_traits<std::allocator<_Tp1> >::allocate(allocator_type&, size_type) [with _Tp = long int]’ at /usr/include/c++/12/bits/alloc_traits.h:464:28,
    inlined from ‘std::_Vector_base<_Tp, _Alloc>::pointer std::_Vector_base<_Tp, _Alloc>::_M_allocate(std::size_t) [with _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:378:33,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_initialize(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const long int*; _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:1687:25,
    inlined from ‘std::vector<_Tp, _Alloc>::vector(_InputIterator, _InputIterator, const allocator_type&) [with _InputIterator = const long int*; <template-parameter-2-2> = void; _Tp = long int; _Alloc = std::allocator<long int>]’ at /usr/include/c++/12/bits/stl_vector.h:706:23,
    inlined from ‘dnnl::memory::dims dnnl::memory::desc::dims() const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include/oneapi/dnnl/dnnl.hpp:2677:66,
    inlined from ‘virtual void dnnl::graph::impl::dnnl_impl::bn_folding_t::execute(const dnnl::stream&, const std::unordered_map<int, dnnl::memory>&) const’ at /home/birch/git/pytorch/third_party/ideep/mkl-dnn/src/backend/dnnl/op_executable.hpp:1060:63:
/usr/include/c++/12/bits/new_allocator.h:137:55: note: returned from ‘void* operator new(std::size_t)’
  137 |         return static_cast<_Tp*>(_GLIBCXX_OPERATOR_NEW(__n * sizeof(_Tp)));
      |                                                       ^
cc1plus: all warnings being treated as errors

well, maybe I'm wrong that it's an ignorable warning. 🙃

It gets a bit further if I also ignore free-nonheap-object warnings:

CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object' 

The next failure is:

[6128/7025] Building CXX object test_api/CMakeFiles/test_api.dir/dataloader.cpp.o
FAILED: test_api/CMakeFiles/test_api.dir/dataloader.cpp.o 
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -I/home/birch/git/pytorch/build/aten/src -I/home/birch/git/pytorch/aten/src -I/home/birch/git/pytorch/build -I/home/birch/git/pytorch -I/home/birch/git/pytorch/cmake/../third_party/benchmark/include -I/home/birch/git/pytorch/third_party/onnx -I/home/birch/git/pytorch/build/third_party/onnx -I/home/birch/git/pytorch/third_party/foxi -I/home/birch/git/pytorch/build/third_party/foxi -I/home/birch/git/pytorch/build/caffe2/../aten/src -I/home/birch/git/pytorch/torch/csrc/api -I/home/birch/git/pytorch/torch/csrc/api/include -I/home/birch/git/pytorch/c10/.. -I/home/birch/git/pytorch/c10/cuda/../.. -isystem /home/birch/git/pytorch/build/third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/gloo -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/birch/git/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/protobuf/src -isystem /home/birch/anaconda3/envs/p310-cu121/include -isystem /home/birch/git/pytorch/third_party/gemmlowp -isystem /home/birch/git/pytorch/third_party/neon2sse -isystem /home/birch/git/pytorch/third_party/XNNPACK/include -isystem /home/birch/git/pytorch/third_party/ittapi/include -isystem /home/birch/git/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda-12.1/include -isystem /home/birch/git/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/birch/git/pytorch/third_party/ideep/include -isystem /home/birch/git/pytorch/third_party/ideep/mkl-dnn/include -isystem /home/birch/git/pytorch/third_party/googletest/googletest/include -isystem /home/birch/git/pytorch/third_party/googletest/googletest -Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIE -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wno-unused-variable -Wno-missing-braces -Wno-maybe-uninitialized -Wno-unused-but-set-parameter -MD -MT test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -MF test_api/CMakeFiles/test_api.dir/dataloader.cpp.o.d -o test_api/CMakeFiles/test_api.dir/dataloader.cpp.o -c /home/birch/git/pytorch/test/cpp/api/dataloader.cpp
In file included from /usr/include/c++/12/memory:63,
                 from /home/birch/git/pytorch/third_party/googletest/googletest/include/gtest/gtest.h:57,
                 from /home/birch/git/pytorch/test/cpp/api/dataloader.cpp:1:
In static member function ‘static _Tp* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(const _Tp*, const _Tp*, _Tp*) [with _Tp = long unsigned int; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:495:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/12/bits/stl_algobase.h:522:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:529:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<const long unsigned int*, vector<long unsigned int> >; _OI = __gnu_cxx::__normal_iterator<long unsigned int*, vector<long unsigned int> >]’ at /usr/include/c++/12/bits/stl_algobase.h:620:7,
    inlined from ‘std::vector<_Tp, _Alloc>& std::vector<_Tp, _Alloc>::operator=(const std::vector<_Tp, _Alloc>&) [with _Tp = long unsigned int; _Alloc = std::allocator<long unsigned int>]’ at /usr/include/c++/12/bits/vector.tcc:244:21:
/usr/include/c++/12/bits/stl_algobase.h:431:30: error: argument 1 null where non-null expected [-Werror=nonnull]
  431 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/12/bits/stl_algobase.h:431:30: note: in a call to built-in function ‘void* __builtin_memmove(void*, const void*, long unsigned int)’
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-aligned-allocation-unavailable’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-unused-private-field’ may have been intended to silence earlier diagnostics
cc1plus: note: unrecognized command-line option ‘-Wno-invalid-partial-specialization’ may have been intended to silence earlier diagnostics
cc1plus: some warnings being treated as errors

On the basis that the file is part of test_api, I figured I could live with ignoring the warnings.

I told PyTorch not to treat nonnull warnings as errors.

I successfully compiled PyTorch 2.1.0 (commit b8d7a28) with CUDA 12.1.1 and GCC 12.2.0 on Ubuntu 22.10 with a Ryzen 7700X.

The final command I used was:

CUDA_DIR=/usr/local/cuda-12.1
CXXFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' CFLAGS='-Wno-maybe-uninitialized -Wno-uninitialized -Wno-free-nonheap-object -Wno-nonnull' USE_ROCM=0 TORCH_CUDA_ARCH_LIST=8.9 PATH="$CUDA_DIR/bin:$PATH" LD_LIBRARY_PATH=$CUDA_DIR/lib64 python setup.py develop

Full instructions for how I ran this are here:
https://gist.github.com/Birch-san/211f31f8d901dadd1025398fa1a603b8
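
As a quick sanity check after a from-source build like this (plain PyTorch API calls, nothing specific to the gist above), something along these lines should report the freshly built version, the CUDA version it was built against, and whether the GPU is usable:

# run from outside the source tree so the installed package is imported
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"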