ninja build failed by CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060
jp1924 opened this issue
Hi PyTorch team,
When I build PyTorch from source, I encounter the following error:
```
FAILED: third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/root/pytorch/cmake/../third_party/benchmark/include -I/root/pytorch/third_party/onnx -I/root/pytorch/build/third_party/onnx -I/root/pytorch/third_party/foxi -I/root/pytorch/build/third_party/foxi -I/root/pytorch/third_party/kineto/libkineto/include -I/root/pytorch/third_party/kineto/libkineto/src -I/root/pytorch/third_party/kineto/libkineto/third_party/dynolog -I/root/pytorch/third_party/fmt/include -I/root/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric -I/include/roctracer -I/opt/rocm/include -isystem /root/pytorch/build/third_party/gloo -isystem /root/pytorch/cmake/../third_party/gloo -isystem /root/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /root/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /root/pytorch/cmake/../third_party/googletest/googletest/include -isystem /root/pytorch/third_party/protobuf/src -isystem /root/miniconda3/include -isystem /root/pytorch/third_party/gemmlowp -isystem /root/pytorch/third_party/neon2sse -isystem /root/pytorch/third_party/XNNPACK/include -isystem /root/pytorch/third_party/ittapi/include -isystem /root/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /root/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /root/pytorch/third_party/ideep/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -O3 -DNDEBUG -DNDEBUG -std=c++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DKINETO_NAMESPACE=libkineto -DFMT_HEADER_ONLY -DENABLE_IPC_FABRIC -std=c++17 -DHAS_CUPTI -MD -MT third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o -MF third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o.d -o third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o -c /root/pytorch/third_party/kineto/libkineto/src/CuptiActivityProfiler.cpp
In file included from /root/pytorch/third_party/kineto/libkineto/src/CuptiActivityProfiler.cpp:36:
/root/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp: In member function ‘virtual bool libkineto::RuntimeActivity::flowStart() const’:
/root/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp:248:25: error: ‘CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060’ was not declared in this scope; did you mean ‘CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000’?
  248 |   activity_.cbid == CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060;
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                     CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000
```
When I replace CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 with CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000, the Ninja build completes normally.
Do you know why this is happening? That code was modified in #788.
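For anyone needing to unblock a build, the manual replacement described above can be scripted. This is only a workaround sketch, not a proper fix (and note it means the profiler no longer checks for cudaLaunchKernelExC launches); it is demonstrated here on a mock file so the commands are safe to run anywhere, and in a real checkout you would point sed at third_party/kineto/libkineto/src/CuptiActivity.cpp, keeping a backup first:

```shell
# Demo on a mock file; in a real PyTorch checkout, target
# third_party/kineto/libkineto/src/CuptiActivity.cpp instead.
mkdir -p /tmp/kineto_demo
cat > /tmp/kineto_demo/CuptiActivity.cpp <<'EOF'
activity_.cbid == CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060;
EOF

# Swap the enum that older CUPTI headers lack for one that all versions define.
sed -i 's/CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060/CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000/' \
  /tmp/kineto_demo/CuptiActivity.cpp

cat /tmp/kineto_demo/CuptiActivity.cpp
# -> activity_.cbid == CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000;
```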
Env:
- OS: Ubuntu 22.04
- CUDA: 11.7
This should be defined in a file called cupti_runtime_cbid.h somewhere in the CUDA includes. I think it should be guarded by the #if defined(CUPTI_API_VERSION) && CUPTI_API_VERSION >= 17 on the previous line. Can you check whether cupti_version.h matches? It might be possible that the CUPTI_API_VERSION threshold should be increased or decreased by 1.
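A quick way to check is to grep cupti_version.h for the macro. A minimal sketch, demonstrated against a mock header so it runs anywhere; on a real CUDA install the file typically lives under /usr/local/cuda/extras/CUPTI/include/ (adjust the path to your setup):

```shell
# Mock header standing in for the real cupti_version.h.
mkdir -p /tmp/cupti_demo
cat > /tmp/cupti_demo/cupti_version.h <<'EOF'
#define CUPTI_API_VERSION 17
EOF

# On a real system, point this at e.g.
#   /usr/local/cuda/extras/CUPTI/include/cupti_version.h
grep 'define CUPTI_API_VERSION' /tmp/cupti_demo/cupti_version.h
# -> #define CUPTI_API_VERSION 17
```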
Thanks for the answer @davidberard98! I checked, and in cupti_version.h the CUPTI_API_VERSION is 17.
And is CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 defined in cupti_runtime_cbid.h? You can compare to, say, https://gitlab.com/nvidia/headers/cuda-individual/cupti/-/blob/main/cupti_runtime_cbid.h?ref_type=heads - what's the last ID defined in your copy of cupti_runtime_cbid.h?
My understanding was that CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 should be available in versions >=17 - but maybe this isn't true?
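One way to find the last ID without reading the whole enum is to look at CUPTI_RUNTIME_TRACE_CBID_SIZE, which sits one past the highest real callback ID. A sketch against a mock fragment of the header (swap in the path to your real cupti_runtime_cbid.h):

```shell
# Mock fragment representing the tail of cupti_runtime_cbid.h.
cat > /tmp/cupti_runtime_cbid.h <<'EOF'
CUPTI_RUNTIME_TRACE_CBID_cudaMipmappedArrayGetMemoryRequirements_v11060 = 429,
CUPTI_RUNTIME_TRACE_CBID_SIZE = 430,
CUPTI_RUNTIME_TRACE_CBID_FORCE_INT = 0x7fffffff
EOF

# CUPTI_RUNTIME_TRACE_CBID_SIZE is one more than the last real callback ID.
grep 'CUPTI_RUNTIME_TRACE_CBID_SIZE' /tmp/cupti_runtime_cbid.h
# -> CUPTI_RUNTIME_TRACE_CBID_SIZE = 430,
```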
I can confirm that compiling torch (branch: release/2.1) fails due to this issue with CUDA 11.7 but works fine with CUDA 11.8.
Sorry for the late reply, @davidberard98!
I compared my cupti_runtime_cbid.h with the cupti_runtime_cbid.h in the CUPTI repo as you suggested, and they are different. I think it's because my version of CUPTI is not up to date.
I'm working in a container built on the nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 image, and inside the container something is out of sync: I appear to be running an older version of CUPTI.
Thanks for the help!
This is my cupti_runtime_cbid.h:

```c
// *************************************************************************
// Definitions of indices for API functions, unique across entire API
// *************************************************************************
// This file is generated. Any changes you make will be lost during the next clean build.

// CUDA public interface, for type definitions and cu* function prototypes
typedef enum CUpti_runtime_api_trace_cbid_enum {
  CUPTI_RUNTIME_TRACE_CBID_INVALID = 0,
  CUPTI_RUNTIME_TRACE_CBID_cudaDriverGetVersion_v3020 = 1,
  CUPTI_RUNTIME_TRACE_CBID_cudaRuntimeGetVersion_v3020 = 2,
  CUPTI_RUNTIME_TRACE_CBID_cudaGetDeviceCount_v3020 = 3,
  CUPTI_RUNTIME_TRACE_CBID_cudaGetDeviceProperties_v3020 = 4,
  ...
  CUPTI_RUNTIME_TRACE_CBID_cudaGraphNodeSetEnabled_v11060 = 426,
  CUPTI_RUNTIME_TRACE_CBID_cudaGraphNodeGetEnabled_v11060 = 427,
  CUPTI_RUNTIME_TRACE_CBID_cudaArrayGetMemoryRequirements_v11060 = 428,
  CUPTI_RUNTIME_TRACE_CBID_cudaMipmappedArrayGetMemoryRequirements_v11060 = 429,
  CUPTI_RUNTIME_TRACE_CBID_SIZE = 430,
  CUPTI_RUNTIME_TRACE_CBID_FORCE_INT = 0x7fffffff
} CUpti_runtime_api_trace_cbid;
```

CUpti_runtime_api_trace_cbid_enum only goes up to 430.
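Equivalently, you can test directly whether the header defines the newer enum name at all. A sketch against a mock header matching the tail shown above (point grep at your real cupti_runtime_cbid.h instead):

```shell
# Mock header lacking the newer enum, as in this report.
cat > /tmp/mock_cbid.h <<'EOF'
CUPTI_RUNTIME_TRACE_CBID_cudaMipmappedArrayGetMemoryRequirements_v11060 = 429,
CUPTI_RUNTIME_TRACE_CBID_SIZE = 430,
EOF

# grep -q exits nonzero when the name is absent.
if grep -q 'cudaLaunchKernelExC_v11060' /tmp/mock_cbid.h; then
  echo "enum present"
else
  echo "enum missing"
fi
# -> enum missing
```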
@davidberard98 FYI, enum values >430 were only added after CUDA 11.8 (diffs), so this dependency blocks newer PyTorch builds on CUDA 11.7.
@xflash96 do you know the relationship between CUDA version, CUPTI version, and number of enum values? The reason we changed this to check the CUPTI version instead of the CUDA version is that we had reports of people with CUDA 11.8 and a very old CUPTI version that didn't have this enum; we thought the CUPTI version would more accurately correspond to the available enum values, but it seems that isn't the case.
@davidberard98 For CUPTI versioning vs CUDA toolkit versioning: v17 corresponds to CUDA 11.6 according to cupti_version.h. The enum should grow with the toolkit if the toolkit is installed as a set. I guess most CI depends on NVIDIA's docker images for CUDA versioning, so those might serve as ground truth for what's included. At least the current method doesn't work for the NVIDIA CUDA 11.7.1 docker image.
#809 is changing it to v18
It seems the main branch of PyTorch still can't build without manually changing /home/user/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp:247 to 18. I'm using CUDA 12.0.
The kineto pin hasn't been updated in PyTorch yet.