pytorch / kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.


ninja build failed by CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060

jp1924 opened this issue · comments

commented

Hi PyTorch team,

When I build PyTorch from source, I encounter the following error:

FAILED: third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o 
/usr/bin/c++ -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DIDEEP_USE_MKL -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_EXTERNAL_MZCRC -D_FILE_OFFSET_BITS=64 -I/root/pytorch/cmake/../third_party/benchmark/include -I/root/pytorch/third_party/onnx -I/root/pytorch/build/third_party/onnx -I/root/pytorch/third_party/foxi -I/root/pytorch/build/third_party/foxi -I/root/pytorch/third_party/kineto/libkineto/include -I/root/pytorch/third_party/kineto/libkineto/src -I/root/pytorch/third_party/kineto/libkineto/third_party/dynolog -I/root/pytorch/third_party/fmt/include -I/root/pytorch/third_party/kineto/libkineto/third_party/dynolog/dynolog/src/ipcfabric -I/include/roctracer -I/opt/rocm/include -isystem /root/pytorch/build/third_party/gloo -isystem /root/pytorch/cmake/../third_party/gloo -isystem /root/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /root/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /root/pytorch/cmake/../third_party/googletest/googletest/include -isystem /root/pytorch/third_party/protobuf/src -isystem /root/miniconda3/include -isystem /root/pytorch/third_party/gemmlowp -isystem /root/pytorch/third_party/neon2sse -isystem /root/pytorch/third_party/XNNPACK/include -isystem /root/pytorch/third_party/ittapi/include -isystem /root/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /root/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /root/pytorch/third_party/ideep/include -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -O3 -DNDEBUG -DNDEBUG -std=c++17 -fPIC -DMKL_HAS_SBGEMM -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -DKINETO_NAMESPACE=libkineto -DFMT_HEADER_ONLY -DENABLE_IPC_FABRIC -std=c++17 -DHAS_CUPTI -MD -MT third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o -MF 
third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o.d -o third_party/kineto/libkineto/CMakeFiles/kineto_base.dir/src/CuptiActivityProfiler.cpp.o -c /root/pytorch/third_party/kineto/libkineto/src/CuptiActivityProfiler.cpp
In file included from /root/pytorch/third_party/kineto/libkineto/src/CuptiActivityProfiler.cpp:36:
/root/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp: In member function ‘virtual bool libkineto::RuntimeActivity::flowStart() const’:
/root/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp:248:25: error: ‘CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060’ was not declared in this scope; did you mean ‘CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000’?
  248 |       activity_.cbid == CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060;
       |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       |                         CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000

When I replace CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 with CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000, the ninja build completes normally.

Do you know why this is happening?

That code was modified in #788.

Env

OS: Ubuntu 22.04
CUDA: 11.7

This should be defined in a file called cupti_runtime_cbid.h somewhere in the CUDA includes. I think it should be guarded by the #if defined(CUPTI_API_VERSION) && CUPTI_API_VERSION >= 17 on the previous line. Can you check whether your cupti_version.h matches? It's possible that the CUPTI_API_VERSION threshold needs to be increased or decreased by 1.

commented

Thanks for the answer @davidberard98!
I checked cupti_version.h, and CUPTI_API_VERSION is 17.

And is CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 defined in your cupti_runtime_cbid.h? You can compare to, say, https://gitlab.com/nvidia/headers/cuda-individual/cupti/-/blob/main/cupti_runtime_cbid.h?ref_type=heads - what's the last ID defined in your copy of cupti_runtime_cbid.h?

My understanding was that CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernelExC_v11060 should be available in versions >=17 - but maybe this isn't true?

I can confirm that compiling torch (branch: release/2.1) fails due to this issue when using CUDA 11.7, but works fine when compiling with CUDA 11.8.

commented

Sorry for the late reply! @davidberard98

I compared my cupti_runtime_cbid.h with the one in the CUPTI repo as you suggested, and they are different.
I think this is because my copy of CUPTI is not up to date.

The environment I'm working in is a container built on the nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04 image; something in the container is out of date, and I think I'm running an older version of CUPTI.

Thanks for the help!

this is my cupti_runtime_cbid.h

// *************************************************************************
//      Definitions of indices for API functions, unique across entire API
// *************************************************************************

// This file is generated.  Any changes you make will be lost during the next clean build.
// CUDA public interface, for type definitions and cu* function prototypes

typedef enum CUpti_runtime_api_trace_cbid_enum {
    CUPTI_RUNTIME_TRACE_CBID_INVALID                                                       = 0,
    CUPTI_RUNTIME_TRACE_CBID_cudaDriverGetVersion_v3020                                    = 1,
    CUPTI_RUNTIME_TRACE_CBID_cudaRuntimeGetVersion_v3020                                   = 2,
    CUPTI_RUNTIME_TRACE_CBID_cudaGetDeviceCount_v3020                                      = 3,
    CUPTI_RUNTIME_TRACE_CBID_cudaGetDeviceProperties_v3020                                 = 4,
    
    ...

    CUPTI_RUNTIME_TRACE_CBID_cudaGraphNodeSetEnabled_v11060                                = 426,
    CUPTI_RUNTIME_TRACE_CBID_cudaGraphNodeGetEnabled_v11060                                = 427,
    CUPTI_RUNTIME_TRACE_CBID_cudaArrayGetMemoryRequirements_v11060                         = 428,
    CUPTI_RUNTIME_TRACE_CBID_cudaMipmappedArrayGetMemoryRequirements_v11060                = 429,
    CUPTI_RUNTIME_TRACE_CBID_SIZE                                                          = 430,
    CUPTI_RUNTIME_TRACE_CBID_FORCE_INT                                                     = 0x7fffffff
} CUpti_runtime_api_trace_cbid;

The CUpti_runtime_api_trace_cbid_enum in my header only goes up to 430.

@davidberard98 FYI, enum values above 430 are only added as of CUDA 11.8 (diffs), so this dependency blocks newer PyTorch builds on CUDA 11.7.

@xflash96 do you know the relationship between CUDA version, CUPTI version, and the number of enum values? The reason we changed this to check the CUPTI version instead of the CUDA version is that we had reports of people with CUDA 11.8 and a very old CUPTI version that didn't have this enum; we thought the CUPTI version would more accurately correspond to the available enum values, but it seems this isn't the case.

@davidberard98 On CUPTI versioning vs CUDA (toolkit) versioning: v17 corresponds to CUDA 11.6 according to cupti_version.h. The enum should grow with the toolkit if the toolkit is installed as a set. I guess most CI relies on NVIDIA's docker images for CUDA versioning, so those might serve as a ground truth for what's included. At least the current method doesn't work for the NVIDIA CUDA 11.7.1 docker image.

#809 is changing it to v18

It seems the main branch of PyTorch still can't build without manually changing the version check at /home/user/pytorch/third_party/kineto/libkineto/src/CuptiActivity.cpp:247 to 18. I'm using CUDA 12.0.

The kineto pin hasn't been updated in PyTorch yet.