NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Segfault in CachingDeviceAllocator when out of memory

orjgre opened this issue · comments

Hi,
When allocating more memory than is available through the CachingDeviceAllocator, it segfaults when using a default-constructed allocator. When using different bin parameters, it returns the correct error code. See the example below:

#include <iostream>
#include <cuda_runtime_api.h>
#include <sstream>
#include <cub/util_allocator.cuh>
#include <vector>

void exitOnFailure(cudaError_t errCode, const std::string &errString) {
    if (errCode != cudaSuccess) {
        std::stringstream fatalStream;
        fatalStream << errString << " " << errCode << " - " << cudaGetErrorString(errCode);
        throw std::runtime_error(fatalStream.str());
    }
}

int main() {
    cudaDeviceProp deviceProp;
    const int dev = 0;
    exitOnFailure(cudaSetDevice(dev), "Setting CUDA device resulted in an error!");
    exitOnFailure(cudaGetDeviceProperties(&deviceProp, dev), "Getting device properties resulted in an error!");

    size_t size = deviceProp.totalGlobalMem;

    std::cout << "Heap: " << size << std::endl;

//    cub::CachingDeviceAllocator allocator{4, 3, 15};  // This works fine
    cub::CachingDeviceAllocator allocator{}; // This segfaults

    size_t free = size;
    std::vector<uint8_t *> ptrs;
    auto allocate = [&allocator, &free, &ptrs](size_t bytes) {
        ptrs.push_back(nullptr);
        exitOnFailure(allocator.DeviceAllocate((void **) &ptrs.back(), bytes), "malloc");
        if (bytes > free) {
            std::cout << "FULL!\n";
        }
        free -= bytes;
        std::cout << "Allocated " << bytes << ", free: " << free << std::endl;
    };

    allocate(7000000000);
    allocate(70000000);
    allocate(500000000);
    allocate(50000000);
    allocate(40);
    allocate(40);
    allocate(218850000);
    for (int i = 1; i < 7; i++) {
        allocator.DeviceFree(ptrs.at(i));
    }
    allocate(40);
    for (int i = 0; i < 10000000; i++) {
        allocate(300000);
    }

    getchar();
    return 0;
}
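
For reference, the behavior I would expect (and what the {4, 3, 15} configuration shows) is a clean error code from DeviceAllocate rather than a crash. A minimal sketch of that expectation, assuming CUB is on the include path and device 0 is usable:

#include <iostream>
#include <cuda_runtime_api.h>
#include <cub/util_allocator.cuh>

int main() {
    cub::CachingDeviceAllocator allocator{};   // default bin configuration
    void *ptr = nullptr;
    // Deliberately oversized request; expected to fail with an error code, not segfault.
    cudaError_t err = allocator.DeviceAllocate(&ptr, size_t(1) << 60);
    if (err != cudaSuccess) {
        std::cout << "DeviceAllocate returned: " << cudaGetErrorString(err) << std::endl;
    } else {
        allocator.DeviceFree(ptr);
    }
    return 0;
}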

Hello @orjgre and thank you for reporting this!

The issue occurs when the allocator tries to free some cached blocks to make room for the new allocation. In that path, we increment an iterator after erasing it; erase() invalidates the iterator, so the increment is undefined behavior:

cached_blocks.erase(block_itr);
block_itr++;

It should've been:

block_itr = cached_blocks.erase(block_itr);
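
For context, erase() on an associative container invalidates the erased iterator, and since C++11 it returns an iterator to the following element, which is what the fix relies on. A small standalone sketch of the corrected erase-while-iterating pattern, using a std::multiset as a stand-in for the allocator's cached-block container:

#include <iostream>
#include <set>

int main() {
    std::multiset<int> cached_blocks = {1, 2, 2, 3, 4};

    // Remove every element equal to 2 while iterating.
    for (auto block_itr = cached_blocks.begin(); block_itr != cached_blocks.end(); ) {
        if (*block_itr == 2) {
            // erase() returns the iterator following the removed element,
            // so the loop stays valid.
            block_itr = cached_blocks.erase(block_itr);
        } else {
            ++block_itr;   // only advance when nothing was erased
        }
    }

    for (int v : cached_blocks) {
        std::cout << v << ' ';   // prints: 1 3 4
    }
    std::cout << std::endl;
    return 0;
}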

I'll create a PR soon.

@senior-zero Thank you for fixing this. Do you know which release this will be included in?

@orjgre the fix is merged, so it will be included in our next release, 2.2.0.

@orjgre we are migrating to a different repo. The fix is merged in NVIDIA/cccl#119. I'm closing the issue.