NVIDIA / cub

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Segfault in CachingDeviceAllocator when out of memory

orjgre opened this issue · comments

Hi,
When allocating more memory than is available through the CachingDeviceAllocator, it segfaults when using a default-constructed allocator. When using different bin parameters, it returns the correct error code. See the example below:

#include <iostream>
#include <cuda_runtime_api.h>
#include <sstream>
#include <cub/util_allocator.cuh>
#include <vector>

void exitOnFailure(cudaError_t errCode, const std::string &errString) {
    if (errCode != cudaSuccess) {
        std::stringstream fatalStream;
        fatalStream << errString << " " << errCode << " - " << cudaGetErrorString(errCode);
        throw std::runtime_error(fatalStream.str());
    }
}

int main() {
    cudaDeviceProp deviceProp;
    const int dev = 0;
    exitOnFailure(cudaSetDevice(dev), "Setting CUDA device resulted in an error!");
    exitOnFailure(cudaGetDeviceProperties(&deviceProp, dev), "Getting device properties resulted in an error!");

    size_t size = deviceProp.totalGlobalMem;

    std::cout << "Heap: " << size << std::endl;

//    cub::CachingDeviceAllocator allocator{4, 3, 15};  // This works fine
    cub::CachingDeviceAllocator allocator{}; // This segfaults

    size_t free = size;
    std::vector<uint8_t *> ptrs;
    auto allocate = [&allocator, &free, &ptrs](size_t bytes) {
        ptrs.push_back(nullptr);
        exitOnFailure(allocator.DeviceAllocate((void **) &ptrs.back(), bytes), "malloc");
        if (bytes > free) {
            std::cout << "FULL!\n";
        }
        free -= bytes;
        std::cout << "Allocated " << bytes << ", free: " << free << std::endl;
    };

    allocate(7000000000);
    allocate(70000000);
    allocate(500000000);
    allocate(50000000);
    allocate(40);
    allocate(40);
    allocate(218850000);
    for (int i = 1; i < 7; i++) {
        allocator.DeviceFree(ptrs.at(i));
    }
    allocate(40);
    for (int i = 0; i < 10000000; i++) {
        allocate(300000);
    }

    getchar();
    return 0;
}
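
For reference, the behavior I would expect (and what the {4, 3, 15} configuration shows) is a clean error code from DeviceAllocate rather than a crash. A minimal sketch of that expectation, assuming CUB is on the include path and device 0 is usable:

#include <iostream>
#include <cuda_runtime_api.h>
#include <cub/util_allocator.cuh>

int main() {
    cub::CachingDeviceAllocator allocator{};   // default bin configuration
    void *ptr = nullptr;
    // Deliberately oversized request; expected to fail with an error code, not segfault.
    cudaError_t err = allocator.DeviceAllocate(&ptr, size_t(1) << 60);
    if (err != cudaSuccess) {
        std::cout << "DeviceAllocate returned: " << cudaGetErrorString(err) << std::endl;
    } else {
        allocator.DeviceFree(ptr);
    }
    return 0;
}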

Hello @orjgre and thank you for reporting this!

The issue occurs when the allocator tries to free some cached blocks to make room for the new allocation. In that path, we increment an iterator after erasing it; erase() invalidates the iterator, so the increment is undefined behavior:

cached_blocks.erase(block_itr);
block_itr++;

It should've been:

block_itr = cached_blocks.erase(block_itr);
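
For context, erase() on an associative container invalidates the erased iterator, and since C++11 it returns an iterator to the following element, which is what the fix relies on. A small standalone sketch of the corrected erase-while-iterating pattern, using a std::multiset as a stand-in for the allocator's cached-block container:

#include <iostream>
#include <set>

int main() {
    std::multiset<int> cached_blocks = {1, 2, 2, 3, 4};

    // Remove every element equal to 2 while iterating.
    for (auto block_itr = cached_blocks.begin(); block_itr != cached_blocks.end(); ) {
        if (*block_itr == 2) {
            // erase() returns the iterator following the removed element,
            // so the loop stays valid.
            block_itr = cached_blocks.erase(block_itr);
        } else {
            ++block_itr;   // only advance when nothing was erased
        }
    }

    for (int v : cached_blocks) {
        std::cout << v << ' ';   // prints: 1 3 4
    }
    std::cout << std::endl;
    return 0;
}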

I'll create a PR soon.

@senior-zero Thank you for fixing this. Do you know which release this will be included in?

@orjgre the fix is merged, so it will be included in our next release, 2.2.0.

@orjgre we are migrating to a different repo. The fix is merged in NVIDIA/cccl#119. I'm closing the issue.