Zero length allocation failure
agerasev opened this issue · comments
Hi!
I'm facing an issue with zero length memory allocation (while trying to run candle
on GTX 970). Here is the minimal reproducer:
let dev = cudarc::driver::CudaDevice::new(0).unwrap();
dev.null::<f32>().unwrap();
On my machine it fails with DriverError(CUDA_ERROR_INVALID_VALUE, "invalid argument"). With this workaround it works fine.
I didn't find documentation for cuMemAlloc_v2, but for cuMemAlloc it says:

If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE

Maybe cuMemAlloc_v2 shouldn't be called at all if num_bytes is zero?
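One possible shape for such a guard (a sketch only, not cudarc's actual API: `raw_mem_alloc` and `alloc_guarded` are hypothetical names, and the driver call is mocked so the example runs without a GPU): check for zero bytes before ever reaching the driver, and hand back a null device pointer for empty buffers.

```rust
// Hypothetical sketch: skip the driver call entirely for zero-byte requests.
// `raw_mem_alloc` stands in for cuMemAlloc_v2; on some setups (e.g. the GTX 970
// in this issue) the real call returns CUDA_ERROR_INVALID_VALUE for 0 bytes.

#[derive(Debug, PartialEq)]
enum DriverError {
    InvalidValue,
}

// Mock of the driver call: rejects zero-byte requests, as the cuMemAlloc
// documentation permits.
fn raw_mem_alloc(num_bytes: usize) -> Result<u64, DriverError> {
    if num_bytes == 0 {
        return Err(DriverError::InvalidValue);
    }
    Ok(0xdead_beef) // stand-in device pointer
}

// Guarded wrapper: a zero-length request never touches the driver and
// yields a null device pointer, which an empty buffer never dereferences.
fn alloc_guarded(num_bytes: usize) -> Result<u64, DriverError> {
    if num_bytes == 0 {
        return Ok(0);
    }
    raw_mem_alloc(num_bytes)
}

fn main() {
    assert_eq!(raw_mem_alloc(0), Err(DriverError::InvalidValue));
    assert_eq!(alloc_guarded(0), Ok(0));
    assert_eq!(alloc_guarded(16), Ok(0xdead_beef));
    println!("ok");
}
```

Whether a null device pointer is safe to thread through the rest of the driver API is exactly the open question discussed below; this only illustrates where the check would sit.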
My system:
$ uname -a
Linux 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ nvidia-smi -q
==============NVSMI LOG==============
Timestamp : Mon Dec 4 13:40:18 2023
Driver Version : 525.125.06
CUDA Version : 12.0
Attached GPUs : 1
GPU 00000000:03:00.0
Product Name : NVIDIA GeForce GTX 970
Product Brand : GeForce
Product Architecture : Maxwell
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
...
I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic). I think this should probably be raised as an issue on candle's repo. Do you know where in candle it's coming from?
Do you know where in candle it's coming from?
It can occur in many places in candle_core::cuda_backend where alloc or htod_copy is called. There are no checks for zero length there; the calls are assumed to succeed.
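A caller-side guard is the other option. This is a hypothetical sketch of what code like candle's cuda backend could do before a host-to-device copy; `htod_copy_raw`, `htod_copy_guarded`, and `DeviceBuffer` are made-up names, and the driver path is mocked (rejecting zero-length input, as observed on the GTX 970) so the example runs anywhere.

```rust
// Hypothetical caller-side guard for htod_copy-style paths.

#[derive(Debug, PartialEq)]
struct DriverError(&'static str);

#[derive(Debug, PartialEq)]
struct DeviceBuffer {
    ptr: u64,
    len: usize,
}

// Mock driver path: allocate + copy, failing on zero-length input the way
// the real driver call does on the affected device.
fn htod_copy_raw(host: &[f32]) -> Result<DeviceBuffer, DriverError> {
    if host.is_empty() {
        return Err(DriverError("CUDA_ERROR_INVALID_VALUE"));
    }
    Ok(DeviceBuffer { ptr: 0x1000, len: host.len() })
}

// Guarded version: an empty host slice becomes an empty device buffer
// without touching the driver at all.
fn htod_copy_guarded(host: &[f32]) -> Result<DeviceBuffer, DriverError> {
    if host.is_empty() {
        return Ok(DeviceBuffer { ptr: 0, len: 0 });
    }
    htod_copy_raw(host)
}

fn main() {
    assert!(htod_copy_raw(&[]).is_err());
    assert_eq!(htod_copy_guarded(&[]), Ok(DeviceBuffer { ptr: 0, len: 0 }));
    assert_eq!(htod_copy_guarded(&[1.0, 2.0]).unwrap().len, 2);
    println!("ok");
}
```

The check could live either in each caller or once inside the library; either way the zero-length case never reaches the driver.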
I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic).
The problem is that this behavior is inconsistent: on most devices a zero-length allocation seems to succeed (and candle relies on this), but on the GTX 970 it fails.
I'm not really sure what we can do in this case - it seems like a driver-level issue. We don't have any device-specific code in cudarc, so I'm not sure what the outcome should be. I'm hesitant to use a null pointer (i.e. not actually call cuMemAlloc) because I don't know what the downstream effects of that would be, or how the cuda driver would interact with such a pointer.
Can you print out the CudaDevice in your example? I want to see if is_async is false:
let dev = cudarc::driver::CudaDevice::new(0).unwrap();
println!("{:?}", dev);
Can you print out the CudaDevice in your example?
CudaDevice {
cu_device: 0,
cu_primary_ctx: 0x000055759b945ec0,
stream: 0x0000000000000000,
event: 0x000055759bc8d4f0,
modules: RwLock {
data: {},
poisoned: false,
..
},
ordinal: 0,
is_async: false,
}