coreylowman / cudarc

Safe rust wrapper around CUDA toolkit

Zero length allocation failure

agerasev opened this issue · comments

Hi!

I'm facing an issue with zero length memory allocation (while trying to run candle on GTX 970). Here is the minimal reproducer:

let dev = cudarc::driver::CudaDevice::new(0).unwrap();
dev.null::<f32>().unwrap();

On my machine it fails with DriverError(CUDA_ERROR_INVALID_VALUE, "invalid argument"). With this workaround it works fine.

I couldn't find documentation for cuMemAlloc_v2, but the documentation for cuMemAlloc says:

If bytesize is 0, cuMemAlloc() returns CUDA_ERROR_INVALID_VALUE

Maybe cuMemAlloc_v2 shouldn't be called at all if num_bytes is zero?
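To make the suggestion concrete, here is a minimal sketch of such a guard. It does not use cudarc's real types or internals: `AllocError`, `DevicePtr`, `alloc_guarded` and `strict_driver_alloc` are all hypothetical names, and the "driver" is a mock that rejects zero-byte requests the way the cuMemAlloc documentation describes. Using 0 as a null device pointer for empty allocations is an assumption of this sketch, not confirmed behavior of the library.

```rust
/// Hypothetical stand-in for a driver error; not cudarc's real error type.
#[derive(Debug, PartialEq)]
pub enum AllocError {
    InvalidValue,
}

/// Hypothetical device pointer. The value 0 plays the role of a null pointer.
pub type DevicePtr = u64;

/// Allocates `num_bytes` through `raw_alloc`, but never forwards a
/// zero-byte request to it. A zero-byte request yields a null pointer.
pub fn alloc_guarded<F>(num_bytes: usize, raw_alloc: F) -> Result<DevicePtr, AllocError>
where
    F: FnOnce(usize) -> Result<DevicePtr, AllocError>,
{
    if num_bytes == 0 {
        // Skip the driver call entirely for empty allocations.
        return Ok(0);
    }
    raw_alloc(num_bytes)
}

/// Mock of a strict driver: rejects zero-byte requests and returns a
/// fixed fake address for everything else.
pub fn strict_driver_alloc(num_bytes: usize) -> Result<DevicePtr, AllocError> {
    if num_bytes == 0 {
        Err(AllocError::InvalidValue)
    } else {
        Ok(0x1000)
    }
}
```

With the mock above, `alloc_guarded(0, strict_driver_alloc)` succeeds while `strict_driver_alloc(0)` fails. A real implementation would presumably also need the matching free and copy paths to skip the null pointer.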

My system:

$ uname -a
Linux  6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Mon Dec  4 13:40:18 2023
Driver Version                            : 525.125.06
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:03:00.0
    Product Name                          : NVIDIA GeForce GTX 970
    Product Brand                         : GeForce
    Product Architecture                  : Maxwell
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : Enabled
...

I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic). I think this should probably be raised as an issue on candle's repo. Do you know where in candle it's coming from?

Do you know where in candle it's coming from?

It can occur in many places in candle_core::cuda_backend where alloc or htod_copy is called. There are no checks for zero length there; the calls are assumed to succeed.

I think this function is behaving as it should - it's returning a result (and the unwrap turns it into a panic).

The problem is that this behavior is inconsistent: it seems that on most devices a zero-length allocation succeeds (and candle relies on this), but on the GTX 970 it fails.
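Since the behavior differs between devices, one option is for the calling code to avoid zero-length device requests altogether. The following is only a sketch of that idea: `DeviceBuf`, `upload` and `strict_copy` are hypothetical names, not part of candle or cudarc, and the host-to-device copy is mocked by a function that fails on empty input.

```rust
/// Hypothetical buffer type for caller code; not part of candle or cudarc.
#[derive(Debug, PartialEq)]
pub enum DeviceBuf {
    /// Zero elements: no device memory was requested.
    Empty,
    /// `len` elements stored at the (mock) device address `ptr`.
    Allocated { ptr: u64, len: usize },
}

impl DeviceBuf {
    /// Number of elements held by the buffer.
    pub fn len(&self) -> usize {
        match self {
            DeviceBuf::Empty => 0,
            DeviceBuf::Allocated { len, .. } => *len,
        }
    }
}

/// Copies `host` to the device through `copy_to_device`, but handles the
/// empty case locally so the device is never asked for zero bytes.
pub fn upload<F>(host: &[f32], copy_to_device: F) -> Result<DeviceBuf, String>
where
    F: FnOnce(&[f32]) -> Result<u64, String>,
{
    if host.is_empty() {
        return Ok(DeviceBuf::Empty);
    }
    let ptr = copy_to_device(host)?;
    Ok(DeviceBuf::Allocated { ptr, len: host.len() })
}

/// Mock host-to-device copy that behaves like a strict device:
/// it fails on empty input and returns a fake address otherwise.
pub fn strict_copy(host: &[f32]) -> Result<u64, String> {
    if host.is_empty() {
        Err("invalid argument".to_string())
    } else {
        Ok(0x2000)
    }
}
```

The trade-off is that every consumer of the buffer has to handle the `Empty` variant, which is why a check in one central place may be preferable to scattering it across call sites.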

I'm not really sure what we can do in this case. This seems like a driver-level issue, and we don't have any device-specific code in cudarc, so I'm not sure what the outcome should be. I'm hesitant to use a null pointer (i.e. not actually call cuMemAlloc) because I don't know what the downstream effects would be or how the CUDA driver would handle such a pointer.

Can you print out the CudaDevice in your example? I want to see whether is_async is false.

let dev = cudarc::driver::CudaDevice::new(0).unwrap();
println!("{:?}", dev);

Can you print out the CudaDevice in your example?

CudaDevice {
    cu_device: 0,
    cu_primary_ctx: 0x000055759b945ec0,
    stream: 0x0000000000000000,
    event: 0x000055759bc8d4f0,
    modules: RwLock {
        data: {},
        poisoned: false,
        ..
    },
    ordinal: 0,
    is_async: false,
}