Inverse Complex-to-Real FFT allocates GPU memory
navdeeprana opened this issue · comments
Navdeep Rana commented
Describe the bug
Inverse Complex-to-Real FFT allocates GPU memory, whereas inverse Complex-to-Complex FFT does not.
To reproduce
A minimal working example (MWE) for this bug:
```julia
using AbstractFFTs, CUDA, LinearAlgebra
CUDA.allowscalar(false)

u = CuArray(rand(512, 512))
uk = rfft(u)
pfor = plan_rfft(u)
pinv = plan_irfft(uk, 512)
mul!(u, pinv, uk)
println("Complex-to-Real")
CUDA.@time mul!(u, pinv, uk);

u = CuArray(rand(ComplexF64, 512, 512))
uk = fft(u)
pfor = plan_fft(u)
pinv = plan_ifft(uk)
mul!(u, pinv, uk)
println("Complex-to-Complex")
CUDA.@time mul!(u, pinv, uk);
```
```
Complex-to-Real
  0.000091 seconds (20 CPU allocations: 800 bytes) (1 GPU allocation: 2.008 MiB, 13.43% memmgmt time)
Complex-to-Complex
  0.000168 seconds (132 CPU allocations: 11.141 KiB)
```
Manifest.toml
- CUDA v5.1.2
- GPUCompiler v0.25.0
- LLVM v6.4.2
Expected behavior
No GPU allocations, as with the Complex-to-Complex inverse plan.
Version info
Details on Julia:
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 48 × Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
Threads: 2 on 48 virtual cores
Environment:
JULIA_DEPOT_PATH = /data.lmp/nrana/.julia
JULIA_NUM_THREADS = 1
Details on CUDA:
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 510.108.3, originally for CUDA 11.6
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 11.0.0+510.108.3
Julia packages:
- CUDA: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
4 devices:
0: NVIDIA A100-PCIE-40GB (sm_80, 37.391 GiB / 40.000 GiB available)
1: NVIDIA A100-PCIE-40GB (sm_80, 39.406 GiB / 40.000 GiB available)
2: NVIDIA A100-PCIE-40GB (sm_80, 39.406 GiB / 40.000 GiB available)
3: NVIDIA A100-PCIE-40GB (sm_80, 38.363 GiB / 40.000 GiB available)
Tim Besard commented
Known and expected: this is a bug in CUFFT. NVIDIA has updated the documentation to indicate that complex-to-real transforms are expected to mutate their inputs, so we need to take a copy of the input, which is what causes the GPU allocation you see.
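A possible user-side workaround, given that explanation, is to avoid the complex-to-real plan entirely: run a complex-to-complex inverse into a preallocated complex buffer and take the real part. This is only a sketch under the assumption that the C2C path stays allocation-free (as the MWE's timings suggest); it trades the hidden copy for an explicit scratch buffer and extra arithmetic on the redundant half-spectrum.

```julia
using AbstractFFTs, CUDA, LinearAlgebra

u  = CuArray(rand(512, 512))
uk = fft(u)           # full C2C spectrum instead of rfft's half-spectrum
pinv = plan_ifft(uk)  # C2C inverse plan: does not copy its input
tmp = similar(uk)     # preallocated complex scratch buffer, reused every call
mul!(tmp, pinv, uk)   # inverse transform into tmp, no hidden GPU allocation
u .= real.(tmp)       # recover the real-valued field in place
```

The cost is one extra complex array held alive and a C2C transform that does roughly twice the work of the C2R one, so whether this wins depends on how allocation-sensitive the surrounding loop is.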