rand! with Uniform distribution does not work with CUDA arrays
gkantsidis opened this issue
The following code to initialize an existing CUDA matrix results in memory corruption (and crashes):
(Note: run under CUDA's compute sanitizer to catch the bug.)
using CUDA
CUDA.run_compute_sanitizer()
using CUDA
using Distributions
using Random
m = CUDA.rand(1024, 1024)
s = Uniform(-1.0, 1.0)
rand!(CUDA.default_rng(), s, m)
It results in a series of messages that include:
========= Invalid __global__ read of size 1 bytes
========= at _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE8quantileS4_I12CuRefPointerI7UniformI7Float64EE8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS12_ES4_IS6_S6_EEEES6_+0x550 in pointer.jl:119
========= by thread (99,0,0) in block (0,0,0)
========= Address 0x7f2a88250976 is out of bounds
========= and is 139,732,598,458,743 bytes after the nearest allocation at 0x1480000000 of size 4,194,304 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
The problem seems to come from the following:
In materialize!(dest, bc) at broadcast.jl:910
910 @inline function materialize!(dest, bc::Broadcasted{<:Any})
>911 return materialize!(combine_styles(dest, bc), dest, bc)
912 end
About to run: <(Base.Broadcast.combine_styles)(Float32[0.64629114 0.78703153 0.52409637 0.6335989 0.8139833 0.772531...>
(which is called from _rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real})
in src/univariate/continuous/uniform.jl:156).
The following implementation of _rand! (functionally equivalent to the existing one)
seems to work fine:
function Distributions._rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real})
    rand!(rng, A)     # fill A with uniform draws in [0, 1)
    A .*= d.b - d.a   # scale to the width of the support
    A .+= d.a         # shift to the left endpoint
end
Any ideas of what may be going wrong?
Environment
I have tried mostly in Julia 1.10, but have observed the problem in 1.9 (1.9.4 and, I believe, 1.9.3).
I am using Windows and WSL.
I have the following packages:
[052768ef] CUDA v5.1.1
[31c24e10] Distributions v0.25.104
The CUDA environment is:
CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 550.9.0
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+550.9
Julia packages:
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0
Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7
1 device:
0: NVIDIA GeForce RTX 2080 Ti (sm_75, 8.745 GiB / 11.000 GiB available)
I assume it is due to broadcasting over a d::Uniform in
Distributions.jl/src/common.jl (lines 173 to 175 in 9e72f1f): the Ref there
might cause problems with CUDA.
Can you check whether
- changing Distributions.jl/src/common.jl (lines 173 to 175 in 9e72f1f) to
Base.broadcastable(d::UnivariateDistribution) = (d,)
fixes the issue?
- changing the quantile broadcast in _rand! to
A .= Base.Fix1(quantile, d).(rand!(rng, A))
fixes the issue?
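On the CPU, the second suggestion can be sketched as follows (a minimal sketch; rng, d, and A here are stand-ins for the variables inside _rand!). Base.Fix1(quantile, d) captures the distribution inside the callable, so the broadcast only maps over the array and never has to treat d itself as a broadcast argument:

```julia
using Distributions, Random

rng = MersenneTwister(1)
d = Uniform(-1.0, 1.0)
A = zeros(3, 3)

# Broadcast a single-argument callable over A; `d` is baked into Fix1.
A .= Base.Fix1(quantile, d).(rand!(rng, A))
@assert all(-1.0 .<= A .<= 1.0)
```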
Thanks a lot, David.
Yes, both changes you suggested seem to work well.
The issue reported here was fixed by #1818 (by avoiding broadcasting over distributions completely in internal code), but of course users might still run into the same issue with code such as (untested since I didn't have access to a GPU):
julia> using Distributions, CUDA
julia> x = CuArray([0.1, 0.5, 0.9]);
julia> x .= quantile.(Uniform(), x)
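For reference, with the standard Uniform() the quantile function is the identity on [0, 1], so the same broadcast is easy to check on the CPU (a minimal sketch, no GPU required):

```julia
using Distributions

# quantile(Uniform(0, 1), p) = 0 + p * (1 - 0) = p,
# so this broadcast should leave x unchanged.
x = [0.1, 0.5, 0.9]
x .= quantile.(Uniform(), x)
@assert x == [0.1, 0.5, 0.9]
```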
To me this seems like a bug in CUDA since broadcastable(x::UnivariateDistribution) = Ref(x) is the officially documented way to ensure that x is treated as a scalar in broadcasting: https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting
Have you encountered this issue before @maleadt? Could it be a problem that Uniform() = Uniform(0.0, 1.0) creates a type that operates on Float64? But in that case I don't understand why x .= Base.Fix1(quantile, Uniform()).(x) would work...
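The two scalar-in-broadcast strategies discussed in this thread can be illustrated with a plain struct on the CPU (Scale and apply are hypothetical names, not part of any package):

```julia
# A value type that should act as a scalar under broadcasting.
struct Scale
    factor::Float64
end

apply(s::Scale, x) = s.factor * x

# Documented interface (what Distributions.jl uses): wrap in a Ref,
# which broadcasting treats as a zero-dimensional container.
Base.broadcastable(s::Scale) = Ref(s)
@assert apply.(Scale(2.0), [1.0, 2.0, 3.0]) == [2.0, 4.0, 6.0]

# The suggested alternative: a 1-tuple gives the same scalar semantics.
# Base.broadcastable(s::Scale) = (s,)
```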
I guess this is FluxML/Zygote.jl#1473, which was fixed (by reverting the change) in JuliaGPU/CUDA.jl@a1a72c7. That's part of the latest CUDA.jl release.
@gkantsidis can you confirm that the problem with the MWE in #1815 (comment) was fixed in the latest CUDA release?
Yes, Distributions v0.25.106 works well with CUDA v5.1.2.
And Distributions 0.25.105?
Indeed, with CUDA v5.1.2, Distributions v0.25.105 works correctly.