JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

rand! with Uniform distribution does not work with CUDA arrays

gkantsidis opened this issue

The following code, which initializes an existing CUDA matrix, results in memory corruption (and crashes).
(Note: run it under CUDA's compute sanitizer to catch the bug.)

# First, relaunch Julia under NVIDIA's compute-sanitizer:
using CUDA
CUDA.run_compute_sanitizer()

# Then, in the sanitized session:
using CUDA
using Distributions
using Random

m = CUDA.rand(1024, 1024)        # existing Float32 matrix on the GPU
s = Uniform(-1.0, 1.0)
rand!(CUDA.default_rng(), s, m)  # crashes / corrupts memory

It results in a series of messages that include:

========= Invalid __global__ read of size 1 bytes
=========     at _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE8quantileS4_I12CuRefPointerI7UniformI7Float64EE8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS12_ES4_IS6_S6_EEEES6_+0x550 in pointer.jl:119
=========     by thread (99,0,0) in block (0,0,0)
=========     Address 0x7f2a88250976 is out of bounds
=========     and is 139,732,598,458,743 bytes after the nearest allocation at 0x1480000000 of size 4,194,304 bytes
=========     Saved host backtrace up to driver entry point at kernel launch time

The problem seems to come from the following:

In materialize!(dest, bc) at broadcast.jl:910
 910  @inline function materialize!(dest, bc::Broadcasted{<:Any})
>911      return materialize!(combine_styles(dest, bc), dest, bc)
 912  end

About to run: <(Base.Broadcast.combine_styles)(Float32[0.64629114 0.78703153 0.52409637 0.6335989 0.8139833 0.772531...>

This method is called from _rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real}) in src/univariate/continuous/uniform.jl:156.
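
For reference, that implementation looks roughly as follows (a sketch reconstructed from the snippet quoted later in this thread, not verbatim source):

# Sketch of the method at src/univariate/continuous/uniform.jl:156:
using Distributions, Random

function _rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real})
    # Draw standard-uniform samples in place, then map each through the
    # quantile of d; the broadcast over `d` is what wraps it in Ref(d).
    A .= quantile.(d, rand!(rng, A))
end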

The following implementation of _rand!, which is functionally equivalent to the existing one, seems to work fine:

using Distributions, Random

function Distributions._rand!(rng::AbstractRNG, d::Uniform, A::AbstractArray{<:Real})
    rand!(rng, A)    # fill A with standard-uniform samples in [0, 1)
    A .*= d.b - d.a  # scale to the width of the target interval
    A .+= d.a        # shift to the lower bound
    return A
end
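
With the override above loaded, one way to sanity-check it is the following sketch (hypothetical verification, assuming a working GPU for the CUDA part; untested here):

using CUDA, Distributions, Random

# GPU: re-run the original reproducer (should no longer crash if fixed):
m = CUDA.rand(1024, 1024)
rand!(CUDA.default_rng(), Uniform(-1.0, 1.0), m)

# CPU: the override produces values in the requested interval:
v = rand!(Random.default_rng(), Uniform(-1.0, 1.0), zeros(10))
all(x -> -1.0 <= x <= 1.0, v)   # true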

Any ideas of what may be going wrong?

Environment

I have tried mostly with Julia 1.10, but have also observed the problem with 1.9 (1.9.4 and, I believe, 1.9.3).
I am using Windows and WSL.
I have the following packages:

  [052768ef] CUDA v5.1.1
  [31c24e10] Distributions v0.25.104

The CUDA environment is:

CUDA runtime 12.3, artifact installation
CUDA driver 12.2
NVIDIA driver 550.9.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+550.9

Julia packages:
- CUDA: 5.1.1
- CUDA_Driver_jll: 0.7.0+0
- CUDA_Runtime_jll: 0.10.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA GeForce RTX 2080 Ti (sm_75, 8.745 GiB / 11.000 GiB available)

I assume it is due to broadcasting over a d::Uniform in

A .= quantile.(d, rand!(rng, A))

Broadcasting over univariate distributions is supported by

# allow broadcasting over distribution objects
# to be decided: how to handle multivariate/matrixvariate distributions?
Broadcast.broadcastable(d::UnivariateDistribution) = Ref(d)

but the Ref might cause problems with CUDA.
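
For context, the Ref wrapper is what makes a distribution behave as a scalar under broadcasting; a minimal CPU illustration (my own example, not from the thread):

using Distributions

d = Uniform(-1.0, 1.0)
x = [0.1, 0.5, 0.9]

# All three forms treat `d` as a single scalar argument; the bare `d`
# works because broadcastable(d) returns Ref(d):
quantile.(d, x)        # [-0.8, 0.0, 0.8]
quantile.(Ref(d), x)   # same
quantile.((d,), x)     # same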

Can you check whether either of the following changes fixes the issue? (A sketch applying the second change in user code follows the list.)

  • Changing

      # allow broadcasting over distribution objects
      # to be decided: how to handle multivariate/matrixvariate distributions?
      Broadcast.broadcastable(d::UnivariateDistribution) = Ref(d)

    to

      Base.broadcastable(d::UnivariateDistribution) = (d,)

  • Changing

      A .= quantile.(d, rand!(rng, A))

    to

      A .= Base.Fix1(quantile, d).(rand!(rng, A))
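
A minimal sketch of the second suggestion applied in user code (assuming a working CUDA setup; untested here):

using CUDA, Distributions, Random

m = CUDA.rand(1024, 1024)
d = Uniform(-1.0, 1.0)
rand!(CUDA.default_rng(), m)     # standard-uniform samples in place
m .= Base.Fix1(quantile, d).(m)  # no broadcast over `d`, so no Ref(d)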

Thanks a lot, David.
Yes, both changes you suggested seem to work well.

This issue was fixed by #1818 (by avoiding broadcasting over distributions entirely in internal code), but of course users might still run into the same problem with code such as the following (untested, since I didn't have access to a GPU):

julia> using Distributions, CUDA

julia> x = CuArray([0.1, 0.5, 0.9]);

julia> x .= quantile.(Uniform(), x)

To me this seems like a bug in CUDA, since broadcastable(x::UnivariateDistribution) = Ref(x) is the officially documented way to ensure that x is treated as a scalar in broadcasting: https://docs.julialang.org/en/v1/manual/interfaces/#man-interfaces-broadcasting

Have you encountered this issue before, @maleadt? Could it be a problem that Uniform() = Uniform(0.0, 1.0) constructs a Uniform{Float64} and hence operates on Float64? But in that case I don't understand why x .= Base.Fix1(quantile, Uniform()).(x) would work...
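
One plausible explanation for that difference (my own speculation, not confirmed in this thread): Base.Fix1(quantile, d) is an immutable isbits value that a kernel can capture directly, whereas Ref(d) is a mutable heap cell that CUDA.jl has to convert for device use. A quick CPU check:

using Distributions

d = Uniform()                    # Uniform{Float64}(a=0.0, b=1.0)
isbits(d)                        # true: immutable struct of two Float64s
isbits(Base.Fix1(quantile, d))   # true: singleton function + isbits field
isbits(Ref(d))                   # false: Base.RefValue is mutable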

I guess this is FluxML/Zygote.jl#1473, which was fixed (by reverting the change) in JuliaGPU/CUDA.jl@a1a72c7. That's part of the latest CUDA.jl release.

@gkantsidis can you confirm that the problem with the MWE in #1815 (comment) was fixed in the latest CUDA release?

Yes, Distributions v0.25.106 works well with CUDA v5.1.2.

And Distributions 0.25.105?

Indeed, with CUDA v5.1.2, Distributions v0.25.105 works correctly.