JuliaStats / LogExpFunctions.jl

Julia package for various special functions based on `log` and `exp`.

`AbstractIrrational` does not play nice with CUDA

Red-Portal opened this issue

Hi, it seems that many of the functions are not compatible with CUDA.jl out of the box due to dynamic precision (?). Here's an MWE:

LogExpFunctions.log1mexp.(CuVector([-1f0, -2f0, -3f0]))
ERROR: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#broadcast_kernel#26")(::CUDA.CuKernelContext, ::CuDeviceVector{Float32, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported dynamic function invocation (call to var"#setprecision#25"(kws::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::typeof(setprecision), f::Function, ::Type{T}, prec::Integer) where T @ Base.MPFR mpfr.jl:969)
Stacktrace:
 [1] setprecision
   @ ./mpfr.jl:969
 [2] Type
   @ ./irrationals.jl:69
 [3] <
   @ ./irrationals.jl:96
 [4] log1mexp
   @ ~/.julia/packages/LogExpFunctions/jq98q/src/basicfuns.jl:234
 [5] _broadcast_getindex_evalf
   @ ./broadcast.jl:683
 [6] _broadcast_getindex
   @ ./broadcast.jl:656
 [7] getindex
   @ ./broadcast.jl:610
 [8] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:59
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erroneous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:415 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:414 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:129
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
 [10] compile
    @ ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
 [11] #1037
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
 [12] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
 [13] compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
 [14] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
 [15] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
 [16] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
 [17] macro expansion
    @ ./lock.jl:267 [inlined]
 [18] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(log1mexp), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
 [19] cufunction
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
 [20] macro expansion
    @ ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
 [21] #launch_heuristic#1080
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
 [22] launch_heuristic
    @ ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
 [23] _copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
 [24] copyto!
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
 [25] copy
    @ ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
 [26] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(log1mexp), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Base.Broadcast ./broadcast.jl:873
 [27] top-level scope
    @ REPL[22]:1
 [28] top-level scope
    @ ~/.julia/packages/CUDA/tVtYo/src/initialization.j

Simply changing the definition of log1mexp to the following fixes the issue:

log1mexp_cuda(x::T) where {T <: Real} = x < log(T(1)/2) ? log1p(-exp(x)) : log(-expm1(x))
julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
3-element CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}:
 -0.4586752
 -0.14541346
 -0.051069178

Do we really need IrrationalConstants here?

What exactly is the problem here? IrrationalConstants works in exactly the same way as the irrational constants in Base, so I wonder if the same problem can be provoked with e.g. pi instead of IrrationalConstants.loghalf. One advantage of these irrational constants is that they are precomputed for e.g. Float32 and Float64 but allow precise calculations also with other types and functions.
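
To illustrate that distinction, here is a small sketch (the printed values are only indicative):

using IrrationalConstants

# Conversion to Float32/Float64 returns a hard-coded, precomputed constant:
x32 = Float32(IrrationalConstants.loghalf)   # ≈ -0.6931472f0, no BigFloat involved
x64 = Float64(IrrationalConstants.loghalf)   # ≈ -0.6931471805599453

# Other targets are computed on demand at full precision via BigFloat/MPFR:
xbig = big(IrrationalConstants.loghalf)      # arbitrary-precision value of log(1/2)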

I'm very surprised that CUDA cares about the BigFloat methods if clearly only the Float32 constant is needed. Generally, I'm hesitant to remove IrrationalConstants since it is useful and used in Base and throughout the ecosystem, so it seems this problem should be fixed in a different way.

> it seems this problem should be fixed in a different way.

Let me try to summon the CUDA experts.

I spoke with Tim Besard; it seems there is no easy way to do this as long as BigFloat is involved. It's because some of the BigFloat conversions call the libmpfr CPU library, which CUDA can't support.

BigFloat should not be involved here - for irrationals in Base and IrrationalConstants, Float32(::MyIrrational) is explicitly defined and set to a constant precomputed value (the same for Float64).
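
Roughly speaking (a paraphrase, not the exact generated code), the @irrational machinery emits per-type constructors along these lines for each constant, with Loghalf standing in for whatever concrete type is actually defined:

struct Loghalf <: AbstractIrrational end

Base.Float64(::Loghalf) = -0.6931471805599453   # precomputed at definition time
Base.Float32(::Loghalf) = -0.6931472f0          # precomputed at definition time
Base.BigFloat(::Loghalf) = log(big(1) / 2)      # only this path needs MPFR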

I figured out what's going on: the fallback definitions of the comparison operators (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L96 and surrounding lines) are based not on Float32(x) but on Float32(x, RoundDown), which, in contrast to Float32(x), is not defined as a constant but is computed dynamically via BigFloat (https://github.com/JuliaLang/julia/blob/6e2e6d00258b930f5909d576f2b3510ffa49c4bf/base/irrationals.jl#L68-L72).
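
For reference, the upstream fallbacks in base/irrationals.jl look roughly like this (paraphrased from the linked lines, not quoted verbatim; shown only to illustrate the code path, not something to redefine):

# Comparisons of a FloatXX with an AbstractIrrational use rounded conversions:
<(x::Float32, y::AbstractIrrational) = x <= Float32(y, RoundDown)
<(x::AbstractIrrational, y::Float32) = Float32(x, RoundUp) <= y

# ...and the generic rounded conversion is computed dynamically via BigFloat,
# which is the setprecision/MPFR call that CUDA cannot compile:
function (::Type{T})(x::AbstractIrrational, r::RoundingMode) where T<:Union{Float32,Float64}
    setprecision(BigFloat, 256) do
        T(BigFloat(x), r)
    end
end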

I wonder if we should extend the @irrational macros in Base and IrrationalConstants to define Float64(x, RoundDown/RoundUp) and Float32(x, RoundDown/RoundUp) explicitly as constants, to avoid these dynamic dispatches at least for the common case where the irrational is defined with the macro.
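
Concretely, the idea would be something along these lines (a hypothetical sketch, not an actual patch; written by hand for a single constant rather than generated by the macro):

using IrrationalConstants

# Hypothetical: precompute the rounded conversions once at definition time, so that
# comparisons never reach the BigFloat-based fallback at runtime.
const LOGHALF_F32_DOWN = Float32(big(IrrationalConstants.loghalf), RoundDown)
const LOGHALF_F32_UP   = Float32(big(IrrationalConstants.loghalf), RoundUp)

Base.Float32(::typeof(IrrationalConstants.loghalf), ::RoundingMode{:Down}) = LOGHALF_F32_DOWN
Base.Float32(::typeof(IrrationalConstants.loghalf), ::RoundingMode{:Up})   = LOGHALF_F32_UP
# (analogous definitions for Float64; the macro would generate these for every constant)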

As suspected, the error is not specific to IrrationalConstants. For instance,

julia> using CUDA, IrrationalConstants

julia> log1mexp_cuda(x::Real) = twoπ*exp(x) < π ? log1p(-exp(x)) : log(-expm1(x))
log1mexp_cuda (generic function with 1 method)

julia> log1mexp_cuda.(CuVector([-1f0, -2f0, -3f0]))
...

errors as well. I updated the title of the issue to reflect this.

Oh I see! I was scratching my head looking at Float32(x, RoundDown) and wondering what it should have been. Shouldn't this be handled upstream rather than overriding the behavior downstream? I think this issue might pop up in other places that depend on AbstractIrrational too.

Sure, it will be present in basically all code paths that involve comparisons of FloatXX with AbstractIrrationals.

The general issue still exists but should maybe be raised upstream. The case in the OP was fixed by #75.

Okay, then I'll close this for now. I'll raise this upstream some time.

One addition: I ran into the same problem starting with Julia 1.9 (1.8 works fine) and opened an issue against CUDA.jl that was later moved to the GPUCompiler project: JuliaGPU/GPUCompiler.jl#384
It seems that the underlying issue with irrationals is not easy to resolve, so thanks for the effort here!