jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.

Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/

Inference-time "PTX compile error: Entry function uses too much parameter space"

smart-fr opened this issue

I successfully trained an NN on my game for 8x8 and 12x12 boards. I am aiming at 16x16, which is the original board dimension.
Inference on an 8x8 board works perfectly: the NN seems to win against any human player, which is fascinating!
Thank you again, Jonathan, for this generic implementation of AlphaZero.

However, during inference on a 12x12 board, I run into what looks like a CUDA problem. It is probably not a bug; it seems I need to allow an "entry function" to use more parameter space. Note that neither the GPU memory nor the RAM is fully used when this occurs.

Has anyone encountered this limitation and tried to resolve it?

ERROR: Failed to compile PTX code (ptxas exited with code 4294967295)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_86 --output-file C:\Users\smart\AppData\Local\Temp\jl_pK7SBkXwjg.cubin C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx
ptxas C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx, line 2027; error   : Entry function '_Z27julia_broadcast_kernel_818015CuKernelContext13CuDeviceArrayI7Float32Li2ELi1EE11BroadcastedI12CuArrayStyleILi2EE5TupleI5OneToI5Int64ES5_IS6_EE2__S4_I8ExtrudedIS0_IS1_Li2ELi1EES4_I4BoolS9_ES4_IS6_S6_EES8_I13ReshapedArrayIS1_Li2E6SArrayIS4_ILi1152EES1_Li1ELi1152EES4_ES4_IS9_S9_ES4_IS6_S6_EEEES6_' uses too much parameter space (0x12b0 bytes, 0x1100 max).
ptxas fatal   : Ptx assembly aborted due to errors
If you think this is a bug, please file an issue and attach C:\Users\smart\AppData\Local\Temp\jl_rEbwjaXiXu.ptx
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] cufunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:428
  [3] #224
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:347 [inlined]       
  [4] JuliaContext(f::CUDA.var"#224#225"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#17", Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}}})
    @ GPUCompiler C:\Users\smart\.julia\packages\GPUCompiler\qdoh1\src\driver.jl:76
  [5] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:346
  [6] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler C:\Users\smart\.julia\packages\GPUCompiler\qdoh1\src\cache.jl:90
  [7] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(*), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Extruded{Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:299
  [8] cufunction
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:292 [inlined]       
  [9] macro expansion
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\compiler\execution.jl:102 [inlined]       
 [10] #launch_heuristic#248
    @ C:\Users\smart\.julia\packages\CUDA\Ey3w2\src\gpuarrays.jl:17 [inlined]
 [11] _copyto!
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:63 [inlined]       
 [12] copyto!
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:46 [inlined]       
 [13] copy
    @ C:\Users\smart\.julia\packages\GPUArrays\fqD8z\src\host\broadcast.jl:37 [inlined]       
 [14] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}}}})
    @ Base.Broadcast .\broadcast.jl:860
 [15] forward_normalized(nn::ResNet, state::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, actions_mask::Base.ReshapedArray{Float32, 2, StaticArraysCore.SVector{1152, Float32}, Tuple{}})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:265
 [16] evaluate(nn::ResNet, state::NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:292
 [17] AbstractNetwork
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:297 [inlined]    
 [18] state_info(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, state::NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}})
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:170
 [19] run_simulation!(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, game::AlphaZero.Examples.BonbonRectangle.GameEnv; η::Vector{Float64}, root::Bool)
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:206
 [20] explore!(env::AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}, game::AlphaZero.Examples.BonbonRectangle.GameEnv, nsims::Int64)
    @ AlphaZero.MCTS C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\mcts.jl:244
 [21] think(p::MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, game::AlphaZero.Examples.BonbonRectangle.GameEnv)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:202
 [22] select_move
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:49 [inlined]
 [23] select_move(p::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, Human}, game::AlphaZero.Examples.BonbonRectangle.GameEnv, turn::Int64)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:265
 [24] interactive!(game::AlphaZero.Examples.BonbonRectangle.GameEnv, player::TwoPlayers{MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, Human})
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:378
 [25] interactive!
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:400 [inlined]
 [26] interactive!(game::AlphaZero.Examples.BonbonRectangle.GameSpec, white::MctsPlayer{AlphaZero.MCTS.Env{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, UInt8, 144}, StaticArraysCore.SMatrix{12, 12, Tuple{Int64, Int64}, 144}, UInt8}}, ResNet}}, black::Human)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\play.jl:402
 [27] play(e::Experiment; args::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ AlphaZero.Scripts C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:59
 [28] play
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:39 [inlined]      
 [29] #play#19
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:71 [inlined]      
 [30] play(s::String)
    @ AlphaZero.Scripts C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\scripts\scripts.jl:71
 [31] top-level scope
    @ none:1 
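
Looking at the failing kernel signature in the stacktrace, the broadcast receives the actions mask as a Base.ReshapedArray wrapping an SVector{1152, Float32}. That wrapper is an isbits type, so CUDA.jl passes the whole static array by value in the kernel's parameter space: 1152 × 4 = 4608 bytes for the data alone, which is consistent with the 0x12b0 = 4784 bytes that ptxas requests against its 0x1100 = 4352-byte limit. If that reading is correct, materializing the mask as a device array before the broadcast should sidestep the limit. A minimal, untested sketch (the helper name is hypothetical; only `actions_mask` comes from the stacktrace):

using CUDA

# Hypothetical helper: if the policy `p` is a GPU array but the mask is a
# plain isbits host array, copy the mask into device memory so CUDA.jl does
# not pass it by value in the kernel's limited parameter space.
function to_device(actions_mask, p)
  (p isa CuArray && !(actions_mask isa CuArray)) ? CuArray(actions_mask) : actions_mask
end

# ...then, at the broadcast site in forward_normalized:
# p = p .* to_device(actions_mask, p)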

In the meantime, I could work around this CUDA inference issue by forcing CPU use during play sessions, via the arena parameters:

arena = ArenaParams(
  sim=SimParams(
    use_gpu=false,  # was `true` during training
    ...             # other simulation parameters unchanged
  ),
  ...               # other arena parameters unchanged
)

This is more of a workaround than a satisfactory solution, which is why I am not closing the issue yet.

Maybe my hardware is too limited? I am on an RTX 3080 Laptop GPU with 16GB of memory.
However, I got the same issue using a cloud V100.

(By the way, in order to reuse a session with playing parameters different from the ones used for training, I used the trick suggested in "To continue a training" #118.)

I am really glad to hear that you are starting to see good results on your game!
Your hardware is fine: the RTX 3080 is a good GPU, and more than I had when I originally developed AlphaZero.jl.

I never encountered the error you report, although it is probably not specific to AlphaZero.jl.
I would encourage you to reduce it to a minimal nonworking example and submit an issue to CUDA.jl.
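
For what it's worth, the stacktrace suggests that something along these lines might already reproduce the failure outside AlphaZero.jl (an untested sketch; the sizes mirror your error message):

using CUDA, StaticArrays

# An isbits host array of 1152 Float32s (4608 bytes) broadcast against a
# CuArray is passed by value in the kernel's parameter space, which should
# exceed the 0x1100-byte limit reported by ptxas.
a  = CUDA.rand(Float32, 1152, 4)                  # device matrix
sv = SVector{1152, Float32}(rand(Float32, 1152))  # isbits static vector
m  = reshape(sv, 1152, 1)                         # Base.ReshapedArray, still isbits

a .* m  # expected to fail with "uses too much parameter space"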