Error using KNET
smart-fr opened this issue · comments
In an attempt to resolve the FLUX inference issue I reported in #171, I tried to train a new NN using the Knet implementation of AlphaZero.NetLib, and got the following error during the first self-play session.
Is there anything else I should be doing to use Knet properly, other than setting the environment variable ALPHAZERO_DEFAULT_DL_FRAMEWORK to "KNET" and precompiling again?
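For completeness, this is a sketch of how I select the backend before launching Julia (the PowerShell equivalent is shown in a comment, since I run on Windows):

```shell
# Select the Knet backend before launching Julia / precompiling.
# On Windows PowerShell the equivalent is:
#   $env:ALPHAZERO_DEFAULT_DL_FRAMEWORK = "KNET"
export ALPHAZERO_DEFAULT_DL_FRAMEWORK="KNET"
echo "$ALPHAZERO_DEFAULT_DL_FRAMEWORK"
# Then precompile and train, e.g.:
#   julia --threads=auto --project -e 'using AlphaZero; Scripts.train("bonbon-rectangle")'
```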
PS C:\Projets\BonbonRectangle\IA\dev> julia --threads=auto --project -e 'using AlphaZero; Scripts.train("bonbon-rectangle"; save_intermediate=true)'
[ Info: Using the Knet implementation of AlphaZero.NetLib.
[ Info: BonbonRectangle v20230205_KNET_16x16_32_16_no_benchmark
[ Info: params_05.jl
Initializing a new AlphaZero environment
Initial report
Number of network parameters: 19,320,065
Number of regularized network parameters: 19,317,888
Memory footprint per MCTS node: 58456 bytes
Starting iteration 1
Starting self-play
MethodError: no method matching *(::Knet.KnetArrays.Bcasted{CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}, ::Knet.KnetArrays.Bcasted{Knet.KnetArrays.KnetArray{Float32, 4}})
Closest candidates are:
*(::Knet.KnetArrays.Bcasted, ::Knet.KnetArrays.Bcasted) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:142
*(::Any, ::Knet.KnetArrays.Bcasted) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:143
*(::Knet.KnetArrays.Bcasted, ::Any) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:144
...
Stacktrace:
[1] *(x::Knet.KnetArrays.Bcasted{CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}, y::Knet.KnetArrays.Bcasted{Knet.KnetArrays.KnetArray{Float32, 4}})
@ Knet.KnetArrays C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:142
[2] broadcasted(::Base.Broadcast.Style{Knet.KnetArrays.KnetArray}, ::Function, ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Knet.KnetArrays.KnetArray{Float32, 4})
@ Knet.KnetArrays C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\broadcast.jl:10
[3] broadcasted(::Function, ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Knet.KnetArrays.KnetArray{Float32, 4})
@ Base.Broadcast .\broadcast.jl:1304
[4] _batchnorm4_fused(g::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, b::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, x::Knet.KnetArrays.KnetArray{Float32, 4}; eps::Float64, training::Bool, cache::Knet.Ops20.BNCache, moments::Knet.Ops20.BNMoments, o::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Knet.Ops20 C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:184
[5] #batchnorm4#180
@ C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:149 [inlined]
[6] batchnorm(x::Knet.KnetArrays.KnetArray{Float32, 4}, moments::Knet.Ops20.BNMoments, params::AutoGrad.Param{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}; training::Bool, o::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ Knet.Ops20 C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:70
[7] (::AlphaZero.KnetLib.BatchNorm)(x::Knet.KnetArrays.KnetArray{Float32, 4})
@ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet\layers.jl:84
[8] (::AlphaZero.KnetLib.Chain)(x::Knet.KnetArrays.KnetArray{Float32, 4})
@ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet\layers.jl:19
[9] forward(nn::ResNet, state::Knet.KnetArrays.KnetArray{Float32, 4})
@ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet.jl:147
[10] forward_normalized(nn::ResNet, state::Knet.KnetArrays.KnetArray{Float32, 4}, actions_mask::Knet.KnetArrays.KnetMatrix{Float32})
@ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:264
[11] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}})
@ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:312
[12] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}}; batch_size::Int64, fill_batches::Bool)
@ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\simulations.jl:32
[13] #36
@ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\simulations.jl:54 [inlined]
[14] #4
@ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\batchifier.jl:71 [inlined]
[15] log_event(f::AlphaZero.Batchifier.var"#4#7"{Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}}, AlphaZero.var"#36#37"{Int64, Bool, ResNet}}; name::String, cat::String, pid::Int64, tid::Int64)
@ AlphaZero.ProfUtils C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\prof_utils.jl:40
[16] macro expansion
@ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\batchifier.jl:68 [inlined]
[17] macro expansion
@ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\util.jl:21 [inlined]
[18] (::AlphaZero.Batchifier.var"#2#5"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
@ AlphaZero.Batchifier C:\Users\smart\.julia\packages\ThreadPools\ANo2I\src\macros.jl:261
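If I read the trace correctly, the failure boils down to broadcasting between a CuArray (the batchnorm scale and bias in `_batchnorm4_fused`) and a KnetArray (the activations). A minimal sketch of the mismatch, assuming a CUDA-capable setup (shapes are hypothetical):

```julia
using Knet, CUDA

g = CUDA.ones(Float32, 1, 1, 4, 1)             # batchnorm scale stored as a CuArray
x = Knet.KnetArray(rand(Float32, 8, 8, 4, 2))  # activations held as a KnetArray

# Mixing the two array types in a broadcast reproduces the error:
# MethodError: no method matching *(::Bcasted{CuArray...}, ::Bcasted{KnetArray...})
g .* x
```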
Unfortunately, I haven't been using the Knet backend for a while, and I wouldn't be surprised if it has broken.
As I said in another issue (#166 (comment)), keeping support for both Knet and Flux is a maintenance nightmare and probably not a responsibility AlphaZero.jl should have taken upon itself. Knet support may be dropped at any time in the future unless a community-wide API compatibility solution is found, so I would not rely on it if I were you.
OK, thank you for the advice!