QuantumBFS / Yao.jl

Extensible, Efficient Quantum Algorithm Design for Humans.

Home Page: https://yaoquantum.org

Scale block not supporting chainrules/Zygote diff yet

vincentelfving opened this issue

Scale blocks appear to be unsupported by the ChainRules integration in YaoBlocks.

A minimal example:

using Zygote
using Yao
using YaoBlocks

N = 2
psi_0 = zero_state(N)
# parameterized ansatz circuit
U0 = chain(N, put(1=>Rx(0.0)), put(2=>Ry(0.0)))
# Hamiltonian: a Scale block wrapping an Add block
C = 2.1 * sum([chain(N, put(k=>Z)) for k=1:N])

function loss(theta)
    U = dispatch(U0, theta)
    psi0 = copy(psi_0)
    psi1 = apply(psi0, U)   # |psi1> = U(theta)|0>
    psi2 = apply(psi1, C)   # |psi2> = C|psi1>
    result = real(sum(conj(state(psi1)) .* state(psi2)))  # <psi1|C|psi1>
    return result
end

theta = [1.7, 2.5]
# Yao's built-in AD:
println(expect'(C, copy(psi_0) => dispatch(U0, theta))[2])
# Zygote:
grad = Zygote.gradient(theta -> loss(theta), theta)[1]
println(grad)
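In formulas, the quantity being differentiated is the expectation value

$$L(\theta) \;=\; \langle \psi_0 \,|\, U(\theta)^\dagger \, C \, U(\theta) \,|\, \psi_0 \rangle ,$$

which is what expect(C, psi_0 => U) computes.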

The loss function above thus effectively computes an expectation value equivalent to expect(C, psi_0 => U). Computing the gradient via expect' is no problem, but when we use Zygote instead we get the following error:

[-2.0824961019501838, -1.2567915026183087]
ERROR: LoadError: UndefKeywordError: keyword argument in not assigned
Stacktrace:
  [1] apply_back!(st::Tuple{ArrayReg{1, ComplexF64, Matrix{ComplexF64}}, ArrayReg{1, ComplexF64, Matrix{ComplexF64}}}, circuit::Add{2}, collector::Vector{Any})
    @ YaoBlocks.AD ~/.julia/packages/YaoBlocks/amVAv/src/autodiff/apply_back.jl:112
  [2] apply_back!(st::Tuple{ArrayReg{1, ComplexF64, Matrix{ComplexF64}}, ArrayReg{1, ComplexF64, Matrix{ComplexF64}}}, block::Scale{Float64, 2, Add{2}}, collector::Vector{Any})
    @ YaoBlocks.AD ~/.julia/packages/YaoBlocks/amVAv/src/autodiff/apply_back.jl:98
  [3] apply_back(st::Tuple{ArrayReg{1, ComplexF64, Matrix{ComplexF64}}, ArrayReg{1, ComplexF64, Matrix{ComplexF64}}}, block::Scale{Float64, 2, Add{2}}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YaoBlocks.AD ~/.julia/packages/YaoBlocks/amVAv/src/autodiff/apply_back.jl:151
  [4] apply_back(st::Tuple{ArrayReg{1, ComplexF64, Matrix{ComplexF64}}, ArrayReg{1, ComplexF64, Matrix{ComplexF64}}}, block::Scale{Float64, 2, Add{2}})
    @ YaoBlocks.AD ~/.julia/packages/YaoBlocks/amVAv/src/autodiff/apply_back.jl:150
  [5] (::YaoBlocks.AD.var"#47#48"{Scale{Float64, 2, Add{2}}, ArrayReg{1, ComplexF64, Matrix{ComplexF64}}})(outδ::ArrayReg{1, ComplexF64, Matrix{ComplexF64}})
    @ YaoBlocks.AD ~/.julia/packages/YaoBlocks/amVAv/src/autodiff/chainrules_patch.jl:80
  [6] ZBack
    @ ~/.julia/packages/Zygote/AlLTp/src/compiler/chainrules.jl:204 [inlined]
  [7] Pullback
    @ ~/zygote_scale_bug.jl:14 [inlined]
  [8] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
  [9] Pullback
    @ ~/zygote_scale_bug.jl:21 [inlined]
 [10] (::typeof(∂(#35)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface2.jl:0
 [11] (::Zygote.var"#55#56"{typeof(∂(#35))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:41
 [12] gradient(f::Function, args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/AlLTp/src/compiler/interface.jl:76
 [13] top-level scope
    @ ~/zygote_scale_bug.jl:21
 [14] include(fname::String)
    @ Base.MainInclude ./client.jl:444
 [15] top-level scope
    @ REPL[4]:1
 [16] top-level scope
    @ ~/.julia/packages/CUDA/YpW0k/src/initialization.jl:52
zygote_scale_bug.jl:21

However, if we instead put the scale factor in front of each Z rather than in front of the whole sum(...) block, i.e.

C = sum([chain(N, put(k=>2.1*Z)) for k=1:N])

then expect' and Zygote.gradient yield the same result, [-2.0824961019501838, -1.2567915026183087], as expected.
The two forms are mathematically equivalent, so support for the former would be useful/clean!
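(For completeness, the equivalence is just the scalar distributing over the sum:

$$2.1 \sum_{k=1}^{N} Z_k \;=\; \sum_{k=1}^{N} \big( 2.1\, Z_k \big) .)$$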

Thanks for the issue. This is not because the Scale block is unsupported; rather, the Add block is not supported when back-propagating through the apply function, since applying a sum of operators is not reversible. If you do not want gradients with respect to the Hamiltonian, please use Zygote.@ignore to skip it:

julia> N=2;

julia> psi_0 = zero_state(N);

julia> U0 = chain(N, put(1=>Rx(0.0)), put(2=>Ry(0.0)));

julia> C = 2.1*sum([chain(N, put(k=>Z)) for k=1:N]);

julia> function loss(theta)
           U = dispatch(U0, theta)
           psi0 = copy(psi_0)
           psi1 = apply(psi0, U)
           psi2 = Zygote.@ignore apply(psi1, C)
           result = real(sum(conj(state(psi1)) .* state(psi2)))
           return result
       end

julia> theta = [1.7,2.5];

julia> println(expect'(C, copy(psi_0) => dispatch(U0, theta))[2])
[-2.0824961019501838, -1.2567915026183087]

julia> grad = Zygote.gradient(theta->loss(theta), theta)[1];

julia> println(grad)
[-1.0412480509750919, -0.6283957513091544]

Thanks, that could work for most cases!
Why does C = sum([chain(N, put(k=>2.1*Z)) for k=1:N]) work correctly, do you think? That is also an Add block, and I apply it without Zygote.@ignore, right?

Also, why does your returned gradient differ by a factor of 2 in both components?

If you are asking why expect' returns the correct gradient: that is because you are using Yao's built-in AD engine, which ignores the Hamiltonian automatically. The factor-of-two difference is probably because the @ignore macro also discards half of psi's gradient at the same time.
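If so, a simple workaround is to double the Zygote result (a sketch under that assumption, not an official Yao API; it uses the @ignore version of loss defined above and relies on C being Hermitian, so the kept and discarded gradient terms are complex conjugates):

# For Hermitian C, d/dθ <ψ(θ)|C|ψ(θ)> = 2 Re<∂ψ|C|ψ>. Zygote.@ignore
# freezes psi2 = C|psi1>, keeping only one of the two conjugate terms,
# so doubling the Zygote gradient should recover the full gradient.
grad = 2 .* Zygote.gradient(loss, theta)[1]
println(grad)  # ≈ [-2.0824961019501838, -1.2567915026183087]

Indeed, doubling [-1.0412480509750919, -0.6283957513091544] reproduces the expect' result above, up to floating point.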