compute hessian matrix of circuit

Question

yuyuexi opened this issue 4 years ago

When I used Zygote.hessian to compute the Hessian matrix of a circuit, i.e. Zygote.hessian(f, params), where params are the parameters of a variational circuit and f(params) is a real number, I got a MethodError.

It seems that when ForwardDiff uses Dual numbers to compute the Jacobian matrix, the Dual type is not supported by RotationGate in this circuit. So how can I compute the Hessian of my real-valued function f?

(For example: f(params) = operator_fidelity(target_unitary, dispatch!(circuit, params)))

Jinguo Liu · Answer 1 · Wed Dec 02 2020 06:14:21 GMT+0800 (China Standard Time)

Thanks for the issue.
Yao is not compatible with Zygote. You may want to combine Yao's built-in AD engine with Dual numbers to obtain Hessians.

julia> using Yao

julia> using ForwardDiff: Dual

julia> reg = ArrayReg(Complex.(Dual.(randn(128), zeros(128)), Dual.(randn(128), zeros(128))))
ArrayReg{1, Complex{Dual{Nothing,Float64,1}}, Array...}
    active qubits: 7/7

julia> c = put(7, 2=>Rx(Dual(2.1, 1.0)))
nqubits: 7
put on (2)
└─ rot(X, Dual{Nothing}(2.1,1.0))

julia> expect'(put(7, 2=>Z), reg=>c)
ArrayReg{1, Complex{Dual{Nothing,Float64,1}}, Array...}
    active qubits: 7/7 => Dual{Nothing,Float64,1}[Dual{Nothing}(38.65132803757106,-11.504413665590928)]
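To see what the Dual-number trick in the session above does at the matrix level, here is a minimal sketch without Yao, using only ForwardDiff and LinearAlgebra. The helpers rx and expect_z are hypothetical (they are not Yao functions): rx builds the single-qubit Rx matrix by hand and expect_z computes the Z expectation value of Rx(θ)|0⟩, which is cos(θ) analytically. Because neither function fixes its element type, a Dual angle flows through the complex matrix arithmetic and the result carries both the value and the derivative.

```julia
using ForwardDiff: Dual, value, partials
using LinearAlgebra: dot

# Hypothetical helper: Rx(θ) = exp(-iθX/2), written without type annotations so θ may be a Dual.
function rx(θ)
    c, s = cos(θ / 2), sin(θ / 2)
    return [c -im*s; -im*s c]
end

# Hypothetical helper: <ψ(θ)|Z|ψ(θ)> with ψ(θ) = Rx(θ)|0>, which equals cos(θ) analytically.
function expect_z(θ)
    ψ = rx(θ) * ComplexF64[1, 0]
    Z = [1 0; 0 -1]
    return real(dot(ψ, Z * ψ))
end

θ = 0.7
d = expect_z(Dual(θ, 1.0))   # seed the single partial dθ/dθ = 1
println(value(d))            # expectation value, ≈ cos(0.7)
println(partials(d, 1))      # derivative, ≈ -sin(0.7)
```

The same function still works with a plain Float64 angle; only the number type changes.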

Jinguo Liu · Answer 2 · Wed Dec 02 2020 06:32:51 GMT+0800 (China Standard Time)

Ah, operator_fidelity does not work yet, but it will be fixed by this PR: QuantumBFS/YaoBlocks.jl#150

Please try

pkg> add YaoBlocks#master

Wenjie Jiang · Answer 3 · Wed Dec 02 2020 16:59:36 GMT+0800 (China Standard Time)

Thanks for the timely reply. I tried your sample code and it runs well.

However, I ran into some problems that I still cannot figure out.

First, I ran the circuit with Dual parameters and with ordinary parameters, respectively, and found that their outputs differ. More precisely, the expectation value of the ordinary circuit is exactly half that of the Dual circuit. Shouldn't they be identical, apart from the partials field?

using Yao
using ForwardDiff: Dual

s = randn(4)
reg = ArrayReg(Complex.(Dual.(s, zeros(4)), Dual.(s, zeros(4))))
c = chain(2, put(2=>Rx(Dual(2.1, 1.0))))
println(expect(chain(2, put(2=>Z)), reg=>c))
println(expect'(chain(2, put(2=>Z)), reg=>c))

Dual{Nothing}(-2.9223602277359566,-4.996787532515955) + Dual{Nothing}(0.0,0.0)*im
ArrayReg{1, Complex{Dual{Nothing,Float64,1}}, Array...}
    active qubits: 2/2 => Dual{Nothing,Float64,1}[Dual{Nothing}(-4.996787532515955,2.922360227735957)]

reg = ArrayReg(Complex.(s, zeros(4)))
c = chain(2, put(2=>Rx(2.1)))
println(expect(chain(2, put(2=>Z)), reg=>c))
println(expect'(chain(2, put(2=>Z)), reg=>c))

-1.4611801138679787 + 0.0im
ArrayReg{1, Complex{Float64}, Array...}
    active qubits: 2/2 => [-2.4983937662579776]
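One thing worth checking in this comparison: the two registers are not the same state. Complex.(Dual.(s, zeros(4)), Dual.(s, zeros(4))) has amplitudes s + i·s, while Complex.(s, zeros(4)) has amplitudes s + 0i. Neither is normalized, and (1 + i)·s has squared norm 2|s|², so for any Hermitian observable the expectation value is doubled for every angle, and therefore its derivative is doubled too. The sketch below checks this with plain complex vectors and LinearAlgebra only; the hand-built 4×4 matrices U and Z stand in for the Yao circuit and observable and are assumptions for illustration.

```julia
using LinearAlgebra

s = randn(4)
ψ_plain = Complex.(s, zeros(4))   # amplitudes s + 0i, as in the Float64 run
ψ_dual = Complex.(s, s)           # amplitudes s + i*s, the value parts of the Dual run

# Stand-in for the circuit: Rx(2.1) on one qubit of two, then Z on that qubit.
# (Which qubit is used does not affect the factor checked below.)
θ = 2.1
rx = [cos(θ/2) -im*sin(θ/2); -im*sin(θ/2) cos(θ/2)]
I2 = Matrix{Float64}(I, 2, 2)
U = kron(rx, I2)
Z = kron([1.0 0.0; 0.0 -1.0], I2)

expval(ψ) = real(dot(U * ψ, Z * (U * ψ)))

println(expval(ψ_plain), "  ", expval(ψ_dual))   # the second is twice the first
println(norm(ψ_dual)^2 / norm(ψ_plain)^2)        # 2.0 up to rounding
```

Normalizing both registers before the comparison (for example dividing each amplitude vector by its norm) removes this factor.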

Second, when I replaced the Rx gate with an Ry gate, a StackOverflowError was raised. I expected this change not to matter. Is there something I have misinterpreted?

using Yao
using ForwardDiff: Dual

s = randn(4)
reg = ArrayReg(Complex.(Dual.(s, zeros(4)), Dual.(s, zeros(4))))
c = chain(2, put(2=>Ry(Dual(2.1, 1.0))))
println(expect(chain(2, put(2=>Z)), reg=>c))
println(expect'(chain(2, put(1=>Z), put(2=>Z)), reg=>c))

Dual{Nothing}(-5.727229682144747,14.589119935053684) + Dual{Nothing}(0.0,0.0)*im
StackOverflowError:

Stacktrace:
 [1] _cpow(::Complex{Dual{Nothing,Float64,1}}, ::Complex{Dual{Nothing,Float64,1}}) at ./complex.jl:780 (repeats 51934 times)
 [2] ^(::Complex{Dual{Nothing,Float64,1}}, ::Complex{Dual{Nothing,Float64,1}}) at ./complex.jl:781
 [3] ^(::Complex{Dual{Nothing,Float64,1}}, ::Complex{Int64}) at ./promotion.jl:343
 [4] ^(::Complex{Dual{Nothing,Float64,1}}, ::Int64) at ./complex.jl:786
... 
(message omitted)

Third, I have not dug very deep into Yao, so I naively expected this code to return the gradient with respect to each parameter when there is more than one parameter, but that is not what I got. Is there anything I should be aware of?

using Yao
using ForwardDiff: Dual

s = randn(4)
reg = ArrayReg(Complex.(Dual.(s, zeros(4)), Dual.(s, zeros(4))))
c = chain(2, put(1=>Rx(Dual(2.1, 1.0))), put(2=>Rx(Dual(2.1, 1.0))))
println(expect(chain(2, put(1=>Z), put(2=>Z)), reg=>c))

Dual{Nothing}(-0.11228066743748855,2.4246744104155162) + Dual{Nothing}(0.0,0.0)*im

Again, thanks for your timely reply. I look forward to your further advice.

Jinguo Liu · Answer 4 · Wed Dec 02 2020 22:33:39 GMT+0800 (China Standard Time)

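Ah, I should have mentioned ForwardDiff's higher-level API. Here is an example of computing the Hessian:

using ForwardDiff: jacobian, Dual
using Yao
using LinearAlgebra: I

# Workaround for JuliaDiff/ForwardDiff.jl#486: integer powers of Complex{<:Dual}
# otherwise recurse until a StackOverflowError, so compute them by repeated multiplication.
function Base.:(^)(x::Complex{<:Dual}, n::Int)
    n < 0 && return inv(x)^(-n)   # handle negative exponents instead of silently returning one(x)
    y = one(x)
    for i = 1:n
        y *= x
    end
    y
end

# Gradient of the operator fidelity with respect to the circuit parameters.
# Written generically in T, so ForwardDiff can call it with Dual numbers.
function compute_gradient(params::AbstractVector{T}) where T
    target = matblock(Matrix{Complex{T}}(I, 1<<5, 1<<5))   # 5-qubit identity as the target
    c = chain(5,
        put(5, 2=>Rx(params[1])),
        put(5, 1=>Ry(params[2])),
        put(5, 3=>Rz(params[3])),
        put(5, 2=>shift(params[4]))
        )
    operator_fidelity'(target, c)[2]   # [2] picks the gradient with respect to the parameters of c
end

x = rand(4)*2π
g = compute_gradient(x)              # gradient, length 4
h = jacobian(compute_gradient, x)    # Jacobian of the gradient, i.e. the 4×4 Hessian

jacobian computes the Jacobian matrix using ForwardDiff. Functions with a prime (') compute gradients with respect to the circuit parameters; the parameters of a circuit can be obtained with parameters(c).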
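The same idea works for any gradient function written generically: differentiating the gradient with ForwardDiff.jacobian gives the Hessian. A minimal sketch that runs without Yao, assuming a hypothetical objective f and its hand-written gradient grad_f in place of operator_fidelity', and cross-checking against ForwardDiff.hessian:

```julia
using ForwardDiff: jacobian, hessian

# Hypothetical objective: f(θ) = cos(θ₁)cos(θ₂), e.g. <Z₁Z₂> after Ry(θ₁), Ry(θ₂) on |00>.
f(θ) = cos(θ[1]) * cos(θ[2])

# Hand-written gradient standing in for operator_fidelity'(target, c)[2].
# No Float64 annotations, so ForwardDiff can push Dual numbers through it.
grad_f(θ) = [-sin(θ[1]) * cos(θ[2]), -cos(θ[1]) * sin(θ[2])]

θ = rand(2) * 2π
H1 = jacobian(grad_f, θ)   # Hessian as the Jacobian of the gradient
H2 = hessian(f, θ)         # ForwardDiff's built-in Hessian, for comparison
println(H1 ≈ H2)           # true
```

Taking the Jacobian of an existing gradient is convenient when, as with Yao, a fast gradient is already available but the outer objective itself is not easy to differentiate twice.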

About your questions:

1. expect returns the expectation value, whereas expect' (with a prime) returns a pair of gradients (d[expectation value]/d[register] => d[expectation value]/d[circuit parameters]). So the two are very different.
2. Nice catch. It looks like a bug in ForwardDiff; in the example above, we overload the integer power function from Base to make it work. I filed an issue here: JuliaDiff/ForwardDiff.jl#486
3. This is really a question about Dual numbers: they compute d[multiple outputs]/d[single input], rather than returning gradients. To obtain the Hessian, you need to enumerate over the inputs, or simply use the jacobian function above (recommended). FYI, see this arXiv paper: https://arxiv.org/abs/1607.07892
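To illustrate point 3 with plain ForwardDiff: a Dual number carries one partial per seeded direction. Giving both inputs the same single partial 1.0, as in the question, yields one directional derivative (the sum of the two partial derivatives); seeding each input in its own slot recovers the individual partials in one pass. The function g below is a hypothetical stand-in for a two-parameter expectation value.

```julia
using ForwardDiff: Dual, partials

g(a, b) = sin(a) * cos(b)   # hypothetical stand-in for a two-parameter expectation value

a, b = 2.1, 2.1

# Both inputs seeded with the same single partial: gives ∂g/∂a + ∂g/∂b.
y1 = g(Dual(a, 1.0), Dual(b, 1.0))
println(partials(y1, 1))

# Each input gets its own partial slot: gives (∂g/∂a, ∂g/∂b).
y2 = g(Dual(a, 1.0, 0.0), Dual(b, 0.0, 1.0))
println(partials(y2, 1), "  ", partials(y2, 2))
```

ForwardDiff.gradient and ForwardDiff.jacobian do this seeding automatically, which is why the jacobian approach is the simpler route.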

Wenjie Jiang · Answer 5 · Thu Dec 03 2020 10:34:07 GMT+0800 (China Standard Time)

Thanks for your careful explanation. I am afraid I have not figured everything out yet.

1. I understand that (a) a function with a prime denotes its derivative, and (b) a Dual number has two fields, value and partials (as described in the arXiv paper you linked). So for expect and expect' (with a prime) as I used them before, my understanding is this: with Dual input, the value field of expect's output is the circuit's expectation value and should equal the output of expect with ordinary complex input, while the partials field is the derivative and should equal the output of expect' with ordinary complex input. But as shown in my last reply, the value field of expect's output with Dual input is double that of expect with ordinary complex input, and the same holds for the partials field.
2. Thanks for filing that issue; I will follow it.
3. I ran your sample code, but it gives an error message similar to the one I got when I used Zygote.hessian to compute the Hessian of a circuit.

using ForwardDiff: jacobian, Dual
using Yao
using LinearAlgebra: I

function Base.:(^)(x::Complex{<:Dual}, n::Int)
    y = one(x)
    for i=1:n
        y*=x
    end
    y
end

function compute_gradient(params::AbstractVector{T}) where T
    target = matblock(Matrix{Complex{T}}(I, 1<<5, 1<<5))
    c = chain(5,
        put(5, 2=>Rx(params[1])),
        put(5, 1=>Ry(params[2])),
        put(5, 3=>Rz(params[3])),
        put(5, 2=>shift(params[4]))
        )
    operator_fidelity'(target, c)[2]
end

x = rand(4)*2π
g = compute_gradient(x)
h = jacobian(compute_gradient, x)

MethodError: no method matching Float64(::Dual{ForwardDiff.Tag{typeof(compute_gradient),Float64},Float64,4})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:715
  Float64(!Matched::Int8) at float.jl:60
  ...

Stacktrace:
 [1] convert(::Type{Float64}, ::Dual{ForwardDiff.Tag{typeof(compute_gradient),Float64},Float64,4}) at ./number.jl:7
 [2] Complex{Float64}(::Dual{ForwardDiff.Tag{typeof(compute_gradient),Float64},Float64,4}, ::Int64) at ./complex.jl:12
 [3] Complex{Float64}(::Dual{ForwardDiff.Tag{typeof(compute_gradient),Float64},Float64,4}) at ./complex.jl:35
 [4] convert(::Type{Complex{Float64}}, ::Dual{ForwardDiff.Tag{typeof(compute_gradient),Float64},Float64,4}) at ./number.jl:7
 [5] setindex! at ./array.jl:828 [inlined]
 [6] hvcat_fill at ./abstractarray.jl:1707 [inlined]
...
(message omitted)

Thanks again for your kind reply!

Jinguo Liu · Answer 6 · Thu Dec 03 2020 12:55:37 GMT+0800 (China Standard Time)

It is true that the gradients obtained from Yao and ForwardDiff differ by a factor of 2. This is because they follow different conventions for complex-valued gradients: Yao differentiates only one of the ket and the bra. The overall factor does not matter in gradient-based training.
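For reference, a short derivation (assuming a Hermitian observable O and a parameterized state |ψ(θ)⟩) of where such a factor of 2 comes from when only the ket is differentiated:

```latex
E(\theta) = \langle \psi(\theta) | O | \psi(\theta) \rangle, \qquad
\frac{dE}{d\theta}
  = \langle \partial_\theta \psi | O | \psi \rangle + \langle \psi | O | \partial_\theta \psi \rangle
  = 2\,\mathrm{Re}\,\langle \psi | O | \partial_\theta \psi \rangle .
```

Because O is Hermitian, the two terms are complex conjugates of each other, so keeping only the ket term ⟨ψ|O|∂θψ⟩ gives a real part equal to half of the full derivative.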
You need to show this part :D

(message omitted)

BTW, you need to use the master branch of YaoBlocks; otherwise you will see the error above.

Wenjie Jiang · Answer 7 · Thu Dec 03 2020 18:23:31 GMT+0800 (China Standard Time)

Ah, sorry, I forgot to update. The sample code now works well for me.

Thanks again. I believe this solves my problem. It means a lot!

Wenjie Jiang · Answer 8 · Thu Dec 03 2020 20:57:49 GMT+0800 (China Standard Time)

Sorry to bother you again. I can actually complete my project using the discussion above. However, I found a subtle problem based on your sample code, which I think may be related to some feature of the Yao package. To understand all the details, I decided to ask again.

I noticed that in your sample code the circuit is defined inside the function compute_gradient, and it works well. For more general use, I modified the code to define the circuit outside the function. Then I got a MethodError, just like when I used Zygote.hessian to compute the Hessian earlier. Is there something I have misunderstood?

using ForwardDiff: jacobian, Dual
using Yao
using LinearAlgebra: I

function Base.:(^)(x::Complex{<:Dual}, n::Int)
    y = one(x)
    for i=1:n
        y*=x
    end
    y
end

function f1(params::AbstractVector{T}) where T
    target = matblock(Matrix{Complex{T}}(I, 1<<5, 1<<5))
    c = chain(5,
        control(5, 1, 2=>Rx(params[1])),
        put(5, 1=>Ry(params[2])),
        put(5, 3=>Rz(params[3])),
        put(5, 2=>shift(params[4]))
        )
    circ = dispatch!(c, params)  # redundant here; kept so that f1 and f2 follow the same code path
    -operator_fidelity'(target, circ)[2]
end

function f2(params::AbstractVector{T}) where T
    target = matblock(Matrix{Complex{T}}(I, 1<<5, 1<<5))
    circ = dispatch!(c1, params)
    -operator_fidelity'(target, circ)[2]
end

x = rand(4)*2π
c1 = chain(5,
    control(5, 1, 2=>Rx(x[1])),
    put(5, 1=>Ry(x[2])),
    put(5, 3=>Rz(x[3])),
    put(5, 2=>shift(x[4]))
    )
println("diff of grad: ")
println(f1(x) - f2(x))
println("jacobian of f1: ")
println(jacobian(f1, x))
println("jacobian of f2: ")
println(jacobian(f2, x))

diff of grad: 
[0.0, 0.0, 0.0, 0.0]
jacobian of f1: 
[0.0010114056391029905 -0.008235282098312148 -0.0002049829250582547 -0.002800577794016422; -0.00823528209831215 0.002129952406713545 -0.015755781525783795 -0.21526325598068313; -0.0002049829250582547 -0.015755781525783802 0.002129952406713597 -0.0053580789755250675; -0.002800577794016422 -0.21526325598068316 -0.005358078975525069 0.002129952406713596]
jacobian of f2: 
MethodError: no method matching Float64(::Dual{ForwardDiff.Tag{typeof(f2),Float64},Float64,4})
Closest candidates are:
  Float64(::Real, !Matched::RoundingMode) where T<:AbstractFloat at rounding.jl:200
  Float64(::T) where T<:Number at boot.jl:715
  Float64(!Matched::Int8) at float.jl:60
  ...

Stacktrace:
 [1] convert(::Type{Float64}, ::Dual{ForwardDiff.Tag{typeof(f2),Float64},Float64,4}) at ./number.jl:7
 [2] setproperty!(::RotationGate{1,Float64,XGate}, ::Symbol, ::Dual{ForwardDiff.Tag{typeof(f2),Float64},Float64,4}) at ./Base.jl:34
...
(message omitted)

Jinguo Liu · Answer 9 · Fri Dec 04 2020 06:29:54 GMT+0800 (China Standard Time)

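ForwardDiff can only handle generic code, because it computes gradients by replacing numbers with Dual numbers. f2 is not generic: c1 is constructed outside the function with Float64 angles, so dispatch! has to store Dual values into fields typed Float64, which fails (see setproperty!(::RotationGate{1,Float64,XGate}, ...) in the stack trace). f1 works because it builds the circuit from params, so the gates take the parameters' element type.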
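A minimal sketch of the same failure without Yao, assuming two hypothetical functions: f_fixed writes into storage allocated as Float64 outside the function (like c1, whose angles are Float64), while f_generic allocates its storage from the input's element type T (like f1, which builds the circuit from params). Only the generic one can carry Dual numbers.

```julia
using ForwardDiff: gradient

angles = zeros(2)   # Float64 storage created outside the function, analogous to c1

# Not generic: assigning a Dual into a Vector{Float64} calls convert(Float64, ::Dual),
# which throws the same kind of MethodError as f2.
function f_fixed(x)
    angles .= x
    return sum(abs2, angles)
end

# Generic: the storage element type follows the input, so Dual numbers pass through.
function f_generic(x::AbstractVector{T}) where T
    buf = zeros(T, length(x))
    buf .= x
    return sum(abs2, buf)
end

x = [1.0, 2.0]
println(gradient(f_generic, x))   # [2.0, 4.0]

err = try
    gradient(f_fixed, x)
catch e
    e
end
println(err isa MethodError)      # true
```

In the thread's terms, this is why building the circuit from params inside the function, as f1 and compute_gradient do, works with jacobian, while dispatching Dual values into the pre-built c1 does not.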