QuantumBFS / Yao.jl

Extensible, Efficient Quantum Algorithm Design for Humans.

Home Page: https://yaoquantum.org

How to export the matrix of a circuit on a large system?

yuyuexi opened this issue · comments

Hi, I am using Yao.jl in a recent project, and it is really friendly and efficient.

However, I have run into a problem.

I want to export the time-evolution unitary matrix represented by a quantum circuit on a large system (around 20 qubits), and I find that it takes a long time.

I know it is an exponential problem. But, given the sparsity of the circuit, I think there might be some way to reduce the time cost. Computing the target matrix directly with sparse matrices is very fast (but that construction cannot itself be realized as a quantum circuit), and I also need a circuit that realizes this matrix.

Here are some of the methods I tried:

  1. I tried dropping small elements from the sparse matrix when using the mat function. Concretely, after every 10 layers I drop the elements of the current matrix that are smaller than 1e-8, and I observe a small speedup. However, the dropping operation itself also takes time, so the actual speedup is not as large as expected.
    I am wondering whether I can drop those small elements during the sparse matrix multiplication itself (see the sketch after this list). I tried reading LuxurySparse.jl, but I do not yet see how to do that without huge effort.

  2. I tried simplifying the circuit, but I cannot verify its correctness, since operator_fidelity requires the matrix and computing it takes a long time.
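
To make the first point concrete, here is a minimal sketch of what I have in mind (the helper name truncated_prod and the layer list are placeholders, not my actual code): SparseArrays.droptol! removes stored entries below a tolerance in place, and I would like to call something like it every few factors while accumulating the product.

using SparseArrays

# hypothetical helper: multiply a list of sparse layer matrices,
# dropping entries below `tol` after every `every` factors
function truncated_prod(layers::Vector{<:SparseMatrixCSC}; tol=1e-8, every=10)
    acc = copy(layers[1])
    for (k, m) in enumerate(layers[2:end])
        acc = acc * m
        k % every == 0 && droptol!(acc, tol)   # in-place truncation of tiny entries
    end
    return acc
end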

So, how can I deal with this problem? Many thanks for your suggestions!

Sorry I do not fully understand your question.

  1. What do you mean by export? To where?
  2. How can the time-evolution matrix be sparse? Is it diagonal?
  3. Since you can construct the matrix explicitly, why not use it directly as a matblock?
julia> matblock(rand_unitary(ComplexF64, 8); tag="my unitary")
my unitary
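
For example (a minimal sketch with placeholder names; u stands in for your precomputed unitary and the three-qubit circuit is arbitrary), the wrapped block can then be compared against a circuit directly:

using Yao

u = rand_unitary(ComplexF64, 8)    # stands in for the precomputed unitary
c = chain(3, put(3, (1, 2)=>rot(kron(X, X), 0.5)),
             put(3, 3=>Rx(0.3)))   # placeholder circuit of the same size

# close to 1 only if the circuit actually implements u
operator_fidelity(c, matblock(u; tag="my unitary"))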

Sorry, let me explain this more precisely.

  1. I want to calculate the operator_fidelity between the circuit and the unitary I construct as a matrix, to make sure the circuit is accurate enough. So I have to get the matrix of the circuit first; is that necessary? (Correct me if not; see also the sketch after this list.)
  2. The time-evolution matrix is not diagonal, but the number of non-zero elements is much smaller than the full matrix size (<< 2^(2N)).
  3. As said above, I want to check whether the circuit I use is correct. I need an explicit circuit representation of my evolution unitary, at least to high enough fidelity.
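
To make the first point concrete: if I am not mistaken, operator_fidelity is essentially the normalized Hilbert-Schmidt overlap |tr(A'B)| / 2^N, so in principle it can be evaluated directly from two matrices, including sparse ones. A minimal sketch with a hypothetical helper name:

using SparseArrays   # only needed when the inputs are sparse matrices

# hypothetical helper: |tr(a' * b)| / 2^N for two 2^N x 2^N matrices;
# the elementwise form keeps sparse inputs sparse
matrix_fidelity(a::AbstractMatrix, b::AbstractMatrix) = abs(sum(conj.(a) .* b)) / size(a, 1)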

Is it possible to construct a minimal working example? I still do not get your point, sorry.

Here is some sample code for a toy model.

using Yao
using LinearAlgebra

# Pauli matrices X, Y, Z and the identity
elementary_matrix = Array{Complex{Float64}, 2}[
    [0 1; 1 0],
    [0 -1im; 1im 0],
    [1 0; 0 -1],
    [1 0; 0 1]]

# tensor product of Pauli operators: `index[i]` selects the Pauli acting on `site[i]`,
# with the identity on every other site
function spin_operator(num_qubits::Int, site::Array, index::Array)
    opt = [1]
    idx = ones(Int, num_qubits) .+ 3   # default every site to the identity (index 4)
    for i = 1:length(site)
        idx[site[i]] = index[i]
    end
    for i in 1:num_qubits
        opt = kron(opt, elementary_matrix[idx[i]])
    end
    return opt
end

# custom two-qubit block implementing Rxx(θ) = exp(-iθ/2 · XX)
xx = spin_operator(2, [1,2], [1,1])

mutable struct RXX{T <: Real} <: PrimitiveBlock{2}
    theta::T
end

Yao.mat(::Type{T}, gate::RXX) where T = exp(T(-im * gate.theta/2) * xx)

# target unitary built explicitly: exp(-i/2 · Σ_k θ_k X X) on neighbouring sites
# (site indices are reversed relative to the circuit because of the kron ordering convention)
function U(num_qubits::Int, thetas::Vector)
    h = zeros(ComplexF64, 2^num_qubits, 2^num_qubits)
    for k = 1:num_qubits-1
        h += thetas[k] * spin_operator(num_qubits, [num_qubits-k, num_qubits+1-k], [1,1])
    end
    return exp(-im * h / 2)
end

# the same evolution written as a circuit of RXX gates on neighbouring qubits
function C(num_qubits::Int, thetas::Vector)
    c = chain(num_qubits)
    for k = 1:num_qubits-1
        push!(c.blocks, put(num_qubits, (k,k+1)=>RXX(thetas[k])))
    end
    return c
end

num_qubits = 4
thetas = rand(num_qubits-1)

u = U(num_qubits, thetas)
c = C(num_qubits, thetas)

@time operator_fidelity(c, u|>matblock)

I want to check whether the circuit correctly represents the unitary.

For this model the circuit is very short, so it is relatively fast. But for other models the circuit consists of about 100 layers, which slows things down considerably.

Can you please show me the parameters for which it becomes slow? This model runs very fast, so I do not see the problem.
One place that you can improve is:

function C(num_qubits::Int, thetas::Vector)
    c = chain(num_qubits)
    for k = 1:num_qubits-1
        push!(c.blocks, put(num_qubits, (k,k+1)=>rot(kron(X, X), thetas[k])))
    end
    return c
end

You can build your circuit with built-in blocks; there is no need to define a new one.
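
As a quick sanity check (a minimal sketch, not taken from your code), the built-in rotation reproduces exp(-iθ/2 · XX), i.e. the matrix of your custom RXX block:

using Yao, LinearAlgebra

θ = 0.7
gen = kron(X, X)                        # two-qubit XX generator
m_builtin = Matrix(mat(rot(gen, θ)))    # rot(G, θ) is exp(-iθ/2 G) for a reflexive G
m_manual  = exp(-im * θ / 2 * Matrix(mat(gen)))
@assert m_builtin ≈ m_manual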

Thanks for the nice suggestion!

Actually, the circuit I use in my project is long, and most of the time is spent in the mat function, since it has to perform many sparse matrix multiplications (unlike the toy circuit I posted before). Due to the accumulation of small but nonzero elements, the time required for a single multiplication keeps growing. For 12 qubits, mat(circuit) takes about 500 s (sorry, I cannot provide the actual circuit).

I redefined the product step inside the mat method to drop small elements while multiplying a series of sparse matrices, and I observe a clear speedup (at the cost of some precision).

Here is a performance test.

using Yao
using SparseArrays
using LinearAlgebra

# circuit of independent single-qubit Rx rotations
function C(num_qubits::Int, gs::Vector)
    chain(num_qubits, [put(k=>Rx(gs[k])) for k = 1:num_qubits])
end

# the same unitary built directly as a sparse Kronecker product of 2x2 blocks
function U(num_qubits::Int, gs::Vector; tol::Real=1e-5)
    function droptol(m::AbstractMatrix)
        tol > 0 ? droptol!(m, tol) : m
    end

    ms = [exp(-im * gs[k] * [0 1; 1 0] / 2)|>sparse|>droptol for k = 1:num_qubits]

    m = ms[1]
    for k = 2:num_qubits
        m = kron(ms[k], m)   # qubit 1 is the least significant factor, matching Yao's convention
    end
    return m
end

# split an index range into consecutive chunks of size at most `s`
function group(ks::AbstractVector, s::Int)
    if length(ks) <= s
        return [ks|>Vector]
    end
    return vcat([ks[1:s]|>Vector], group(ks[s+1:end], s))
end

# product of a list of matrices, truncating entries below `tol` after every
# group of `s` factors (recursively, so long chains stay reasonably sparse)
function SparseProd(Ms::Vector; tol::Real=1e-15, s::Int=10)
    function droptol(m::AbstractMatrix)
        (tol > 0 && typeof(m) <: SparseArrays.AbstractSparseMatrixCSC) ? droptol!(m, tol) : m
    end

    length(Ms) == 0 && throw(ArgumentError("reducing over an empty collection is not allowed"))
    if length(Ms) <= s
        opt = Ms[1]
        for k = 2:length(Ms)
            opt = opt * Ms[k]
        end
        return opt
    end

    gs = group(1:length(Ms), s)
    opts = [SparseProd(Ms[gs[k]]; tol=tol, s=s) |> droptol for k = 1:length(gs)]
    return SparseProd(opts; tol=tol, s=s)
end

# mat of a ChainBlock, but using SparseProd so small entries are dropped along the way
function mymat(c::ChainBlock{N}; kwargs...) where {N}
    if isempty(c.blocks)
        return YaoBlocks.IMatrix{2^N}()
    else
        return SparseProd(mat.(c.blocks[end:-1:1]); kwargs...)
    end
end

# let opt_fidelity below accept plain matrices as well as blocks
Yao.mat(m::AbstractMatrix) = m

# operator fidelity |tr(A' * B)| / dim, evaluated elementwise
function opt_fidelity(a::Union{AbstractMatrix, AbstractBlock}, b::Union{AbstractMatrix, AbstractBlock})
    dim = size(mat(a), 1)
    return abs(sum(conj(mat(a)) .* mat(b)) / dim)
end

num_qubits = 14
tol = 1e-4

gs = rand(num_qubits) 
c = C(num_qubits, gs)
@time u = U(num_qubits, gs)
@time @show opt_fidelity(mymat(c; tol=tol, s=3), u)
@time @show opt_fidelity(mat(c), u)

On my machine, the output is

  7.826607 seconds (9.34 M allocations: 8.451 GiB, 12.10% gc time, 40.98% compilation time)
opt_fidelity(mymat(c; tol = tol, s = 3), u) = 0.9999981968610407
  9.900545 seconds (2.78 M allocations: 12.732 GiB, 4.89% gc time, 11.72% compilation time)
opt_fidelity(mat(c), u) = 0.9999999999999998
 27.617125 seconds (200.82 k allocations: 29.269 GiB, 3.62% gc time, 0.27% compilation time)

If I choose gs around π (so that Rx(g) ≈ X), the time required by the mymat method is even lower.

At the current stage this modification is enough for my project, but I am still wondering whether there is a more elegant way to deal with it.

If I understand correctly, you want a chain block that truncates while computing the matrix product. I do not think we can truncate small non-zero entries implicitly in Yao; you have to redefine the mat function, as you did.
Another place you can improve: the way you define C is equivalent to the following kron-based construction, which assembles the full matrix from the 2x2 factors instead of multiplying full-size matrices for each put block.

julia> function K(num_qubits::Int, gs::Vector)
           kron(num_qubits, [k=>Rx(gs[k]) for k = 1:num_qubits]...)
       end

julia> kr = K(num_qubits, gs);   # 13 qubits

julia> @time mat(kr);
  0.491260 seconds (191 allocations: 1.333 GiB, 12.91% gc time)

julia> @time mat(c);
  3.112052 seconds (315 allocations: 4.412 GiB, 16.18% gc time)

Thanks for your suggestion!

I understand your point.

Many thanks again!