QuantumBFS / Yao.jl

Extensible, Efficient Quantum Algorithm Design for Humans.

Home Page: https://yaoquantum.org

How to run a random circuit in a batched way?

zipeilee opened this issue

I want to run a random unitary circuit on many instances (e.g. 1000 instances) and return the average value, like:

using Yao, Statistics
reg = zero_state(1)
mean([expect(Z, reg |> dispatch!(Rx(0.0), :random)) for _ in 1:1000])

but I want to run it in a batched way. I know Yao (幺) has BatchedArrayReg, but I don't know how to apply a different random circuit to each instance of a batched register. I tried

reg = zero_state(1, nbatch=1000)
expect(Z, reg |> dispatch!(Rx(0.0), :random))

but it does not seem to work: it just draws one random angle and applies that same instance to all 1000 batches. What is the correct way?

Unfortunately, there is no easy way to do that. You need to copy reg, because |> changes the state in place.

julia> sum([expect(Z, copy(reg) |> dispatch!(Rx(0),rand()*2π)) for _ in 1:1000])/1000
0.027340815055604813 + 0.0im
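As a sanity check on that number, here is a short derivation, assuming reg is the single-qubit zero_state(1) and the angle is drawn uniformly from [0, 2π) as rand()*2π does. The exact average of ⟨Z⟩ is zero, and the expected spread of a 1000-sample mean is about 0.022:

```latex
\begin{aligned}
R_x(\theta)\lvert 0\rangle &= \cos\tfrac{\theta}{2}\,\lvert 0\rangle - i\sin\tfrac{\theta}{2}\,\lvert 1\rangle,\\
\langle Z\rangle_\theta &= \cos^2\tfrac{\theta}{2} - \sin^2\tfrac{\theta}{2} = \cos\theta,\\
\mathbb{E}_\theta\,\langle Z\rangle_\theta &= \frac{1}{2\pi}\int_0^{2\pi}\cos\theta\,d\theta = 0,\\
\operatorname{std}\Big(\frac{1}{N}\sum_{k=1}^{N}\cos\theta_k\Big) &= \sqrt{\frac{\operatorname{Var}(\cos\theta)}{N}} = \sqrt{\frac{1/2}{1000}} \approx 0.022.
\end{aligned}
```

So a result like 0.0273 is sampling noise, roughly 1.2 standard deviations from the exact value of 0.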

What I actually want is to use GPU parallelism for the batch processing, and this approach does not make good use of it.

I see. In your case, I would suggest writing a new kernel, since this feature is not supported by Yao yet.

  1. Define a new gate type with batched parameters.
  2. Dispatch the gate to the correct instruct! function. The current single-parameter rotation gate calls into this implementation:
    https://github.com/QuantumBFS/CuYao.jl/blob/05f365f8f8e49fa2787df50a6e2226f508c94d80/src/instructs.jl#L19
    You need to implement a new CUDA kernel (see the hints below); it should not be too difficult if you know CUDA programming.
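For step 1, a minimal sketch could store one angle per batch instance and stack the corresponding 2x2 matrices into a 2×2×nbatch array, so that slice b is the gate for batch b. BatchedRx and batched_matrices are hypothetical names for illustration only, not part of Yao or CuYao; the matrix is the standard Rx(θ) = exp(-iθX/2).

```julia
# Hypothetical gate type: one rotation angle per batch instance.
struct BatchedRx{T<:AbstractFloat}
    thetas::Vector{T}
end

# Hypothetical helper: build a 2×2×nbatch array where U[:, :, b] is Rx(thetas[b]).
function batched_matrices(g::BatchedRx{T}) where {T}
    nb = length(g.thetas)
    U = Array{Complex{T}}(undef, 2, 2, nb)
    for (b, θ) in enumerate(g.thetas)
        c, s = cos(θ / 2), sin(θ / 2)
        U[1, 1, b] = c          # Rx(θ) = [cos(θ/2)    -i sin(θ/2);
        U[1, 2, b] = -im * s    #          -i sin(θ/2)  cos(θ/2)]
        U[2, 1, b] = -im * s
        U[2, 2, b] = c
    end
    return U
end

# Usage: 1000 independent random angles in [0, 2π)
U = batched_matrices(BatchedRx(2π .* rand(1000)))
```

Keeping all matrices in one dense array means a kernel can pick its matrix directly from its batch index, which is what the rank-3 tensor in the hints below describes.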

Hints for rewriting this instruct! method:

instruct!(::Val{2}, state::DenseCuVecOrMat, U0::AbstractMatrix, locs::NTuple{M, Int}, clocs::NTuple{C, Int}, cvals::NTuple{C, Int})
  1. Val{2} means it is for qubits (2-level systems) rather than qudits.
  2. state is a vector or matrix holding the register storage.
  3. U0 is the gate matrix. In your case, you need to pass a rank-3 tensor in which each batch slice stores a 2x2 matrix; each CUDA thread should use the matrix of the batch it is working on.
  4. locs holds the locations the gate applies to. For a single-qubit gate, it should contain only one element.
  5. clocs and cvals should be empty tuples in the absence of control bits.
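To make these hints concrete, here is a CPU reference sketch of what a batched single-qubit instruct! would compute; a CUDA kernel should reproduce it, with the two loops replaced by thread indices. batched_instruct! is a hypothetical name, not CuYao's API. It assumes the register storage is a 2^n × nbatch matrix with one column per batch, and that qubit 1 is the least significant bit of the basis index.

```julia
using Statistics: mean

# Hypothetical CPU reference: apply U[:, :, b] on qubit `loc` of batch column b.
function batched_instruct!(state::AbstractMatrix, U::AbstractArray{<:Number,3}, loc::Int)
    dim, nbatch = size(state)
    size(U) == (2, 2, nbatch) || throw(DimensionMismatch("U must be 2×2×nbatch"))
    mask = 1 << (loc - 1)                 # bit flipped by this qubit
    for b in 1:nbatch                     # on a GPU: parallelize over (b, i)
        for i in 0:dim-1
            (i & mask) == 0 || continue   # visit each amplitude pair once
            j = i | mask
            a0, a1 = state[i+1, b], state[j+1, b]
            state[i+1, b] = U[1, 1, b] * a0 + U[1, 2, b] * a1
            state[j+1, b] = U[2, 1, b] * a0 + U[2, 2, b] * a1
        end
    end
    return state
end

# Example: a different random Rx angle for each of 1000 single-qubit instances.
nbatch = 1000
thetas = 2π .* rand(nbatch)
U = zeros(ComplexF64, 2, 2, nbatch)
U[1, 1, :] .= cos.(thetas ./ 2)
U[2, 2, :] .= cos.(thetas ./ 2)
U[1, 2, :] .= -im .* sin.(thetas ./ 2)
U[2, 1, :] .= -im .* sin.(thetas ./ 2)

state = zeros(ComplexF64, 2, nbatch)
state[1, :] .= 1                          # |0⟩ in every batch
batched_instruct!(state, U, 1)

# ⟨Z⟩ per instance is |a0|^2 - |a1|^2, which equals cos(θ) here
expZ = abs2.(state[1, :]) .- abs2.(state[2, :])
mean(expZ)
```

Each batch column is updated independently, so the batch dimension maps naturally onto GPU threads without any synchronization between batches.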

Please feel free to ask if you encounter any issue.

Thanks for your patience and guidance! Actually, I need to compute a chain block made of such single-qubit gate layers and two-qubit gate layers on many qubits. Your advice is helpful; I will try it.