Can we build two different `PQC` or `ControlledPQC` layers but sharing same weights (or `trainable_weights`)?

Shuhul24 opened this issue a year ago · comments

I have built a custom tf.keras.layers.Layer that wraps a quantum circuit as a ControlledPQC. I now have two different models that, at some point, need to share the same trainable weights. Say model H1 has a quantum circuit whose weights change during training (for some number of epochs). After a certain number of epochs, I need another model, H2, which has the same circuit (also using ControlledPQC), to have the same weight values as H1's quantum circuit. I saw the documentation that uses tf.keras.layers.Embedding but could not work out how to apply it here. Can you help me figure this out?

Owen Lockwood commented a year ago

Yes

Owen Lockwood · Answer 1 · Mon Jan 16 2023 15:55:37 GMT+0800 (China Standard Time)

Should just be a matter of h2.set_weights(h1.get_weights()) whenever you want to do this (during training, at whichever epochs you like). I just tested that with H1 being a PQC-based model and H2 using a ControlledPQC custom layer, and it worked (although I had to do some reshaping, specifically h2.set_weights([tf.expand_dims(h1.get_weights()[0], axis=0)])). I'm pretty sure they don't even have to be the same circuits, but I only tested with the same circuits.

Shuhul24 · Answer 2 · Mon Jan 16 2023 16:37:50 GMT+0800 (China Standard Time)

Thanks for the reply!
Actually, in my case I have two circuits combined in one ControlledPQC inside a tf.keras.layers.Layer, and one of those two circuits is also used on its own as a ControlledPQC. The issue is that I can't get the parameters of that one circuit out of the combined ControlledPQC, since they are stored in a tf.Variable. Can you help me out on how this can be done?

So it's like: circuit H1 -> state -> H2 as a ControlledPQC in, say, layer1 (a subclass of tf.keras.layers.Layer), and circuit H1 -> observable as a ControlledPQC in, say, layer2 (also built on tf.keras.layers.Layer). In this case, layer2 should update in the same way as H1 updates in layer1. But I can't take the H1 weights from layer1 and put them into layer2. I tried using get_weights(), but I can't slice it because the weights are a tf.Variable.

Owen Lockwood · Answer 3 · Tue Jan 17 2023 02:01:38 GMT+0800 (China Standard Time)

I see; it was unclear that this was all happening in the custom layer. Here is an example of what I think you want, which can be achieved by just indexing the weights. I made up circuits H1 and H2 (which are different here but could be the same), then made one controlled PQC for H1 -> H2 and one for H1 -> obs. Then, when calling it, you can just take the H1 params you care about for the second PQC. I tested this and it runs, and it seems differentiable. I verified this by checking that the gradients for the H2 params are all zero when using the H1 obs.

import tensorflow as tf
import tensorflow_quantum as tfq 
import numpy as np
import cirq
import sympy

class Combined(tf.keras.layers.Layer):
    def __init__(self) -> None:
        super(Combined, self).__init__()
        self.num_params = 4 * 2
        self.qubits = [cirq.GridQubit(0, i) for i in range(4)]
        self.theta = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (1, self.num_params)), dtype="float32", trainable=True)
        self.params = sympy.symbols("params0:%d"%self.num_params)
        self.h1_to_h2_to_obs = tfq.layers.ControlledPQC(self.h1(self.params[:4]) + self.h2(self.params[4:]), [cirq.Z(self.qubits[0])])
        self.h1_to_obs = tfq.layers.ControlledPQC(self.h1(self.params[:4]), [cirq.Z(self.qubits[0])])

    def h1(self, params):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
          c += cirq.ry(params[i]).on(self.qubits[i])
        return c

    def h2(self, params):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
          c += cirq.rx(params[i]).on(self.qubits[i])
        return c

    def call(self, inputs):
        h1h2obs = self.h1_to_h2_to_obs([inputs, self.theta])
        h1obs = self.h1_to_obs([inputs, self.theta[:, :4]])
        return h1h2obs, h1obs


l = Combined()
inputs = tfq.convert_to_tensor([cirq.Circuit()])

with tf.GradientTape() as tape:
  value, energy = l(inputs)

grads = tape.gradient(energy, l.trainable_variables)
print(grads)

Output:

[<tf.Tensor: shape=(1, 8), dtype=float32, numpy=
array([[-3.6007202e-01,  4.1406602e-06,  2.5118254e-06,  4.3869954e-06,
         0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00]],
      dtype=float32)>]

Shuhul24 · Answer 4 · Tue Jan 17 2023 14:59:47 GMT+0800 (China Standard Time)

Thanks for the reply!
One last thing. Will the parameters of the H1 circuit change simultaneously if I apply gradients for both value and energy? Or will it cause some sort of error? I mean, will the parameter values be the same for the H1 circuit in both cases when I use opt.apply_gradients()?

Owen Lockwood · Answer 5 · Tue Jan 17 2023 15:50:16 GMT+0800 (China Standard Time)

If you apply gradients sequentially (i.e. take the gradient for value, apply it, take the gradient for energy, apply it), it will work and the H1 params (which are just part of the overall theta Variable) will be changed both times (as both values depend on those parameters). If you mean that your "loss" is actually a combination of those values (not applying the gradients for each separately), then it will still work, but the H1 params (well, all theta params) will be changed only once. TF autograd will deal with exactly how to compute it. See, for example, these two situations in order (same code as above):

l = Combined()
inputs = tfq.convert_to_tensor([cirq.Circuit()])

opt = tf.keras.optimizers.SGD(learning_rate=1.0)

with tf.GradientTape(persistent=True) as tape:
  value, energy = l(inputs)

print("Init", l.trainable_variables)
grads = tape.gradient(energy, l.trainable_variables)
opt.apply_gradients(zip(grads, l.trainable_variables))
print("Params[0] (H1) should change", l.trainable_variables)
grads = tape.gradient(value, l.trainable_variables)
opt.apply_gradients(zip(grads, l.trainable_variables))
print("Params[0] (H1) and Params[4] (H2) should change", l.trainable_variables)

with tf.GradientTape() as tape:
  value, energy = l(inputs)
  loss = value + energy

grads = tape.gradient(loss, l.trainable_variables)
opt.apply_gradients(zip(grads, l.trainable_variables))
print("Both grads", l.trainable_variables)

Results in:

Init [<tf.Variable 'Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.25698686, 5.79846   , 4.4762807 , 5.869993  , 2.9522095 ,
        3.7284074 , 4.629468  , 1.5974815 ]], dtype=float32)>]
Params[0] (H1) should change [<tf.Variable 'Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.5111524, 5.7984576, 4.4762874, 5.869991 , 2.9522095, 3.7284074,
        4.629468 , 1.5974815]], dtype=float32)>]
Params[0] (H1) and Params[4] (H2) should change [<tf.Variable 'Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.03072384, 5.79846   , 4.4762893 , 5.87      , 3.1163952 ,
        3.7284098 , 4.629465  , 1.5974805 ]], dtype=float32)>]
Both grads [<tf.Variable 'Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.03073359, 5.79846   , 4.4762893 , 5.87      , 3.1415799 ,
        3.728407  , 4.629471  , 1.597479  ]], dtype=float32)>]
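
The difference between the two situations can also be seen without TFQ. The following is only an analytic stand-in, assuming that for this layer (Ry then Rx on each qubit, Z measured on qubit 0) energy = cos(theta[0]) and value = cos(theta[0]) * cos(theta[4]). The starting values and the helper names `energy_grad` and `value_grad` are made up for illustration, and each gradient is evaluated at the current parameters.

```python
import numpy as np

# Analytic stand-in for the layer above (an assumption for illustration):
# energy = cos(theta[0]), value = cos(theta[0]) * cos(theta[4]).
def energy_grad(theta):
    g = np.zeros_like(theta)
    g[0] = -np.sin(theta[0])
    return g

def value_grad(theta):
    g = np.zeros_like(theta)
    g[0] = -np.sin(theta[0]) * np.cos(theta[4])
    g[4] = -np.cos(theta[0]) * np.sin(theta[4])
    return g

lr = 1.0
theta0 = np.array([0.25, 5.8, 4.5, 5.9, 2.95, 3.7, 4.6, 1.6])

# Situation 1: two SGD steps, one per output; theta[0] moves twice.
seq = theta0 - lr * energy_grad(theta0)
after_energy = seq.copy()
seq = seq - lr * value_grad(seq)

# Situation 2: one SGD step on the summed loss; theta[0] moves once.
comb = theta0 - lr * (energy_grad(theta0) + value_grad(theta0))

print("after energy step:", after_energy)
print("sequential:       ", seq)
print("combined:         ", comb)
```

In both situations only entries 0 (H1) and 4 (H2) move, because Z on qubit 0 does not depend on the other qubits.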

Shuhul24 · Answer 6 · Tue Jan 17 2023 16:53:44 GMT+0800 (China Standard Time)

Thanks!
But how can I extract the parameters of just H1 circuit, because by using get_weights() I am getting the parameters of H1 and H2 circuit combined. Also, the combined circuit parameters are inter-twined (I guess that's because ControlledPQC does it internally). Because if we can extract the trainable_variables of H1 and H2 separately, then I guess using tape.gradient and opt.apply_gradients separately for separate circuits would make sense, right?

Owen Lockwood · Answer 7 · Wed Jan 18 2023 00:57:07 GMT+0800 (China Standard Time)

You can access the H1 params just by indexing the layer weights (as I did in the call function). However, if you are taking the gradient of a function that only depends on them (i.e. energy), then the gradient will be 0 elsewhere, so you wouldn't have to specifically select for them.

You could extract each by just indexing the trainable variables.

It makes sense that H1 could be separate (but you don't need to index as I mentioned above), but the other function depends on both H1 params and H2 params. You of course could just update the H2 params if that is what you want.

Shuhul24 · Answer 8 · Wed Jan 25 2023 21:37:41 GMT+0800 (China Standard Time)

How I can update the parameters of just H2?

Owen Lockwood · Answer 9 · Thu Jan 26 2023 03:41:39 GMT+0800 (China Standard Time)

Probably easiest to just make the H1 and H2 params separate variables, since it seems you are using them independently a lot. The following does that, allowing just the params of H2 to be updated easily.

import tensorflow as tf
import tensorflow_quantum as tfq 
import numpy as np
import cirq
import sympy

class Combined(tf.keras.layers.Layer):
    def __init__(self) -> None:
        super(Combined, self).__init__()
        self.num_params = 4 * 2
        self.qubits = [cirq.GridQubit(0, i) for i in range(4)]
        self.h1_weights = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (1, self.num_params // 2)), dtype="float32", trainable=True)
        self.h2_weights = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (1, self.num_params // 2)), dtype="float32", trainable=True)
        self.params = sympy.symbols("params0:%d"%self.num_params)
        self.h1_to_h2_to_obs = tfq.layers.ControlledPQC(self.h1(self.params[:4]) + self.h2(self.params[4:]), [cirq.Z(self.qubits[0])])
        self.h1_to_obs = tfq.layers.ControlledPQC(self.h1(self.params[:4]), [cirq.Z(self.qubits[0])])

    def h1(self, params):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
          c += cirq.ry(params[i]).on(self.qubits[i])
        return c

    def h2(self, params):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
          c += cirq.rx(params[i]).on(self.qubits[i])
        return c

    def call(self, inputs):
        h1h2obs = self.h1_to_h2_to_obs([inputs, tf.concat([self.h1_weights, self.h2_weights], axis=1)])
        h1obs = self.h1_to_obs([inputs, self.h1_weights])
        return h1h2obs, h1obs


l = Combined()
inputs = tfq.convert_to_tensor([cirq.Circuit()])

opt = tf.keras.optimizers.SGD(learning_rate=1.0)

with tf.GradientTape(persistent=True) as tape:
  value, energy = l(inputs)

print("Init", l.trainable_variables)
grads = tape.gradient(value, [l.trainable_variables[1]])
print(grads)
opt.apply_gradients(zip(grads, [l.trainable_variables[1]]))
print("Params[:4] (H1) Same and Params[4:] (H2) should change", l.trainable_variables)

Result

Init [<tf.Variable 'Variable:0' shape=(1, 4) dtype=float32, numpy=array([[4.926237 , 0.6610151, 4.226024 , 3.861875 ]], dtype=float32)>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32, numpy=array([[2.571854  , 0.9393705 , 0.01724625, 1.6942394 ]], dtype=float32)>]
[<tf.Tensor: shape=(1, 4), dtype=float32, numpy=
array([[-1.1447796e-01, -4.6193600e-07, -7.8300945e-07, -9.3504786e-07]],
      dtype=float32)>]
Params[:4] (H1) Same and Params[4:] (H2) should change [<tf.Variable 'Variable:0' shape=(1, 4) dtype=float32, numpy=array([[4.926237 , 0.6610151, 4.226024 , 3.861875 ]], dtype=float32)>, <tf.Variable 'Variable:0' shape=(1, 4) dtype=float32, numpy=array([[2.686332  , 0.939371  , 0.01724703, 1.6942403 ]], dtype=float32)>]
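
If the parameters stay in a single variable instead, another option is to zero out the H1 columns of the gradient before applying it. A minimal NumPy sketch of that idea follows; the numbers and the names `h2_mask` and `new_theta` are made up, and it assumes a plain SGD step (optimizers with momentum or other internal state may behave differently). In TF the same mask could be multiplied into the gradient tensor before calling opt.apply_gradients.

```python
import numpy as np

# Hypothetical (1, 8) parameter variable and gradient:
# columns 0-3 belong to H1, columns 4-7 to H2.
theta = np.linspace(0.1, 0.8, 8).reshape(1, 8)
grad = np.full((1, 8), 0.5)

# Mask that keeps only the H2 columns of the gradient.
h2_mask = np.concatenate([np.zeros((1, 4)), np.ones((1, 4))], axis=1)

lr = 0.1
new_theta = theta - lr * grad * h2_mask  # plain SGD step on H2 only

print(new_theta)
```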

Shuhul24 · Answer 10 · Sat Jan 28 2023 18:44:36 GMT+0800 (China Standard Time)

Thanks for the reply!
So I want to clarify what you just wrote above. When you call tape.gradient with respect to l.trainable_variables[1], you are basically evaluating the gradient of value with respect to just the weights of h2, right? Similarly for opt.apply_gradients?

Shuhul24 · Answer 11 · Mon Jan 30 2023 13:44:00 GMT+0800 (China Standard Time)

I wanted to know whether there is any difference between the circuit h1 in self.h1h2obs and in self.h1obs. I mean, since these circuits are running in two different tfq.layers.ControlledPQC layers, is there any possibility that they operate differently? Say, the output density matrix of self.h1obs (which we cannot evaluate) is different from the output density matrix of h1 in self.h1h2obs (which we also cannot observe)?

Owen Lockwood · Answer 12 · Mon Jan 30 2023 13:52:38 GMT+0800 (China Standard Time)

The same weights and circuits are fed into the controlled PQC so they should be the same in the call function. Since the h1 circuit is pre defined and the same weights are used, I don't think there is a way for them to be different statevectors. They could operate differently, but I am feeding them the same circuit and parameters (for h1), so they shouldn't be different (of course the H2 at the end changes one of them). But if the flow is input -> h1 -> state() -> op and input -> h1 -> state() -> h2 -> op, and h1 is the same structure and params in both, then state is the same in both (assuming simulator, h1 might encounter different noise on a real hardware system on a second run).

Shuhul24 · Answer 13 · Mon Jan 30 2023 13:56:04 GMT+0800 (China Standard Time)

Ok, got it! Can you please suggest a way to evaluate the probability of each state from a tfq.layers.ControlledPQC? I mean, as in the above example, I am using a 2-qubit circuit for h1 and h1h2obs, so I need the probability of each state (in this case |00>, |01>, |10> and |11>) after h1 and h1h2obs. Is it possible with ControlledPQC, or is there any other way you may suggest?

Owen Lockwood · Answer 14 · Mon Jan 30 2023 13:58:27 GMT+0800 (China Standard Time)

Sure, you can just use a state layer (https://www.tensorflow.org/quantum/api_docs/python/tfq/layers/State). Just put in the circuit and parameters and it will give you the state. This isn't from a controlled PQC (since it is tied to observables) but it will generate your state.

Shuhul24 · Answer 15 · Mon Jan 30 2023 14:02:53 GMT+0800 (China Standard Time)

Won't the circuit parameters be fed to tfq.layers.State in a different manner from how they are distributed in tfq.layers.ControlledPQC? Also, if I just use tfq.layers.State instead of tfq.layers.ControlledPQC, is the resulting output density matrix differentiable, as in the code snippet you shared above where the output is an observable (since you used tfq.layers.ControlledPQC)?

Owen Lockwood · Answer 16 · Mon Jan 30 2023 14:24:28 GMT+0800 (China Standard Time)

They should be distributed the same, since in both controlled PQC and state each param value is connected with a specific param name somewhere in the circuit (which is managed by the layer).
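
One thing that may be worth checking here, as an assumption to verify against your installed TFQ version: as far as I know, these layers match values to symbols in sorted symbol-name order, and string sorting is lexicographic. With ten or more symbols named params0, params1, ..., the sorted order is no longer the numeric order, which could make the parameter columns look intertwined. The pure-Python sketch below only illustrates the sorting effect; the names `by_name` and `column_of` are made up.

```python
# Hypothetical symbol names, generated like the ones in this thread
# but with more than ten parameters.
names = ["params%d" % i for i in range(13)]

# Lexicographic (string) sort: "params10" comes before "params2".
by_name = sorted(names)
print(by_name)

# Column that each originally numbered parameter would occupy
# if values were matched to symbols in sorted-name order.
column_of = {name: by_name.index(name) for name in names}
print(column_of["params2"])
```

Zero-padding the names (for example params00, params01, ...) keeps string order and numeric order the same.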

No, state is not differentiable.

Shuhul24 · Answer 17 · Mon Jan 30 2023 14:32:23 GMT+0800 (China Standard Time)

Thanks!
But I am having trouble evaluating the probability of each possible state using tfq.layers.State, as it evaluates the output density matrix. How do I evaluate the probability of each possible state (I can't find a way even using cirq)? Can you help me figure this out?

Owen Lockwood · Answer 18 · Mon Jan 30 2023 14:40:03 GMT+0800 (China Standard Time)

You want to get the probabilities from the output of tfq.layers.State? If you are doing noiseless simulation, you can just multiply the statevector element-wise by its conjugate, and that gives the probability of each state. For noisy simulation with density matrices, you just apply the measurement matrices in a slightly different way (see: https://www.cs.cmu.edu/~odonnell/quantum15/lecture16.pdf).
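
A minimal NumPy sketch of the noiseless rule, with a plain array standing in for the state tensor (the example state and the names `state`, `rho`, and `probs_from_rho` are made up for illustration). For a density matrix, the computational-basis probabilities are the real parts of its diagonal.

```python
import numpy as np

# Hypothetical 2-qubit statevector: Ry(pi/2) on qubit 0, qubit 1 left in |0>.
# Amplitudes are ordered |00>, |01>, |10>, |11>.
state = np.array([1, 0, 1, 0], dtype=np.complex64) / np.sqrt(2)

# Noiseless case: multiply each amplitude by its own conjugate.
probs = np.real(state * np.conj(state))

# Density-matrix case: the probabilities sit on the diagonal.
rho = np.outer(state, np.conj(state))
probs_from_rho = np.real(np.diag(rho))

print(probs)
print(probs_from_rho)
```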

Owen Lockwood · Answer 19 · Mon Jan 30 2023 14:41:00 GMT+0800 (China Standard Time)

In cirq, there is a state = cirq.Simulator().simulate(circuit).state_vector() to get the state vector (or maybe that has changed recently, there was a bunch of changes leading up to cirq 1.0)

Shuhul24 · Answer 20 · Mon Jan 30 2023 14:54:25 GMT+0800 (China Standard Time)

Does tfq.layers.State start from the |0> state, or is there a way to define a different initial state?

Owen Lockwood · Answer 21 · Mon Jan 30 2023 14:56:17 GMT+0800 (China Standard Time)

State accepts any circuit and parameters and generates the state. The initial state could be anything, but you just have to prepend that to the circuit you feed in to it.

Shuhul24 · Answer 22 · Mon Jan 30 2023 15:01:10 GMT+0800 (China Standard Time)

How can I prepend a zero-initialized state |0>?

Owen Lockwood · Answer 23 · Mon Jan 30 2023 15:02:03 GMT+0800 (China Standard Time)

Every circuit starts off in |0>.

Shuhul24 · Answer 24 · Mon Jan 30 2023 15:04:04 GMT+0800 (China Standard Time)

Oh! Thanks!
I am having trouble evaluating the conjugate of the output state of tfq.layers.State and the probabilities of each state (from the product of the state and its conjugate). Can you share a short snippet showing how it can be done?

Owen Lockwood · Answer 25 · Mon Jan 30 2023 15:06:55 GMT+0800 (China Standard Time)

There are examples in the docs, see (https://www.tensorflow.org/quantum/tutorials/research_tools):

 final_probs = tf.squeeze(tf.abs(tfq.layers.State()(REFERENCE_CIRCUIT).to_tensor()) ** 2)

Shuhul24 · Answer 26 · Mon Jan 30 2023 15:08:03 GMT+0800 (China Standard Time)

Thanks a lot! It was really helpful.

Shuhul24 · Answer 27 · Mon Jan 30 2023 17:16:45 GMT+0800 (China Standard Time)

I am ending up with None value for the following code below:

import tensorflow as tf
import tensorflow_quantum as tfq
import numpy as np
import sympy
import cirq


class QuantumLayer(tf.keras.layers.Layer):

  def __init__(self) -> None:
    super(QuantumLayer, self).__init__()
    self.qubits = [cirq.GridQubit(1, 0), cirq.GridQubit(1, 1)]
    self.num_params = 2
    self.params = sympy.symbols("params0:%d"%self.num_params)
    self.theta = tf.Variable(initial_value=np.random.uniform(0, 2*np.pi, (1, self.num_params)), dtype="float32", trainable=True)
    self.operation = tfq.layers.State()
  
  def quantum_circ(self, param):
    c = cirq.Circuit()
    for i in range(len(self.qubits)):
      c += cirq.ry(param[i]).on(self.qubits[i])
    return c

  def __call__(self, inputs):
    res = self.operation(self.quantum_circ(self.params), symbol_names=self.params, 
                         symbol_values=self.theta)
    out = tf.squeeze(tf.abs(res.to_tensor() ** 2))

    return out

layer = QuantumLayer()
inputs = tfq.convert_to_tensor([cirq.Circuit()])
with tf.GradientTape() as tape:
  result = layer(inputs)
grad = tape.gradient(result[1], layer.trainable_variables)
print(grad)


>>> [None]

Why is it happening so?

Owen Lockwood · Answer 28 · Tue Jan 31 2023 02:27:07 GMT+0800 (China Standard Time)

State is not differentiable, so no gradients are possible. The None you got is actually because you shouldn't index the result outside of the call (if you move the indexing into the call, it can build the AD graph through it). With that change you would see:

LookupError                               Traceback (most recent call last)
<ipython-input-8-a097cbd47114> in <module>
     33 with tf.GradientTape() as tape:
     34   result = layer(inputs)
---> 35 grad = tape.gradient(result, layer.trainable_variables)
     36 print(grad)

3 frames
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/registry.py in lookup(self, name)
     97       return self._registry[name][_TYPE_TAG]
     98     else:
---> 99       raise LookupError(
    100           "%s registry has no entry for: %s" % (self._name, name))

LookupError: gradient registry has no entry for: TfqSimulateState

Shuhul24 · Answer 29 · Tue Jan 31 2023 02:41:06 GMT+0800 (China Standard Time)

Ok. So is there any alternative to this where I can evaluate the probabilities of the possible states and also evaluate the gradient with respect to them as well? I have extensively used tfq.layers.ControlledPQC, tfq.layers.PQC and tfq.layers.State.

Owen Lockwood · Answer 30 · Tue Jan 31 2023 02:49:55 GMT+0800 (China Standard Time)

There's nothing built in, if that is what you mean. You will have to get a little creative lol. Just an idea I had: you could take the Z expectation value of each qubit, get the probability of each qubit being in each state, then reconstruct the probability vector manually. E.g. given a two-qubit system, suppose the results of Z, Z are [0.5, -0.2]; these map to probs [(0.75 |0>, 0.25 |1>), (0.4 |0>, 0.6 |1>)], which map to the vector [0.3 |00>, 0.45 |01>, 0.1 |10>, 0.15 |11>]. I'm not 100% sure this is a universal solution (or that the TF ops to make this are differentiable), but it's an idea.
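
The arithmetic behind that mapping can be sketched in NumPy. For one qubit, <Z> = p0 - p1 and p0 + p1 = 1, so p0 = (1 + <Z>) / 2. The sketch assumes the qubits are unentangled (a product state), so the joint distribution is just the product of the single-qubit ones; for entangled states this reconstruction would not be valid. The function name `z_expectations_to_probs` is made up, and inside a TFQ layer the same steps would need to be written with TensorFlow ops to stay on the gradient tape.

```python
import numpy as np

def z_expectations_to_probs(z_vals):
    """Map per-qubit <Z> values to a joint probability vector.

    Assumes a product state, so the joint distribution is the
    product of the single-qubit distributions.
    """
    joint = np.array([1.0])
    for z in z_vals:
        p0 = (1.0 + z) / 2.0  # from <Z> = p0 - p1 and p0 + p1 = 1
        p1 = 1.0 - p0
        # The Kronecker product appends this qubit as the next bit.
        joint = np.kron(joint, np.array([p0, p1]))
    return joint

probs = z_expectations_to_probs([0.5, -0.2])
print(probs)  # order: |00>, |01>, |10>, |11>
```

With the Z values [0.5, -0.2] from the example above, this reproduces the vector [0.3, 0.45, 0.1, 0.15].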

Shuhul24 · Answer 31 · Tue Jan 31 2023 13:31:40 GMT+0800 (China Standard Time)

I don't quite get how you are mapping the Z, Z results [0.5, -0.2] to probabilities (0.75|0>, 0.25|1>), (0.4|0>, 0.6|1>). I am sorry asking such a dumb question, but can you share some insights into it?

Owen Lockwood · Answer 32 · Tue Jan 31 2023 13:45:56 GMT+0800 (China Standard Time)

Shuhul24 · Answer 33 · Tue Jan 31 2023 14:00:11 GMT+0800 (China Standard Time)

How can I programmatically convert the expected values into state probabilities?

Owen Lockwood · Answer 34 · Tue Jan 31 2023 14:04:53 GMT+0800 (China Standard Time)

My previous comment outlined how to go from expected value to probabilities. All that needs to be done is just codeify it in TF.

Shuhul24 · Answer 35 · Tue Jan 31 2023 14:21:20 GMT+0800 (China Standard Time)

I tried implementing this, but I still get None from tape.gradient.
I have shared the code below:

class Generator_Discriminator(tf.keras.layers.Layer):
  def __init__(self) -> None:
    super().__init__()
    self.qubits = [cirq.GridQubit(1, 0), cirq.GridQubit(1, 1)]
    self.gen_params = 13
    self.disc_params = 10
    self.params = sympy.symbols("params0:%d"%(self.gen_params+self.disc_params))
    self.disc_weights = tf.Variable(initial_value=np.random.uniform(0, 2*np.pi, (1, self.disc_params)), dtype="float32", trainable=True)
    self.gen_weights = tf.Variable(initial_value=np.random.uniform(0, 2*np.pi, (1, self.gen_params)), dtype="float32", trainable=True)
    self.real_weights = tf.Variable(initial_value=np.random.uniform(0, 2*np.pi, (1, self.gen_params)), dtype="float32", trainable=False)
    self.gentor = tfq.layers.ControlledPQC(self.generator(self.params[:self.gen_params]), [cirq.Z(cirq.GridQubit(1, 0)), cirq.Z(cirq.GridQubit(1, 1))])
    self.gentor_disctor = tfq.layers.ControlledPQC(self.generator(self.params[:self.gen_params])+self.discriminator(self.params[self.gen_params:]),
                                                   [cirq.Z(cirq.GridQubit(1, 0)), cirq.Z(cirq.GridQubit(1, 1))])


  def generator(self, params):
    return ansatz_gen(self.qubits, params)


  def discriminator(self, params):
    return ansatz_disc(self.qubits, params)


  def call(self, inputs):
    # generator_obs = self.gentor(self.generator(self.params[:self.gen_params]), symbol_names=self.params[:self.gen_params], symbol_values=self.gen_weights)
    generator_obs = self.gentor([inputs, self.gen_weights])
    # discriminator_obs = self.gentor(self.generator(self.params[:self.gen_params])+self.discriminator(self.params[self.gen_params:]), 
    #                                 symbol_names=self.params, symbol_values=tf.concat([self.gen_weights, self.disc_weights], axis=1))
    discriminator_obs = self.gentor_disctor([inputs, tf.concat([self.gen_weights, self.disc_weights], axis=1)])
    # discriminator_real = self.gentor(self.generator(self.params[:self.gen_params])+self.discriminator(self.params[self.gen_params:]),
    #                                  symbol_names=self.params, symbol_values = tf.concat([self.gen_weights, self.disc_weights], axis=1))
    discriminator_real = self.gentor_disctor([inputs, tf.concat([self.real_weights, self.disc_weights], axis=1)])

    prob_0_0 = tf.divide(tf.add(tf.squeeze(discriminator_obs)[0], 1), 2)
    prob_1_0 = tf.divide(tf.add(tf.squeeze(discriminator_obs)[1], 1), 2)
    prob_1_1 = 1 - prob_1_0

    res = tf.add(tf.multiply(prob_0_0, prob_1_0), tf.multiply(prob_0_0, prob_1_1))

    
    return res, generator_obs, discriminator_obs, discriminator_real

with tf.GradientTape() as tape:
  out, _, _, _ = generator_discriminator(inputs)
grad = tape.gradient(out, generator_discriminator.trainable_variables[0])
print(grad)

Can you share some insights? Is there some way I can tweak this code to make it differentiable?

Edit: I observe that when I apply tape.gradient with respect to generator_discriminator.trainable_variables[0] I get None, but when I apply tape.gradient with respect to generator_discriminator.trainable_variables[1] I get some values. Why is that?

Owen Lockwood · Answer 36 · Tue Jan 31 2023 14:36:54 GMT+0800 (China Standard Time)

This worked for me:

import tensorflow as tf
import tensorflow_quantum as tfq 
import numpy as np
import cirq
import sympy

class Combined(tf.keras.layers.Layer):
    def __init__(self) -> None:
        super(Combined, self).__init__()
        self.num_params = 2
        self.qubits = [cirq.GridQubit(0, i) for i in range(2)]
        self.h1_weights = tf.Variable(initial_value=np.random.uniform(0, 2 * np.pi, (1, self.num_params)), dtype="float32", trainable=True)
        #self.h1_weights = tf.Variable(initial_value=np.zeros(shape=(1, self.num_params)), dtype="float32", trainable=True)
        self.params = sympy.symbols("params0:%d"%self.num_params)
        self.h1_circ = self.h1(self.params)
        self.h1_to_obs = tfq.layers.ControlledPQC(self.h1_circ, [cirq.Z(self.qubits[0]), cirq.Z(self.qubits[1])])

    def h1(self, params):
        c = cirq.Circuit()
        for i in range(len(self.qubits)):
          c += cirq.ry(params[i]).on(self.qubits[i])
        return c

    def call(self, inputs):
        h1obs = self.h1_to_obs([inputs, self.h1_weights])
        prob_0_0 = (h1obs[:, :1] + 1) / 2
        # Column 1 holds the Z expectation of the second qubit.
        prob_1_0 = (h1obs[:, 1:2] + 1) / 2
        prob_1_1 = 1 - prob_1_0

        res = tf.multiply(prob_0_0, prob_1_0) + tf.multiply(prob_0_0, prob_1_1)
        return h1obs, res


l = Combined()
inputs = tfq.convert_to_tensor([cirq.Circuit()])

opt = tf.keras.optimizers.SGD(learning_rate=1.0)

with tf.GradientTape() as tape:
  _, res= l(inputs)

grad = tape.gradient(res, l.trainable_variables)
print(grad)

As a minimal example.

You encountered the None because the trainable variables are indexed by the order in which they are created and disc params is the first variable created which has no gradients.