Q-Network wrong output spec
rissois opened this issue
I am receiving the following error: Expected q_network to emit a floating point tensor with inner dims (464,); but saw network output spec: TensorSpec(shape=(6, 4, 464), dtype=tf.float32, name=None)
I am building a custom environment for DqnAgent with an observation shape of (6, 4, 4). The action is a scalar (I would have liked a (2,) action, but apparently that's not possible at the moment). I am following this tutorial as closely as I can for my use case.
The environment class is initialized with:
```python
self._action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=463, name='action'
)
# Six 4x4 boards
self._observation_spec = array_spec.BoundedArraySpec(
    (6, 4, 4), np.int32,
    minimum=self.createMinMaxBoards([0, 0, 0, 0, 0, -1]),
    maximum=self.createMinMaxBoards([1, 1, 1, 1, 3, 2]),
)
```
I was able to successfully validate the environment and run it with a fixed policy, as per the tutorial, so the environment itself is in good shape. I then jumped over to this tutorial to add the agent and copied and pasted these two blocks of code directly:
```python
fc_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(env.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1

# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
    return tf.keras.layers.Dense(
        num_units,
        activation=tf.keras.activations.relu,
        kernel_initializer=tf.keras.initializers.VarianceScaling(
            scale=2.0, mode='fan_in', distribution='truncated_normal'))

# QNetwork consists of a sequence of Dense layers followed by a dense layer
# with `num_actions` units to generate one q_value per available action as
# its output.
dense_layers = [dense_layer(num_units) for num_units in fc_layer_params]
q_values_layer = tf.keras.layers.Dense(
    num_actions,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
q_net = sequential.Sequential(dense_layers + [q_values_layer])

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
train_step_counter = tf.Variable(0)

agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=q_net,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)

agent.initialize()
```
The error is thrown at agent = dqn_agent.DqnAgent(...). There is a line in dqn_agent.py, q_network.create_variables(net_observation_spec), which seems to create the (6, 4, 464) shape. I would have imagined the network's output shape would automatically be taken from the num_actions units of q_values_layer. More than likely this is a failure on my end, but I have seen unresolved posts about it on StackOverflow. Can anyone please help correct my understanding / code here?
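A quick way to see where the (6, 4, 464) comes from: a Keras Dense layer contracts only the innermost axis and leaves any leading axes untouched, so feeding the unflattened (6, 4, 4) observation through the dense stack keeps the (6, 4) prefix all the way to the final 464-unit layer. The matrix multiply below reproduces the shape in the error:

```python
import numpy as np

# A Dense(464) layer on a (..., 4) input applies a (4, 464) kernel to the
# last axis only, so every leading axis of the input is preserved.
x = np.zeros((6, 4, 4))       # the (6, 4, 4) observation fed to the q-network
w = np.zeros((4, 464))        # kernel of the final Dense(num_actions) layer
q = np.matmul(x, w)           # contracts the last axis, keeps the (6, 4) prefix
print(q.shape)                # (6, 4, 464) -- the shape from the error message
```

If that is indeed the cause, I would guess that prepending tf.keras.layers.Flatten() to dense_layers (so the network sees a rank-1, 96-element input before the dense stack) would yield the expected (464,) output, though I have not verified this.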
I'm facing the same issue as well. Have you resolved it?