`merge_call` called while defining a new graph or a tf.function.

Question

Jark5455 opened this issue 9 months ago

Hello, I am trying to create a basic TD3 agent, but I am getting this very long error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
    yield
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 387, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 330, in train
    loss_info = self._train_fn(
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 316, in _train
    tf.cond(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 311, in optimize_actor
    self._apply_gradients(actor_grads, trainable_actor_variables,
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 341, in _apply_gradients
    return optimizer.apply_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1229, in apply_gradients
    grads_and_vars = self.aggregate_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1191, in aggregate_gradients
    return optimizer_utils.all_reduce_sum_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/utils.py", line 42, in all_reduce_sum_gradients
    reduced = tf.distribute.get_replica_context().merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.
INFO:tensorflow:Error reported to Coordinator: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/training/coordinator.py", line 293, in stop_on_exception
    yield
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 387, in run
    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/tf_agent.py", line 330, in train
    loss_info = self._train_fn(
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/utils/common.py", line 188, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 316, in _train
    tf.cond(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 311, in optimize_actor
    self._apply_gradients(actor_grads, trainable_actor_variables,
  File "/usr/local/lib/python3.8/dist-packages/tf_agents/agents/td3/td3_agent.py", line 341, in _apply_gradients
    return optimizer.apply_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1229, in apply_gradients
    grads_and_vars = self.aggregate_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/optimizer.py", line 1191, in aggregate_gradients
    return optimizer_utils.all_reduce_sum_gradients(grads_and_vars)
  File "/usr/local/lib/python3.8/dist-packages/keras/src/optimizers/utils.py", line 42, in all_reduce_sum_gradients
    reduced = tf.distribute.get_replica_context().merge_call(
RuntimeError: `merge_call` called while defining a new graph or a tf.function. This can often happen if the function `fn` passed to `strategy.run()` contains a nested `@tf.function`, and the nested `@tf.function` contains a synchronization point, such as aggregating gradients (e.g, optimizer.apply_gradients), or if the function `fn` uses a control flow statement which contains a synchronization point in the body. Such behaviors are not yet supported. Instead, please avoid nested `tf.function`s or control flow statements that may potentially cross a synchronization boundary, for example, wrap the `fn` passed to `strategy.run` or the entire `strategy.run` inside a `tf.function` or move the control flow out of `fn`. If you are subclassing a `tf.keras.Model`, please avoid decorating overridden methods `test_step` and `train_step` in `tf.function`.

This error only occurs when I call `tf_agents.train.utils.strategy_utils.get_strategy()` with `use_gpu=True`.

The error is raised when I instantiate `tf_agents.train.learner.Learner`. I am using TensorFlow 2.13.0 and TF-Agents 0.17.0.

The error also occurs with TensorFlow 2.12.0 and TF-Agents 0.16.0.