tensorflow / model-analysis

Model analysis tools for TensorFlow

Using keras metrics in TF 1.X

AnbangZhao opened this issue

System information

  • Have I written custom code (as opposed to using a stock example script
    provided in TensorFlow Model Analysis): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Mac OS X
  • TensorFlow Model Analysis installed from (source or binary): binary
  • TensorFlow Model Analysis version (use command below): 0.21.6
  • Python version: 3.7.4
  • Jupyter Notebook version: N/A
  • Exact command to reproduce: Running keras metrics evaluation with TF 1.X in dataflow distributed mode

Describe the problem

TFMA assumes a V2 execution environment. To use it with TF 1.X, the user needs to call tf.compat.v1.enable_v2_behavior(), as instructed by this issue. This works fine in local execution, but in Dataflow distributed mode it is hard to do this on every worker. TF is used in two places in TFMA: predict_extractor_v2 and tf_metric_wrapper. Both fail in Dataflow. The first can be solved by calling tf.compat.v1.enable_v2_behavior() inside model_construct_fn, but for the second there is no setup_fn exposed, so it is hard to work around. This can be reproduced by running TFMA with TF 1.X on Cloud Dataflow with any Keras metric.
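For reference, the model_construct_fn workaround mentioned above might look roughly like the sketch below. The helper name with_v2_behavior and its enable_fn parameter are hypothetical and not part of TFMA; the only real TensorFlow call used is tf.compat.v1.enable_v2_behavior(). How the wrapped function is attached to the shared model depends on the TFMA version and is not shown here.

```python
import functools


def _enable_tf_v2_behavior():
    # Imported lazily so that the import and the call both happen on the
    # worker process, at the moment the wrapped function is invoked.
    import tensorflow as tf
    tf.compat.v1.enable_v2_behavior()


def with_v2_behavior(construct_fn, enable_fn=_enable_tf_v2_behavior):
    """Hypothetical helper: run enable_fn before every call to construct_fn.

    construct_fn stands in for whatever model construction callable is
    handed to TFMA; its arguments are passed through unchanged.
    """
    @functools.wraps(construct_fn)
    def wrapped(*args, **kwargs):
        enable_fn()  # switch to V2 behavior before any model code runs
        return construct_fn(*args, **kwargs)
    return wrapped
```

This only covers code paths that go through the wrapped construction function. It does nothing for tf_metric_wrapper, which is the case this issue is about.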

Stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 650, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "dataflow_worker/shuffle_operations.py", line 50, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
  File "dataflow_worker/shuffle_operations.py", line 51, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
  File "dataflow_worker/shuffle_operations.py", line 66, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
  File "dataflow_worker/shuffle_operations.py", line 67, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
  File "dataflow_worker/shuffle_operations.py", line 71, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
  File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 143, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "dataflow_worker/shuffle_operations.py", line 234, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process
  File "dataflow_worker/shuffle_operations.py", line 241, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process
  File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 143, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 753, in apache_beam.runners.worker.operations.CombineOperation.process
  File "apache_beam/runners/worker/operations.py", line 758, in apache_beam.runners.worker.operations.CombineOperation.process
  File "/usr/local/lib/python3.7/site-packages/apache_beam/transforms/combiners.py", line 866, in extract_only
    return self.combine_fn.extract_output(accumulator)
  File "/Users/anbang.zhao/repo/masterchef-training/venv/lib/python3.7/site-packages/tensorflow_model_analysis/evaluators/metrics_and_plots_evaluator_v2.py", line 350, in extract_output
    output = c.extract_output(a)
  File "/usr/local/lib/python3.7/site-packages/tensorflow_model_analysis/metrics/tf_metric_wrapper.py", line 591, in extract_output
    result[key] = metric.result().numpy()
AttributeError: 'Tensor' object has no attribute 'numpy'

I've tried using the --save_main_session option in Dataflow and calling tf.compat.v1.enable_v2_behavior() in the driver's main function, but it does not work.

The general rule for using tf.compat.v1.enable_v2_behavior() is that it should be run from the main of the binary. We could look into adding hooks to call it from other parts of the library, but I think this would likely lead to other issues (or fragile code). Let's first investigate whether there is a Dataflow option.

@AnbangZhao

Have you tried the approaches using --save_main_session and tf.compat.v1.enable_v2_behavior() that are mentioned in similar issues? Let us know if they help. Thanks

@AnbangZhao

Closing this issue due to inactivity. Please feel free to reopen if this still exists. Thanks