tensorflow / model-analysis

Model analysis tools for TensorFlow

TFMA unable to find metrics for Keras model when loading eval result

thisisandreeeee opened this issue · comments

commented

System information

  • Have I written custom code (as opposed to using a stock example script
    provided in TensorFlow Model Analysis): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Catalina
  • TensorFlow Model Analysis installed from (source or binary): pypi
  • TensorFlow Model Analysis version (use command below): 0.22.1
  • Python version: 3.7.5
  • Jupyter Notebook version: 1.0.0

Describe the problem

I have trained a Keras model (not estimator) with the following serving signature:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['examples'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: serving_default_examples:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['mu'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:0
    outputs['sigma'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall_1:1
  Method name is: tensorflow/serving/predict

The weights are updated using a custom training loop with gradient tape, instead of the model.fit method, before the model is exported as a saved_model. As I am unable to get TFMA to work without first compiling the model, I compile the model while specifying a set of custom Keras metrics:

model.compile(metrics=custom_keras_metrics) # each custom metric inherits from keras.Metric
custom_training_loop(model)
model.save("path/to/saved_model", save_format="tf")
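
For context, the training loop is roughly of the following shape. This is only a sketch of what is described above, not the actual code; the dataset, optimizer, loss, and function signature are placeholders:

import tensorflow as tf

# Sketch only: train_dataset, the optimizer, and the loss are illustrative
# placeholders, and the signature differs from the call shown above.
def custom_training_loop(model, train_dataset, custom_keras_metrics):
    optimizer = tf.keras.optimizers.Adam()
    loss_fn = tf.keras.losses.MeanSquaredError()
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as tape:
            predictions = model(x_batch, training=True)
            loss = loss_fn(y_batch, predictions)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        # Metric state is updated by hand instead of via model.fit.
        for metric in custom_keras_metrics:
            metric.update_state(y_batch, predictions)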

I would like to evaluate this model using TFMA, so I first initialise an eval shared model as follows:

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="my_label_key")],
    slicing_specs=[tfma.SlicingSpec()] # empty slice refers to the entire dataset
)
eval_shared_model = tfma.default_eval_shared_model("path/to/saved_model", eval_config=eval_config)

However, when I try to run model analysis:

eval_results = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location="path/to/test/tfrecords*",
    file_format="tfrecords"
)

I am faced with the following error:

ValueError          Traceback (most recent call last)
<ipython-input-156-f9a9684a6797> in <module>
      2     eval_shared_model=eval_shared_model,
      3     data_location="tfma/test_raw-*",
----> 4     file_format="tfrecords"
      5 )

~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py in run_model_analysis(eval_shared_model, eval_config, data_location, file_format, output_path, extractors, evaluators, writers, pipeline_options, slice_spec, write_config, compute_confidence_intervals, min_slice_size, random_seed_for_testing, schema)
   1204 
   1205   if len(eval_config.model_specs) <= 1:
-> 1206     return load_eval_result(output_path)
   1207   else:
   1208     results = []

~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py in load_eval_result(output_path, model_name)
    383       metrics_and_plots_serialization.load_and_deserialize_metrics(
    384           path=os.path.join(output_path, constants.METRICS_KEY),
--> 385           model_name=model_name))
    386   plots_proto_list = (
    387       metrics_and_plots_serialization.load_and_deserialize_plots(

~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_and_plots_serialization.py in load_and_deserialize_metrics(path, model_name)
    180       raise ValueError('Fail to find metrics for model name: %s . '
    181                        'Available model names are [%s]' %
--> 182                        (model_name, ', '.join(keys)))
    183 
    184     result.append((

ValueError: Fail to find metrics for model name: None . Available model names are []

Why is TFMA raising this exception, and where should I begin debugging this error? I tried specifying the model names manually (which should not be required since I'm only using one model), but that did not seem to help either. I tried tracing the source code and it seems this happens when TFMA tries to load the eval result generated by the PTransform.

Can you try adding a non-custom metric? I suspect that no metrics are being computed (the poor error message has since been fixed, but I'm not sure that fix has been released yet). Also, can you try loading the model with the custom metrics and check whether the metrics were saved:

model = tf.keras.models.load_model(model_path)
model.metrics
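
For the first suggestion, a stock metric can be compiled in next to the custom ones, e.g. (a sketch, reusing the custom_keras_metrics list from above):

model.compile(metrics=custom_keras_metrics + [tf.keras.metrics.MeanAbsoluteError()])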

commented

When I do not pass compile=False, the load_model call raises the following error:

ValueError: Unknown metric function: CustomMetric
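
For reference, that deserialization error can usually be avoided by registering the custom class at load time; a minimal sketch, where my_metrics is a hypothetical module containing CustomMetric:

import tensorflow as tf
from my_metrics import CustomMetric  # hypothetical module

# Registering the class lets Keras deserialize the compiled custom metrics.
model = tf.keras.models.load_model(
    "path/to/saved_model", custom_objects={"CustomMetric": CustomMetric})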

It seems this also causes issues when creating the default eval shared model, which then infers the model type to be TF_GENERIC instead of TF_KERAS. I think this might be related to how I am creating the Keras model.

I require a custom training loop using gradient tape, with low-level handling of custom metrics. As such, the model does not need to be compiled, since the .fit() method is never called. I am able to train the model and compute its metrics successfully.

However, when I try passing an uncompiled model, TFMA seems to have difficulty loading it (more details here) with the following exception:

AttributeError: 'NoneType' object has no attribute 'metrics'

Therefore, I tried to compile it while passing the custom metrics:

model.compile(metrics=custom_keras_metrics)

Should I be compiling or saving the model differently in order to ensure compatibility with TFMA?

commented

On a side note, I tried following this example for creating custom metrics. I made no code changes, but even after compiling the model, the metrics attribute seems to be empty.
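
The metric itself follows the usual stateful subclass pattern from that guide, roughly along these lines (a sketch with illustrative names, not the code from the linked example):

import tensorflow as tf

class CustomMetric(tf.keras.metrics.Metric):
    """Illustrative stateful metric: mean absolute error accumulated by hand."""

    def __init__(self, name="custom_metric", **kwargs):
        super().__init__(name=name, **kwargs)
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.total.assign_add(tf.reduce_sum(tf.abs(y_true - y_pred)))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        return self.total / self.count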

>>> model.metrics
[]

Is that expected?

What version of TF are you using? There was a bug in some versions of TF where the metrics were not restored on load.

commented

I am using tensorflow 2.3.0. I'm not sure this is related, because the metrics are not available when I call model.metrics immediately after compilation (without saving and loading).

commented

@mdreves Is there an example that runs model analysis for Keras models trained using gradient tape with custom metrics that I could reference?

Not that I'm aware of.

I think the problem is that TF creates these lazily and doesn't recognize the metrics as part of the model until after model.fit is called, which means that in your case they will not be saved and are unknown to TFMA. One option is to manually add your metrics via the TFMA config (see [1]). You will need to make sure the library containing the custom code is available on the workers, though.

[1] https://github.com/tensorflow/model-analysis/blob/master/g3doc/metrics.md#customization

commented

Ah, I see. What worked for me was:

First doing a no-op compile:

model.compile(optimizer=my_custom_optimizer) # did not specify loss or metrics here

Then passing the custom metric through the MetricsSpec:

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key="my_label")],
    metrics_specs=[
        tfma.MetricsSpec(
            metrics=[tfma.MetricConfig(
                class_name="MyCustomMetric",
                module="module.containing.metric"
            )]
        )
    ],
    slicing_specs=[
        tfma.SlicingSpec(), # empty slice refers to the entire dataset
    ]
)
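
With that config in place, the earlier calls become roughly the following (same placeholder paths as above, with the config now passed to run_model_analysis as well):

eval_shared_model = tfma.default_eval_shared_model(
    "path/to/saved_model", eval_config=eval_config)

eval_results = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    eval_config=eval_config,
    data_location="path/to/test/tfrecords*",
    file_format="tfrecords",
)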

I'm running into another error now, but it doesn't seem to be related to this issue, so I'll go ahead and resolve this one.

Thank you, I appreciate the help!