TFMA unable to find metrics for Keras model when loading eval result
thisisandreeeee opened this issue
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow Model Analysis): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Catalina
- TensorFlow Model Analysis installed from (source or binary): pypi
- TensorFlow Model Analysis version (use command below): 0.22.1
- Python version: 3.7.5
- Jupyter Notebook version: 1.0.0
Describe the problem
I have trained a Keras model (not estimator) with the following serving signature:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['examples'] tensor_info:
dtype: DT_STRING
shape: (-1)
name: serving_default_examples:0
The given SavedModel SignatureDef contains the following output(s):
outputs['mu'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall_1:0
outputs['sigma'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall_1:1
Method name is: tensorflow/serving/predict
The weights are updated using a custom training loop with gradient tape, instead of the model.fit method, before the model is exported as a saved_model. As I am unable to get TFMA to work without first compiling the model, I compile the model while specifying a set of custom Keras metrics:
model.compile(metrics=custom_keras_metrics) # each custom metric inherits from keras.Metric
custom_training_loop(model)
model.save("path/to/saved_model", save_format="tf")
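The custom training loop itself is roughly of this shape (a minimal, self-contained sketch; the model, loss, optimizer, and data here are hypothetical stand-ins, not the actual code):

```python
import tensorflow as tf

# Hypothetical stand-ins for the real model, loss, and data.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam()
x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))

def train_step(model, x, y):
    # One gradient-tape update, used in place of model.fit().
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

loss = train_step(model, x, y)
```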
I would like to evaluate this model using TFMA, so I first initialise an eval shared model as follows:
eval_config = tfma.EvalConfig(
model_specs=[tfma.ModelSpec(label_key="my_label_key")],
slicing_specs=[tfma.SlicingSpec()] # empty slice refers to the entire dataset
)
eval_shared_model = tfma.default_eval_shared_model("path/to/saved_model", eval_config=eval_config)
However, when I try to run model analysis:
eval_results = tfma.run_model_analysis(
eval_shared_model=eval_shared_model,
data_location="path/to/test/tfrecords*",
file_format="tfrecords"
)
I am faced with the following error:
ValueError Traceback (most recent call last)
<ipython-input-156-f9a9684a6797> in <module>
2 eval_shared_model=eval_shared_model,
3 data_location="tfma/test_raw-*",
----> 4 file_format="tfrecords"
5 )
~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py in run_model_analysis(eval_shared_model, eval_config, data_location, file_format, output_path, extractors, evaluators, writers, pipeline_options, slice_spec, write_config, compute_confidence_intervals, min_slice_size, random_seed_for_testing, schema)
1204
1205 if len(eval_config.model_specs) <= 1:
-> 1206 return load_eval_result(output_path)
1207 else:
1208 results = []
~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/api/model_eval_lib.py in load_eval_result(output_path, model_name)
383 metrics_and_plots_serialization.load_and_deserialize_metrics(
384 path=os.path.join(output_path, constants.METRICS_KEY),
--> 385 model_name=model_name))
386 plots_proto_list = (
387 metrics_and_plots_serialization.load_and_deserialize_plots(
~/.pyenv/versions/miniconda3-4.3.30/envs/tensorflow/lib/python3.7/site-packages/tensorflow_model_analysis/writers/metrics_and_plots_serialization.py in load_and_deserialize_metrics(path, model_name)
180 raise ValueError('Fail to find metrics for model name: %s . '
181 'Available model names are [%s]' %
--> 182 (model_name, ', '.join(keys)))
183
184 result.append((
ValueError: Fail to find metrics for model name: None . Available model names are []
Why is TFMA raising this exception, and where should I begin debugging this error? I tried specifying the model names manually (which should not be required since I'm only using one model), but that did not seem to help either. I tried tracing the source code and it seems this happens when TFMA tries to load the eval result generated by the PTransform.
Can you try adding a non-custom metric? I suspect that no metrics are being computed (the poor error message has since been fixed, but I'm not sure the fix has been released). Also, can you try loading the model with the custom metrics and check whether the metrics were saved:
model = tf.keras.models.load_model(model_path)
model.metrics
When I do not specify compile=False, the load_model call returns the following error:
ValueError: Unknown metric function: CustomMetric
It seems like this also results in some issues when creating the default eval shared model, which infers that the model type is TF_GENERIC instead of TF_KERAS. I think this might be related to how I am creating the Keras model.
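For reference, the "Unknown metric function" error on load can usually be avoided by registering the custom class via custom_objects. A self-contained sketch (the CustomMetric here is a hypothetical stand-in for the real one):

```python
import tempfile
import tensorflow as tf

class CustomMetric(tf.keras.metrics.Metric):
    """Hypothetical stand-in for the real custom metric."""

    def __init__(self, name="custom_metric", **kwargs):
        super().__init__(name=name, **kwargs)
        self.total = self.add_weight(name="total", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.total.assign_add(tf.reduce_sum(tf.abs(y_true - y_pred)))

    def result(self):
        return self.total

# Compile, save, then load with custom_objects so Keras can resolve
# the custom class during deserialization.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse", metrics=[CustomMetric()])

path = tempfile.mkdtemp()
model.save(path, save_format="tf")
restored = tf.keras.models.load_model(
    path, custom_objects={"CustomMetric": CustomMetric}
)
```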
I require a custom training loop using gradient tape with low-level handling of custom metrics. As such, the model does not need to be compiled, since the .fit() method is not called. I am able to successfully train, and compute metrics for, the model.
However, when I try passing an uncompiled model, TFMA seems to have difficulty loading it (more details here) with the following exception:
AttributeError: 'NoneType' object has no attribute 'metrics'
Therefore, I tried to compile it while passing the custom metrics:
model.compile(metrics=custom_keras_metrics)
Should I be compiling or saving the model differently in order to ensure compatibility with TFMA?
On a side note, I tried following this example for creating custom metrics. I made no code changes, but even after compiling the model, the metrics attribute seems to be empty.
>>> model.metrics
[]
Is that expected?
What version of TF are you using? There was a bug in some versions of TF where the metrics were not restored on load.
I am using tensorflow 2.3.0. I'm not sure this is related, because the metrics are not available when I call model.metrics immediately after compilation (without saving and loading).
@mdreves Is there an example that runs model analysis for Keras models trained using gradient tape with custom metrics that I could reference?
Not that I'm aware of.
I think the problem is that TF is lazily creating these and doesn't recognize the metrics as part of the model until after model.fit is called, which means in your case they will not be saved and they are unknown to TFMA. One option is to manually add your metrics via the TFMA config (see [1]). You will need to make sure the lib containing the custom code is available on the workers though.
[1] https://github.com/tensorflow/model-analysis/blob/master/g3doc/metrics.md#customization
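The lazy creation is easy to observe in isolation (a minimal demonstration with a toy model, not code from this thread):

```python
import tensorflow as tf

# Compiled metrics are not built until training/evaluation runs.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(2,))])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

before = list(model.metrics)          # empty: metrics not yet built

x = tf.zeros((4, 2))
y = tf.zeros((4, 1))
model.fit(x, y, epochs=1, verbose=0)  # building happens on first fit/evaluate

after = list(model.metrics)           # now populated (loss + mae)
```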
Ah, I see. What worked for me was:
First doing a no-op compile:
model.compile(optimizer=my_custom_optimizer) # did not specify loss or metrics here
Then passing the custom metric through the MetricsSpec:
eval_config = tfma.EvalConfig(
model_specs=[tfma.ModelSpec(label_key="my_label")],
metrics_specs=[
tfma.MetricsSpec(
metrics=[tfma.MetricConfig(
class_name="MyCustomMetric",
module="module.containing.metric"
)]
)
],
slicing_specs=[
tfma.SlicingSpec(), # empty slice refers to the entire dataset
]
)
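For this to work, the module named in the MetricsSpec (here "module.containing.metric") must be importable on the workers and define the class by that name. A hypothetical sketch of such a module, with a mean-absolute-error-style metric standing in for the real one:

```python
# Hypothetical contents of module/containing/metric.py,
# matching the class_name/module in the MetricsSpec above.
import tensorflow as tf

class MyCustomMetric(tf.keras.metrics.Metric):
    """Example metric that TFMA can instantiate by class name."""

    def __init__(self, name="my_custom_metric", **kwargs):
        super().__init__(name=name, **kwargs)
        self.total = self.add_weight(name="total", initializer="zeros")
        self.count = self.add_weight(name="count", initializer="zeros")

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.total.assign_add(tf.reduce_sum(tf.abs(y_true - y_pred)))
        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))

    def result(self):
        return self.total / self.count
```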
I'm running into some other error now, but it doesn't seem to be related to this issue so I'll proceed to resolve this.
Thank you, I appreciate the help!