margaretmz / esrgan-e2e-tflite-tutorial

ESRGAN E2E TFLite Tutorial

Error on Android when creating interpreter with "esrgan.tflite with metadata"

margaretmz opened this issue

@khanhlvg - please see Android error below when creating interpreter with esrgan.tflite with metadata.

Caused by: java.lang.IllegalArgumentException: Internal error: Cannot create interpreter: Didn't find op for builtin opcode 'CONV_2D' version '5'

(Screenshot of the error, taken 2020-07-21 at 8:47 PM)
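
For reference, one quick way to check whether a converted model loads on a given stable runtime is to try it in the Python TF Lite interpreter before wiring it into the Android app. A minimal sketch, assuming the file is named esrgan.tflite:

import tensorflow as tf  # the stable release you plan to target, e.g. 2.2.x

try:
    interpreter = tf.lite.Interpreter(model_path="esrgan.tflite")
    interpreter.allocate_tensors()
    print("Model loads fine with TF", tf.__version__)
except Exception as e:
    # A message like "Didn't find op for builtin opcode ..." means the model
    # was converted with a newer converter than this runtime supports.
    print("Failed to load:", e)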

@margaretmz do you mind adding me as a collaborator to the repo so that I can add in the notebook there? I would do it essentially by creating a PR if you would prefer that.


@margaretmz The error is caused by the model containing ops that are not yet supported by your TF Lite interpreter. The root cause is that the model was converted with a tf-nightly build, while the app is likely using TF Lite 2.2 or a stale version of the nightly build.

There are two ways to fix this:

  1. Convert the model with TF 2.2: just remove the step that installs tf-nightly in Sayak's notebook (a conversion sketch follows the Gradle snippet below).
  2. Or use the TF Lite nightly build in your Android app. Please clear your Gradle cache to make sure the app is not picking up a stale version of the nightly:
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:0.0.0-nightly'
} 
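
For option 1, the conversion itself only needs to run under the stable release. A minimal sketch, assuming the ESRGAN generator is available as a SavedModel at ./esrgan_saved_model (the path is hypothetical, and the notebook may use a different converter entry point):

import tensorflow as tf  # install tensorflow==2.2.0, not tf-nightly

# Fail fast if the environment still has a nightly build installed.
assert tf.__version__.startswith("2.2"), "Expected stable TF 2.2 for conversion"

converter = tf.lite.TFLiteConverter.from_saved_model("./esrgan_saved_model")
tflite_model = converter.convert()

with open("esrgan.tflite", "wb") as f:
    f.write(tflite_model)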

> @margaretmz do you mind adding me as a collaborator to the repo so that I can add in the notebook there? I would do it essentially by creating a PR if you would prefer that.

Added both you and Khanh as collaborators.

> The error is caused by the model containing ops that are not yet supported by your TF Lite interpreter. The root cause is that the model was converted with a tf-nightly build, while the app is likely using TF Lite 2.2 or a stale version of the nightly build.

I re-created the tflite model with TF 2.2.0 and the error went away.

Regarding the version mismatch, what's the best way to communicate it to the end users of the tflite models, i.e. Android devs? It's frustrating if they have to search online for the solution to this error. I propose that:

  • we don't use tf-nightly for creating tflite models unless we absolutely must;
  • we include the TF & TFLite versions used for conversion in the model metadata;
  • we document this potential issue and its solution somewhere, such as on the metadata page.

@margaretmz the steps you proposed sound good to me, except there are some models where tf-nightly is actually required. For example, when I was creating the style transfer models I had to use tf-nightly because the stable release didn't support an op called MirrorPad. Similarly, if we want to use dynamic shapes, tf-nightly is a must for now (TF 2.3 also supports it).

@sayakpaul, sounds good - I understand that sometimes we must use tf-nightly for model conversion, and that's where bullet points 2 & 3 above can help communicate it. By default, the TFLite version added to the Android project (when the model is imported via ML Model Binding) is the latest stable release. Since there may be some time lag between model conversion and the Android implementation, I think it's good practice to include the TFLite version used for conversion in the model metadata.

I agree. Could you share a code listing that would let us specify the TFLite version in the metadata?

There is currently no dedicated place to put such info in the metadata. So I'd just include it in the description for now. Something like this:
model_meta.description = ("Enhanced super-res GAN for improving image quality. Model converted with TFLiteConverter from TF 2.2.0")
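
To make that concrete, here is a minimal sketch of where that description would be set in the metadata script, assuming the tflite-support metadata bindings; reading tf.__version__ at conversion time avoids hard-coding the version:

import tensorflow as tf
from tflite_support import metadata as _metadata
from tflite_support import metadata_schema_py_generated as _metadata_fb

model_meta = _metadata_fb.ModelMetadataT()
model_meta.name = "ESRGAN"
model_meta.description = (
    "Enhanced super-res GAN for improving image quality. "
    "Model converted with TFLiteConverter from TF " + tf.__version__ + "."
)
# The input/output tensor metadata and the MetadataPopulator call stay the same
# as in the existing metadata script.

# Once the metadata has been populated into esrgan.tflite, an Android dev (or
# anyone else) can read the recorded version back from the description field:
displayer = _metadata.MetadataDisplayer.with_model_file("esrgan.tflite")
print(displayer.get_metadata_json())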

Let's wait to see what @khanhlvg thinks about this discussion, and whether there should be a dedicated field for this.

Sounds good. I will update the other models accordingly after this. Only the models that cannot be converted without tf-nightly would be affected, so it shouldn't be a big change.

@khanhlvg I was able to get an output image with the model, but the output has this weird red color. Do I need to do any special post processing?

(Screenshot: output image with a red tint)

@margaretmz did you resize the input image with bicubic interpolation? This is a requirement since the model was trained on bicubically downsampled images. The preprocessing steps can be found in the load_img() function in this notebook, and the postprocessing step is tf.cast(tf.clip_by_value(output, 0, 255), tf.uint8).
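
Roughly, the pipeline looks like the sketch below. This is a hedged approximation of the steps described above (the exact load_img() lives in the notebook Sayak links to); the helper name load_lr_image and the 50x50 low-resolution size are assumptions:

import tensorflow as tf

def load_lr_image(path, lr_size=(50, 50)):
    img = tf.image.decode_image(tf.io.read_file(path), channels=3, expand_animations=False)
    # Bicubic resize, matching how the training data was downsampled.
    img = tf.image.resize(img, lr_size, method="bicubic")
    # Keep raw pixel values in [0.0, 255.0]; do NOT divide by 255.
    return tf.cast(img, tf.float32)

def postprocess(output):
    # Clip to the valid pixel range and convert back to uint8 for display.
    return tf.cast(tf.clip_by_value(output, 0, 255), tf.uint8)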


As @sayakpaul pointed out, the model requires the input image as float values in [0.0, 255.0] rather than normalized to [0.0, 1.0]. I think the normalization values in the model metadata could be incorrect, which would explain the color issue.

Yes, the normalization range should have been [0.0, 255.0]. I fixed the metadata and now it's working as expected:
(Screenshot: corrected output image)
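
For reference, the normalization range lives in the input tensor's NormalizationOptions in the metadata. A minimal sketch, assuming the tflite-support bindings: with mean 0.0 and std 1.0 the pixels pass through as [0.0, 255.0], whereas std 255.0 would rescale them to [0.0, 1.0] (the incorrect setting here).

from tflite_support import metadata_schema_py_generated as _metadata_fb

input_normalization = _metadata_fb.ProcessUnitT()
input_normalization.optionsType = _metadata_fb.ProcessUnitOptions.NormalizationOptions
options = _metadata_fb.NormalizationOptionsT()
options.mean = [0.0]
options.std = [1.0]  # keep pixel values in [0.0, 255.0], as the model expects
input_normalization.options = options
# input_normalization is then attached to the input TensorMetadataT's processUnits list.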