leondgarse / Keras_insightface

Insightface Keras implementation


EfficientNetV2B0 to TFLite

DonkeySmall opened this issue · comments

Hi, first of all, thank you for your excellent work; it's really amazing.

I have a question: is it possible to convert the EfficientNetV2B0 model to TFLite for use on GPU?

P.S. I apologize for my poor English.

The basic conversion is rather easy; just follow the TFLite instructions:

import tensorflow as tf
from tensorflow import keras

mm = keras.models.load_model("{basic_model.h5}")  # path to your saved basic model
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
open(mm.name + ".tflite", "wb").write(converter.convert())
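
As a quick sanity check, the converted model can be run with the TFLite interpreter on CPU and compared against the original Keras model on a random input. A minimal sketch, reusing mm from the snippet above:

import numpy as np
import tensorflow as tf

# Load the converted model with the TFLite interpreter (CPU)
interpreter = tf.lite.Interpreter(model_path=mm.name + ".tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Run one random input through the TFLite model
fake_input = np.random.uniform(size=input_details["shape"]).astype(input_details["dtype"])
interpreter.set_tensor(input_details["index"], fake_input)
interpreter.invoke()
tflite_out = interpreter.get_tensor(output_details["index"])

# Compare with the original Keras model output
keras_out = mm(fake_input).numpy()
print(np.allclose(tflite_out, keras_out, atol=1e-4))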

But to use a TFLite model on GPU, you need to follow the TFLite GPU delegate documentation. Never tried that myself.
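
For reference, attaching an external GPU delegate from Python looks roughly like the sketch below. The shared-library name here is an assumption; you generally have to build the delegate yourself from the TF sources, and on Android it is usually attached through the Java or C API instead:

import tensorflow as tf

# Load the GPU delegate shared library (assumed name; platform dependent)
gpu_delegate = tf.lite.experimental.load_delegate("libtensorflowlite_gpu_delegate.so")

# Attach the delegate when creating the interpreter; ops the delegate
# doesn't support will fall back to CPU
interpreter = tf.lite.Interpreter(
    model_path="efficientnet_v2-b0.tflite",
    experimental_delegates=[gpu_delegate],
)
interpreter.allocate_tensors()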

Yes, that's what I did, but the TfLiteInterpreterOptionsAddDelegate function raises an unknown error and the model runs only on the CPU. It looks like TensorFlow Lite does not support the EfficientNetV2B models on GPU.

ONNX also does not support GPU for EfficientNetV2B.

Uh, so you have already tried those...

  • Do other models like MobileNet work? Technically, EfficientNetV2B0 doesn't use very complicated layers.
  • I've checked the supported ops of the TFLite GPU delegate; it seems BatchNormalization is not listed, and swish may also block its usage. You may try whether a model without BatchNormalization and swish works:
    !pip install keras-cv-attention-models
    
    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models import efficientnet, model_surgery
    mm = efficientnet.EfficientNetV2B0(activation='relu', num_classes=0, pretrained=None)  # Change swish to relu
    mm = model_surgery.convert_to_fused_conv_bn_model(mm)  # Fuse BN with Conv2D
    print([ii for ii in mm.layers if isinstance(ii, keras.layers.BatchNormalization)])  # Check if any BN left
    # []
    
    """ Convert to TFLite model efficientnet_v2-b0.tflite """
    converter = tf.lite.TFLiteConverter.from_keras_model(mm)
    open(mm.name + ".tflite", "wb").write(converter.convert())
    Anyway, I'm not sure whether these work; see the sketch below for a quick numerical check of the Conv+BN fusion.
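
    A minimal sketch for verifying that the fusion itself doesn't change the model's outputs, assuming convert_to_fused_conv_bn_model returns a new model without mutating the original:

    import numpy as np
    import tensorflow as tf
    from keras_cv_attention_models import efficientnet, model_surgery

    # Build once, then fuse; with pretrained=None the BN layers still hold
    # their initial statistics, but the identity check is still meaningful
    orig = efficientnet.EfficientNetV2B0(activation='relu', num_classes=0, pretrained=None)
    fused = model_surgery.convert_to_fused_conv_bn_model(orig)

    # Fusing only folds BN parameters into the preceding Conv2D weights,
    # so outputs should match up to floating point error
    fake_input = tf.random.uniform([1, *orig.input_shape[1:]])
    print(np.allclose(orig(fake_input), fused(fake_input), atol=1e-5))
    # Expect: True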

I haven't tried any of the MobileNet versions; ResNet 18+ works.

After converting this way, EfficientNetV2B0 seems to load on the GPU:

import tensorflow as tf
from tensorflow import keras
from keras_cv_attention_models import efficientnet, model_surgery

mm = efficientnet.EfficientNetV2B0(activation='relu', num_classes=0, pretrained=None)  # Change swish to relu
mm = model_surgery.convert_to_fused_conv_bn_model(mm)  # Fuse BN with Conv2D
print([ii for ii in mm.layers if isinstance(ii, keras.layers.BatchNormalization)])  # Check if any BN left

""" Convert to TFLite model efficientnet_v2-b0.tflite """
converter = tf.lite.TFLiteConverter.from_keras_model(mm)
converter.target_spec.supported_types = [tf.float16]  # you forgot to add this: store weights as float16
open(mm.name + ".tflite", "wb").write(converter.convert())

If this conversion works, you can check which operation was blocking it, BN or swish; the analyzer sketch below can help.
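
In recent TF versions (2.9+), the TFLite model analyzer can do this check directly; gpu_compatibility=True flags ops the GPU delegate can't handle. A sketch, reusing the file name from the conversion above:

import tensorflow as tf

# Print every op in the converted model and warn about ops that are
# not compatible with the TFLite GPU delegate
tf.lite.experimental.Analyzer.analyze(model_path="efficientnet_v2-b0.tflite", gpu_compatibility=True)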

  • If BN, you need to modify the models.py source code first, moving the pre_embedding BN layer next to the Conv2D so it can be fused:
    $ git diff models.py
    diff --git a/models.py b/models.py
    index 3de2a8b..f1cf6f4 100644
    --- a/models.py
    +++ b/models.py
    @@ -154,6 +154,7 @@ def buildin_models(
             if dropout > 0 and dropout < 1:
                 nn = keras.layers.Dropout(dropout)(nn)
             nn = keras.layers.Conv2D(emb_shape, 1, use_bias=use_bias, kernel_initializer="glorot_normal", name="GDC_conv")(nn)
    +        nn = keras.layers.BatchNormalization(momentum=bn_momentum, epsilon=bn_epsilon, scale=scale, name="pre_embedding")(nn)
             nn = keras.layers.Flatten(name="GDC_flatten")(nn)
             # nn = keras.layers.Dense(emb_shape, activation=None, use_bias=use_bias, kernel_initializer="glorot_normal", name="GDC_dense")(nn)
         elif output_layer == "F":
    @@ -164,8 +165,7 @@ def buildin_models(
             nn = keras.layers.Dense(emb_shape, use_bias=use_bias, kernel_initializer="glorot_normal", name="F_dense")(nn)
    
         # `fix_gamma=True` in MXNet means `scale=False` in Keras
    -    embedding = keras.layers.BatchNormalization(momentum=bn_momentum, epsilon=bn_epsilon, scale=scale, name="pre_embedding")(nn)
    -    embedding_fp32 = keras.layers.Activation("linear", dtype="float32", name="embedding")(embedding)
    +    embedding_fp32 = keras.layers.Activation("linear", dtype="float32", name="embedding")(nn)
    
         basic_model = keras.models.Model(inputs, embedding_fp32, name=xx.name)
         return basic_model
    Then reload the model weights:
    from keras_cv_attention_models import efficientnet, model_surgery
    import models
    basic_model = efficientnet.EfficientNetV2B0(input_shape=(112, 112, 3), activation="swish", num_classes=0, pretrained=None)
    basic_model = models.buildin_models(basic_model, dropout=0, emb_shape=512, output_layer='GDC', bn_epsilon=1e-4, bn_momentum=0.9, scale=True, use_bias=False)
    
    # Reload weights
    pretrained = 'TT_efv2_b0_swish_GDC_..._0.977333.h5'
    basic_model.load_weights(pretrained)
    
    # Fuse conv bn
    basic_model = model_surgery.convert_to_fused_conv_bn_model(basic_model)
    
    # Accuracy test
    import evals
    test_bin = '/datasets/ms1m-retinaface-t1/agedb_30.bin'
    evals.eval_callback(basic_model, test_bin).on_epoch_end()
    # >>>> agedb_30 evaluation max accuracy: 0.977333, thresh: 0.193789, previous max accuracy: 0.000000
    
    # Convert TFLite and others
    ...
  • If swish, you have to re-train the model using a supported activation... Replacing swish directly with hard_swish drops agedb_30 accuracy to 0.915667, and replacing it directly with relu drops it to 0.536833. A sketch of measuring such a swap follows below.
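
    Measuring the hard_swish swap could follow the same reload pattern as above; activation layers carry no weights, so the swish checkpoint should load by layer name. A sketch only: the file names are the placeholders from the earlier snippet, and "hard_swish" as an activation string is an assumption about keras_cv_attention_models:

    from keras_cv_attention_models import efficientnet
    import models, evals

    # Same architecture, but hard_swish in place of swish (assumed activation name)
    basic_model = efficientnet.EfficientNetV2B0(input_shape=(112, 112, 3), activation="hard_swish", num_classes=0, pretrained=None)
    basic_model = models.buildin_models(basic_model, dropout=0, emb_shape=512, output_layer='GDC', bn_epsilon=1e-4, bn_momentum=0.9, scale=True, use_bias=False)

    # Reload the swish-trained weights; only the activations differ
    basic_model.load_weights('TT_efv2_b0_swish_GDC_..._0.977333.h5')

    # Evaluate -- expect roughly the 0.915667 accuracy mentioned above
    evals.eval_callback(basic_model, '/datasets/ms1m-retinaface-t1/agedb_30.bin').on_epoch_end()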
>>>> len(fuse_convs) = 59 len(fuse_bns) = 59
>>>> Fuse conv bn: stem_conv
>>>> Fuse conv bn: stack_0_block0_fu_conv
>>>> Fuse conv bn: stack_1_block0_sortcut_conv
>>>> Fuse conv bn: stack_1_block0_MB_pw_conv
>>>> Fuse conv bn: stack_1_block1_sortcut_conv
>>>> Fuse conv bn: stack_1_block1_MB_pw_conv
>>>> Fuse conv bn: stack_2_block0_sortcut_conv
>>>> Fuse conv bn: stack_2_block0_MB_pw_conv
>>>> Fuse conv bn: stack_2_block1_sortcut_conv
>>>> Fuse conv bn: stack_2_block1_MB_pw_conv
>>>> Fuse conv bn: stack_3_block0_sortcut_conv
>>>> Fuse conv bn: stack_3_block0_MB_dw_
>>>> Fuse conv bn: stack_3_block0_MB_pw_conv
>>>> Fuse conv bn: stack_3_block1_sortcut_conv
>>>> Fuse conv bn: stack_3_block1_MB_dw_
>>>> Fuse conv bn: stack_3_block1_MB_pw_conv
>>>> Fuse conv bn: stack_3_block2_sortcut_conv
>>>> Fuse conv bn: stack_3_block2_MB_dw_
>>>> Fuse conv bn: stack_3_block2_MB_pw_conv
>>>> Fuse conv bn: stack_4_block0_sortcut_conv
>>>> Fuse conv bn: stack_4_block0_MB_dw_
>>>> Fuse conv bn: stack_4_block0_MB_pw_conv
>>>> Fuse conv bn: stack_4_block1_sortcut_conv
>>>> Fuse conv bn: stack_4_block1_MB_dw_
>>>> Fuse conv bn: stack_4_block1_MB_pw_conv
>>>> Fuse conv bn: stack_4_block2_sortcut_conv
>>>> Fuse conv bn: stack_4_block2_MB_dw_
>>>> Fuse conv bn: stack_4_block2_MB_pw_conv
>>>> Fuse conv bn: stack_4_block3_sortcut_conv
>>>> Fuse conv bn: stack_4_block3_MB_dw_
>>>> Fuse conv bn: stack_4_block3_MB_pw_conv
>>>> Fuse conv bn: stack_4_block4_sortcut_conv
>>>> Fuse conv bn: stack_4_block4_MB_dw_
>>>> Fuse conv bn: stack_4_block4_MB_pw_conv
>>>> Fuse conv bn: stack_5_block0_sortcut_conv
>>>> Fuse conv bn: stack_5_block0_MB_dw_
>>>> Fuse conv bn: stack_5_block0_MB_pw_conv
>>>> Fuse conv bn: stack_5_block1_sortcut_conv
>>>> Fuse conv bn: stack_5_block1_MB_dw_
>>>> Fuse conv bn: stack_5_block1_MB_pw_conv
>>>> Fuse conv bn: stack_5_block2_sortcut_conv
>>>> Fuse conv bn: stack_5_block2_MB_dw_
>>>> Fuse conv bn: stack_5_block2_MB_pw_conv
>>>> Fuse conv bn: stack_5_block3_sortcut_conv
>>>> Fuse conv bn: stack_5_block3_MB_dw_
>>>> Fuse conv bn: stack_5_block3_MB_pw_conv
>>>> Fuse conv bn: stack_5_block4_sortcut_conv
>>>> Fuse conv bn: stack_5_block4_MB_dw_
>>>> Fuse conv bn: stack_5_block4_MB_pw_conv
>>>> Fuse conv bn: stack_5_block5_sortcut_conv
>>>> Fuse conv bn: stack_5_block5_MB_dw_
>>>> Fuse conv bn: stack_5_block5_MB_pw_conv
>>>> Fuse conv bn: stack_5_block6_sortcut_conv
>>>> Fuse conv bn: stack_5_block6_MB_dw_
>>>> Fuse conv bn: stack_5_block6_MB_pw_conv
>>>> Fuse conv bn: stack_5_block7_sortcut_conv
>>>> Fuse conv bn: stack_5_block7_MB_dw_
>>>> Fuse conv bn: stack_5_block7_MB_pw_conv
>>>> Fuse conv bn: post_conv
>>>> Fuse conv bn stem_conv stem_bn
>>>> Fuse conv bn stack_0_block0_fu_conv stack_0_block0_fu_bn
>>>> Fuse conv bn stack_1_block0_sortcut_conv stack_1_block0_sortcut_bn
>>>> Fuse conv bn stack_1_block0_MB_pw_conv stack_1_block0_MB_pw_bn
>>>> Fuse conv bn stack_1_block1_sortcut_conv stack_1_block1_sortcut_bn
>>>> Fuse conv bn stack_1_block1_MB_pw_conv stack_1_block1_MB_pw_bn
>>>> Fuse conv bn stack_2_block0_sortcut_conv stack_2_block0_sortcut_bn
>>>> Fuse conv bn stack_2_block0_MB_pw_conv stack_2_block0_MB_pw_bn
>>>> Fuse conv bn stack_2_block1_sortcut_conv stack_2_block1_sortcut_bn
>>>> Fuse conv bn stack_2_block1_MB_pw_conv stack_2_block1_MB_pw_bn
>>>> Fuse conv bn stack_3_block0_sortcut_conv stack_3_block0_sortcut_bn
>>>> Fuse conv bn stack_3_block0_MB_dw_ stack_3_block0_MB_dw_bn
>>>> Fuse conv bn stack_3_block0_MB_pw_conv stack_3_block0_MB_pw_bn
>>>> Fuse conv bn stack_3_block1_sortcut_conv stack_3_block1_sortcut_bn
>>>> Fuse conv bn stack_3_block1_MB_dw_ stack_3_block1_MB_dw_bn
>>>> Fuse conv bn stack_3_block1_MB_pw_conv stack_3_block1_MB_pw_bn
>>>> Fuse conv bn stack_3_block2_sortcut_conv stack_3_block2_sortcut_bn
>>>> Fuse conv bn stack_3_block2_MB_dw_ stack_3_block2_MB_dw_bn
>>>> Fuse conv bn stack_3_block2_MB_pw_conv stack_3_block2_MB_pw_bn
>>>> Fuse conv bn stack_4_block0_sortcut_conv stack_4_block0_sortcut_bn
>>>> Fuse conv bn stack_4_block0_MB_dw_ stack_4_block0_MB_dw_bn
>>>> Fuse conv bn stack_4_block0_MB_pw_conv stack_4_block0_MB_pw_bn
>>>> Fuse conv bn stack_4_block1_sortcut_conv stack_4_block1_sortcut_bn
>>>> Fuse conv bn stack_4_block1_MB_dw_ stack_4_block1_MB_dw_bn
>>>> Fuse conv bn stack_4_block1_MB_pw_conv stack_4_block1_MB_pw_bn
>>>> Fuse conv bn stack_4_block2_sortcut_conv stack_4_block2_sortcut_bn
>>>> Fuse conv bn stack_4_block2_MB_dw_ stack_4_block2_MB_dw_bn
>>>> Fuse conv bn stack_4_block2_MB_pw_conv stack_4_block2_MB_pw_bn
>>>> Fuse conv bn stack_4_block3_sortcut_conv stack_4_block3_sortcut_bn
>>>> Fuse conv bn stack_4_block3_MB_dw_ stack_4_block3_MB_dw_bn
>>>> Fuse conv bn stack_4_block3_MB_pw_conv stack_4_block3_MB_pw_bn
>>>> Fuse conv bn stack_4_block4_sortcut_conv stack_4_block4_sortcut_bn
>>>> Fuse conv bn stack_4_block4_MB_dw_ stack_4_block4_MB_dw_bn
>>>> Fuse conv bn stack_4_block4_MB_pw_conv stack_4_block4_MB_pw_bn
>>>> Fuse conv bn stack_5_block0_sortcut_conv stack_5_block0_sortcut_bn
>>>> Fuse conv bn stack_5_block0_MB_dw_ stack_5_block0_MB_dw_bn
>>>> Fuse conv bn stack_5_block0_MB_pw_conv stack_5_block0_MB_pw_bn
>>>> Fuse conv bn stack_5_block1_sortcut_conv stack_5_block1_sortcut_bn
>>>> Fuse conv bn stack_5_block1_MB_dw_ stack_5_block1_MB_dw_bn
>>>> Fuse conv bn stack_5_block1_MB_pw_conv stack_5_block1_MB_pw_bn
>>>> Fuse conv bn stack_5_block2_sortcut_conv stack_5_block2_sortcut_bn
>>>> Fuse conv bn stack_5_block2_MB_dw_ stack_5_block2_MB_dw_bn
>>>> Fuse conv bn stack_5_block2_MB_pw_conv stack_5_block2_MB_pw_bn
>>>> Fuse conv bn stack_5_block3_sortcut_conv stack_5_block3_sortcut_bn
>>>> Fuse conv bn stack_5_block3_MB_dw_ stack_5_block3_MB_dw_bn
>>>> Fuse conv bn stack_5_block3_MB_pw_conv stack_5_block3_MB_pw_bn
>>>> Fuse conv bn stack_5_block4_sortcut_conv stack_5_block4_sortcut_bn
>>>> Fuse conv bn stack_5_block4_MB_dw_ stack_5_block4_MB_dw_bn
>>>> Fuse conv bn stack_5_block4_MB_pw_conv stack_5_block4_MB_pw_bn
>>>> Fuse conv bn stack_5_block5_sortcut_conv stack_5_block5_sortcut_bn
>>>> Fuse conv bn stack_5_block5_MB_dw_ stack_5_block5_MB_dw_bn
>>>> Fuse conv bn stack_5_block5_MB_pw_conv stack_5_block5_MB_pw_bn
>>>> Fuse conv bn stack_5_block6_sortcut_conv stack_5_block6_sortcut_bn
>>>> Fuse conv bn stack_5_block6_MB_dw_ stack_5_block6_MB_dw_bn
>>>> Fuse conv bn stack_5_block6_MB_pw_conv stack_5_block6_MB_pw_bn
>>>> Fuse conv bn stack_5_block7_sortcut_conv stack_5_block7_sortcut_bn
>>>> Fuse conv bn stack_5_block7_MB_dw_ stack_5_block7_MB_dw_bn
>>>> Fuse conv bn stack_5_block7_MB_pw_conv stack_5_block7_MB_pw_bn
>>>> Fuse conv bn post_conv post_bn
[]

Wow, everything works, both on GPU and CPU.

THANK YOU VERY MUCH!


That's great!