tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.

Home Page: https://js.tensorflow.org

Provided weight data has no target variable: batch_normalization

rajeev-samalkha opened this issue · comments

To get help from the community, check out our Google group.

TensorFlow.js version

0.13

Browser version

Chrome Version 69.0.3497.100

Describe the problem or feature request

I converted a Keras model to tfjs using the Python utility with no errors. But when I try to load the model in tfjs, I get the following error:

tfjs@0.13.0:2 Uncaught (in promise) Error: Provided weight data has no target variable: batch_normalization_1_2/gamma
    at new t (tfjs@0.13.0:2)
    at loadWeightsFromNamedTensorMap (tfjs@0.13.0:2)
    at t.loadWeights (tfjs@0.13.0:2)
    at tfjs@0.13.0:2
    at tfjs@0.13.0:2
    at Object.next (tfjs@0.13.0:2)
    at i (tfjs@0.13.0:2)

Code to reproduce the bug / link to feature request

Running it on my local machine:
model = await tf.loadModel(<path_to_model.json>)

Hi
Can you please share your (original) model and the commands used to convert & load?
Thanks

If that weight is an extra one that is lying around for some reason but is not actually needed, you can call tf.loadModel(..., strict=false) to disable the error.

Of course, if the weight is needed, doing this would leave you with a broken model. In that case, as @bileschi said, we'd need to see the original Keras model to determine whether there is a conversion bug.

@rajeev-samalkha is this issue resolved? If so, feel free to close. Thank you.

Folks, sorry for delayed response.

When I take out the Batch Norm layer, it seems to work fine (I had to retrain the model). Is there any difference between Batch Norm in tf.keras and tfjs? I have used the tensorflowjs.converters utility. The model itself is quite simple, with a few Conv2D layers interspersed with Max Pool/Dropout.

Regards
Rajeev

TFJS BatchNorm maintains up to 4 weights:
gamma, beta, movingMean, and movingVariance, which matches those in keras-team/keras:

https://github.com/keras-team/keras/blob/master/keras/layers/normalization.py#L93

I wonder if tf.keras is saving additional tensors, possibly for optimization or related to the momentum for training?

Looking at the tensorflow/keras implementation, I see that it is somewhat more complex, including a 'fused' batch-norm implementation that reaches into C:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/layers/normalization.py

Can you list out the weights in your model from the python code?

Do you need the weights for all the layers, or just the BatchNorm one?

I get the same error while loading a Keras-converted model in the browser. The following error pops up:

(screenshot: embeddings_issue)

As for the model weights:

(screenshot: model_issue_embedding)

Note: some of my converted models containing embeddings tend to work fine in the browser, but sometimes this error shows up. @bileschi @davidsoergel kindly post a proper fix for this issue.

Same error here. Someone please help.

tfjs@0.13.0:2 Uncaught (in promise) Error: Provided weight data has no target variable: conv2d/kernel
at new t (tfjs@0.13.0:2)
at loadWeightsFromNamedTensorMap (tfjs@0.13.0:2)
at t.loadWeights (tfjs@0.13.0:2)
at tfjs@0.13.0:2
at tfjs@0.13.0:2
at Object.next (tfjs@0.13.0:2)
at i (tfjs@0.13.0:2)

I hit the same error without batch norm. Appreciate your help.

Folks

I think I found why we are getting this error. The error can happen for any layer. Steps to reproduce the error:

  1. Load the model in TensorFlow using tf.keras.
  2. Load the same model again (basically load the model more than once).
  3. Use tfjs.converters to convert keras model and you get this error.

It seems every layer name changes in the model.json file (it will be different from the name in model.summary()). For example, one of the layers in my model was 'conv2d_6', but it got renamed to 'conv2d_6_2' when I loaded the model twice. However, the actual weights (presumably in the shard file) still expect 'conv2d_6' in my case.

So until we get a fix, please make sure you load your model only once before doing the tfjs conversion. Hope this helps.
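The renaming behavior described above can be illustrated with a minimal sketch of a Keras-style per-session name registry. The `NameRegistry` class below is hypothetical (it is not the actual Keras code), but it models the relevant behavior: loading the same model twice in one Python session requests the same base names again and receives suffixed copies.

```python
from collections import defaultdict

class NameRegistry:
    """Hypothetical mimic of a Keras-style per-session name registry.

    When a base name is requested again (e.g. by loading the same model
    twice in one session), the registry hands out a suffixed copy so
    that names stay unique.
    """
    def __init__(self):
        self._counts = defaultdict(int)

    def unique(self, base):
        self._counts[base] += 1
        count = self._counts[base]
        return base if count == 1 else f"{base}_{count}"

registry = NameRegistry()  # one registry for the whole session

# First load: the layer name matches what the weight shards expect.
print(registry.unique("conv2d_6"))  # -> conv2d_6

# Second load in the same session: the layer is renamed in model.json,
# while the shard data still refers to plain "conv2d_6".
print(registry.unique("conv2d_6"))  # -> conv2d_6_2
```

This is why loading the model exactly once before conversion (or resetting the session) avoids the mismatch: the registry never hands out a suffixed name.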

I have tried that too, loading the model exactly once, but the same error still occurs.

Not sure if this helps, but when I converted my h5 model using the python code
import tensorflowjs as tfjs
from keras.models import load_model

modelk = load_model('./input/model.h5')
tfjs.converters.save_keras_model(modelk, './output/')

I would receive the following error:

errors.ts:48 Uncaught (in promise) Error: Provided weight data has no target variable: dense_1_7/kernel
    at new t (errors.ts:48)
    at loadWeightsFromNamedTensorMap (container.ts:190)
    at t.loadWeights (container.ts:759)
    at models.ts:285
    at index.ts:79
    at Object.next (index.ts:79)
    at i (index.ts:79)

But if I convert the h5 model using the tensorflowjs_converter command line tool my tfjs json model file will load without any problems.

(screenshot: model summary)

const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node');
const path = require('path');

async function load(){
  await tf.loadModel(`file://${path.join('xxx', 'model.json')}`);
}

load()

error

(node:72812) UnhandledPromiseRejectionWarning: Error: Provided weight data has no target variable: conv2d_10_1/kernel

@demohi Can you try using setting the strict argument to false, i.e.,

  await tf.loadModel(`file://${path.join('xxx', 'model.json')}`, false);

Also, this might be a bug in loadModel. Can you provide the weight and JSON file to us so we may try reproducing this issue on our end? Thanks.

@caisq Thank you for your reply. It works.

You can convert this Keras model to reproduce the bug.

@demohi I'm looking into this issue now. It seems the cause has to do with the following facts:

  • The model has a layer with the name conv2d_10, however
  • One of the weights for that layer is named conv2d_10_1/kernel in the model.h5 file, so there is an extra suffix _1.

This is the reason why the weight loading fails and you get the error. Can you tell me a little about how the model was saved on the Python side? Is it possible that there were multiple instances of the model in Python memory?

I think we need to fix this issue regardless of what happens on the Python side, as Python Keras / TensorFlow can load this sort of model correctly. But I just want to understand the conditions under which this kind of name mismatches happen. Thanks.
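The failure mode described above, and what the strict flag changes, can be sketched in plain Python. This is an illustrative model of the loader's name matching, not the actual tfjs implementation:

```python
def load_weights(target_vars, provided_weights, strict=True):
    """Assign provided weight data to target variables by exact name.

    target_vars:      dict of variable name -> variable (placeholder here)
    provided_weights: dict of weight name -> weight data
    In strict mode, an orphaned weight name raises, which is exactly the
    error reported in this issue; in non-strict mode it is skipped.
    """
    loaded = {}
    for name, data in provided_weights.items():
        if name in target_vars:
            loaded[name] = data
        elif strict:
            raise ValueError(
                f"Provided weight data has no target variable: {name}")
    return loaded

# The model declares a layer "conv2d_10", but the saved weight carries
# the collision suffix "_1" from a double load on the Python side.
targets = {"conv2d_10/kernel": None}
weights = {"conv2d_10_1/kernel": [0.1, 0.2]}

try:
    load_weights(targets, weights)          # strict: raises
except ValueError as err:
    print(err)

print(load_weights(targets, weights, strict=False))  # -> {} (weight dropped)
```

Note that in this situation, non-strict loading leaves conv2d_10/kernel uninitialized, which is why strict=false only helps when the orphaned weight is genuinely redundant.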

@caisq

I use the colab to train this model with keras.

# keras model
model.save('xxx.h5')

@demohi At the risk of asking too much, I wonder whether you could try running the same code from a Python file (or reset the state of the Colab kernel and run the code from scratch, making sure that each code block is run only once). I expect the name mismatch to disappear in those cases.

Again, don't feel obliged to try that. But if you do have time to try it and let me know, it would be wonderful.

We'll work on a fix in the meantime.

I am facing the same issue. It appears to happen when I run the tensorflowjs_converter command (via os.system) while the model is still loaded from the same input Keras .h5 file. If I run tensorflowjs_converter separately from the shell after the Python program has finished, it seems to work fine.

Same error with

Error: Provided weight data has no target variable: Conv1_1/kernel

Could someone fix it? I am converting mobilenet_v2 to TensorFlow.js for a browser classification task.

@demohi At the risk of asking too much, I wonder whether you could try running the same code from a Python file (or reset the state of the Colab kernel and run the code from scratch, making sure that each code block is run only once). I expect the name mismatch to disappear in those cases.

Again, don't feel obliged to try that. But if you do have time to try it and let me know, it would be wonderful.

We'll work on a fix in the meantime.

This helped me solve the problem. I had to run my Google Colab notebook again after clearing the runtime. I made sure I executed each code block once, and then I converted the model using
tensorflowjs_converter --input_format keras ./my_model.h5 ./my_model_as_tfjs

It is working perfectly fine in the browser now :)

I was experiencing this in the following scenario:

  • train a model using model.fit(..., callbacks=[ModelCheckpoint])
  • load the best model (not just the model weights) using model = load_model(ckpt_path)
  • convert and save using tfjs.converters.save_keras_model

This might be obvious, but the issue was that the call to load_model created a whole new set of layers without removing the old tf variables. Keras was showing the proper layer.name, but I still had the mismatch.

The underlying tf.Variable objects had name collisions with the first model, and therefore got a suffix of _1 (like char_embedding_1/embeddings:0 instead of char_embedding/embeddings:0). You can see these names with something like

for layer in model.layers:
    print(layer.weights)

To solve my version of the issue (where there was at some point a copy of the same model and I loaded a new one), you can reset the tf session entirely before loading

import keras.backend as K
...
model.fit(data, callbacks=[...])
K.clear_session()  # this resets the session containing the stale, not-best version of the model
model = load_model(ckpt_path)
tfjs.converters.save_keras_model(model, out_dir)
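A suffixed variable name like the ones above can also be mapped back to its layer programmatically. The match_weight helper below is a hypothetical diagnostic (not part of tfjs or Keras) that strips a single trailing collision counter when the exact layer name is not found:

```python
import re

def match_weight(weight_name, layer_names):
    """Map a possibly-suffixed weight name back to a known layer.

    'conv2d_10_1/kernel' -> 'conv2d_10/kernel' when the model only has a
    layer 'conv2d_10': the TF-style ':0' suffix is dropped first, then a
    single trailing '_<n>' collision counter is stripped if needed.
    Returns None when no known layer matches.
    """
    layer, _, var = weight_name.split(":")[0].partition("/")
    if layer not in layer_names:
        layer = re.sub(r"_\d+$", "", layer)  # drop one collision counter
        if layer not in layer_names:
            return None
    return f"{layer}/{var}" if var else layer

layers = {"conv2d_10", "char_embedding"}
print(match_weight("conv2d_10_1/kernel", layers))             # conv2d_10/kernel
print(match_weight("char_embedding_1/embeddings:0", layers))  # char_embedding/embeddings
print(match_weight("dense_3/bias", layers))                   # None
```

This only works as a diagnostic: it cannot tell a genuine trailing number in a layer name from a collision counter, so clearing the session before loading remains the reliable fix.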

The way I solved the same problem (provided weight data has no target variable conv1_1/kernel) was by clearing all output and cache of my Jupyter notebook, loading the model (model = load_model('./tf_files/keras/modelKeras2.h5')), and converting with tfjs (tfjs.converters.save_keras_model(model, './tfjsModelConverted/model6')).

Hope it helps...

I was experiencing this in the following scenario:

  • train a model using model.fit(..., callbacks=[ModelCheckpoint])
  • load the best model (not just the model weights) using model = load_model(ckpt_path)
  • convert and save using tfjs.converters.save_keras_model

This might be obvious, but the issue was that the call to load_model created a whole new set of layers without removing the old tf variables. Keras was showing the proper layer.name, but I still had the mismatch.

The underlying tf.Variable objects had name collisions with the first model, and therefore got a suffix of _1 (like char_embedding_1/embeddings:0 instead of char_embedding/embeddings:0). You can see these names with something like

for layer in model.layers:
    print(layer.weights)

To solve my version of the issue (where there was at some point a copy of the same model and I loaded a new one), you can reset the tf session entirely before loading

import keras.backend as K
...
model.fit(data, callbacks=[...])
K.clear_session()  # this resets the session containing the stale, not-best version of the model
model = load_model(ckpt_path)
tfjs.converters.save_keras_model(model, out_dir)

This particular way worked for me, but instead of using the

keras.backend.clear_session()

call, I used the TensorFlow API to Keras, as clearing the session directly through the keras module throws an error with TensorFlow. The following is what I used:

tf.keras.backend.clear_session()

Here is how I converted the model using Google Colab (IPython).
The Python API seems to work for this version at least. No need to set the strict parameter in this case.

Once you have saved the entire model as an h5 file, upload it to Colab and run the script to generate the tfjs model.

!pip install tensorflowjs==1.2.6

Restart runtime after installation

import os
import keras
from keras.models import load_model
import tensorflow as tf
import tensorflowjs as tfjs

Load the model:

tf.compat.v1.disable_eager_execution()
model = load_model('/content/model.h5')  # path to model

Create a directory:

!mkdir model

Convert the model:

tfjs.converters.save_keras_model(model, '/content/model')

!zip -r model.zip /content/model

Download and verify.

@anilsathyan7 thank you , closing this issue.