tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

Home Page: https://tensorflow.org

Keras application - Tensor is not an element of this graph on eval after train

damienpontifex opened this issue · comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.1
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v1.4.0-rc1-11-g130a514 1.4.0
  • Python version: 3.6.3
  • CUDA/cuDNN version: N/A CPU only
  • Exact command to reproduce:

Describe the problem

Using the estimator API with tf.keras.applications.VGG16 and its output for transfer learning, I get TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph. raised when the model is run a second time.

This is raised when the eval step runs after train in tf.estimator.train_and_evaluate. See the source code for the model and the estimator output below. It also occurs if I re-run train_and_evaluate a second time. I am running in a Jupyter notebook; if I do a Kernel ➝ Restart, a training run completes again without the error, but training cannot be run twice within the same kernel session.

See https://github.com/damienpontifex/fastai-course/blob/master/deeplearning1/lesson1%2B3/DogsVsCats.ipynb for the full notebook; the main parts for the estimator model and output are below:

Source code / logs

Estimator Model

def vgg16_model_fn(features, mode, params):
    
    is_training = mode == tf.estimator.ModeKeys.TRAIN
    
    with tf.variable_scope('vgg_base'):
        # Use a pre-trained VGG16 model and drop off the top layers as we will retrain 
        # with our own dense output for our custom classes
        vgg16_base = tf.keras.applications.VGG16(
            include_top=False,
            input_shape=(224, 224, 3),
            input_tensor=features['image'],
            pooling='avg')

        # Disable training for all layers to increase speed for transfer learning
        # If the new classes are significantly different from ImageNet, it may be worth leaving trainable = True
        for layer in vgg16_base.layers:
            layer.trainable = False

        x = vgg16_base.output
    
    with tf.variable_scope("fc"):
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc1')
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc2')
        x = tf.layers.dropout(x, rate=0.5, training=is_training)
        
    # Finally add a dense layer for class predictions
    with tf.variable_scope("Prediction"):
        x = tf.layers.dense(x, units=NUM_CLASSES, trainable=is_training)
        return x
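
The wrapper model_fn passed to the estimator is defined elsewhere in the notebook (it appears as ipython-input-8 in the traceback below); a minimal sketch of what it likely looks like, where the loss and optimizer choices are assumptions for illustration, not the notebook's actual code:

def model_fn(features, labels, mode, params):
    # Hypothetical reconstruction of the wrapper seen in the traceback
    tf.summary.image('images', features['image'], max_outputs=6)

    logits = vgg16_model_fn(features, mode, params)

    # Dictionary with label as outcome with greatest probability
    predictions = {
        'class': tf.argmax(logits, axis=1),
        'probabilities': tf.nn.softmax(logits)
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

    # Assumed loss/optimizer; swap in the notebook's own choices
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(
        mode, predictions=predictions, loss=loss, train_op=train_op)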

Estimator setup

dog_cat_estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params=params
)
train_spec = tf.estimator.TrainSpec(
    input_fn=data_input_fn(train_record_filenames, num_epochs=None, batch_size=10, shuffle=True), 
    max_steps=10)
eval_spec = tf.estimator.EvalSpec(
    input_fn=data_input_fn(validation_record_filenames)
)
tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)
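
data_input_fn is also defined elsewhere in the notebook; since it is called above, it must return the callable that tf.estimator expects. A minimal sketch under that assumption (the feature keys and JPEG parsing are placeholders, not the notebook's actual pipeline):

def data_input_fn(filenames, num_epochs=1, batch_size=32, shuffle=False):
    # Hypothetical reconstruction: returns a closure over a TFRecord pipeline
    def _parse(example_proto):
        parsed = tf.parse_single_example(example_proto, {
            'image': tf.FixedLenFeature([], tf.string),  # assumed feature keys
            'label': tf.FixedLenFeature([], tf.int64),
        })
        image = tf.image.decode_jpeg(parsed['image'], channels=3)
        image = tf.image.resize_images(image, [224, 224])
        return {'image': image}, parsed['label']

    def _input_fn():
        dataset = tf.data.TFRecordDataset(filenames).map(_parse)
        if shuffle:
            dataset = dataset.shuffle(buffer_size=1000)
        dataset = dataset.repeat(num_epochs).batch(batch_size)
        return dataset.make_one_shot_iterator().get_next()

    return _input_fn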

train_and_evaluate output

INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /tmp/DogsVsCats/model.ckpt-1
INFO:tensorflow:Saving checkpoints for 2 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:loss = 0.0, step = 2
INFO:tensorflow:Saving checkpoints for 10 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1063             subfeed_t = self.graph.as_graph_element(subfeed, allow_tensor=True,
-> 1064                                                     allow_operation=False)
   1065           except Exception as e:

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
   3034     with self._lock:
-> 3035       return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
   3036 

/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
   3113       if obj.graph is not self:
-> 3114         raise ValueError("Tensor %s is not an element of this graph." % obj)
   3115       return obj

ValueError: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-12-67c818ea66c5> in <module>()
----> 1 tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
    428       config.task_type != run_config_lib.TaskType.EVALUATOR):
    429     logging.info('Running training and evaluation locally (non-distributed).')
--> 430     executor.run_local()
    431     return
    432 

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run_local(self)
    614       # condition is satisfied (both checks use the same global_step value,
    615       # i.e., no race condition)
--> 616       metrics = evaluator.evaluate_and_export()
    617 
    618       if not metrics:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in evaluate_and_export(self)
    749           name=self._eval_spec.name,
    750           checkpoint_path=latest_ckpt_path,
--> 751           hooks=self._eval_spec.hooks)
    752 
    753       if not eval_result:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in evaluate(self, input_fn, steps, hooks, checkpoint_path, name)
    353         hooks=hooks,
    354         checkpoint_path=checkpoint_path,
--> 355         name=name)
    356 
    357   def _convert_eval_steps_to_hooks(self, steps):

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _evaluate_model(self, input_fn, hooks, checkpoint_path, name)
    808           input_fn, model_fn_lib.ModeKeys.EVAL)
    809       estimator_spec = self._call_model_fn(
--> 810           features, labels, model_fn_lib.ModeKeys.EVAL, self.config)
    811 
    812       if model_fn_lib.LOSS_METRIC_KEY in estimator_spec.eval_metric_ops:

/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _call_model_fn(self, features, labels, mode, config)
    692     if 'config' in model_fn_args:
    693       kwargs['config'] = config
--> 694     model_fn_results = self._model_fn(features=features, **kwargs)
    695 
    696     if not isinstance(model_fn_results, model_fn_lib.EstimatorSpec):

<ipython-input-8-e251e8b8fccf> in model_fn(features, labels, mode, params)
      3     tf.summary.image('images', features['image'], max_outputs=6)
      4 
----> 5     logits = vgg16_model_fn(features, mode, params)
      6 
      7     # Dictionary with label as outcome with greatest probability

<ipython-input-7-93330b8a5aa6> in vgg16_model_fn(features, mode, params)
     10             input_shape=(224, 224, 3),
     11             input_tensor=features['image'],
---> 12             pooling='avg')
     13 
     14         # Disable training for all layers to increase speed for transfer learning

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/applications/vgg16.py in VGG16(include_top, weights, input_tensor, input_shape, pooling, classes)
    199           WEIGHTS_PATH_NO_TOP,
    200           cache_subdir='models')
--> 201     model.load_weights(weights_path)
    202     if K.backend() == 'theano':
    203       layer_utils.convert_all_kernels_in_model(model)

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights(self, filepath, by_name)
   1097       load_weights_from_hdf5_group_by_name(f, self.layers)
   1098     else:
-> 1099       load_weights_from_hdf5_group(f, self.layers)
   1100 
   1101     if hasattr(f, 'close'):

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights_from_hdf5_group(f, layers)
   1484                        str(len(weight_values)) + ' elements.')
   1485     weight_value_tuples += zip(symbolic_weights, weight_values)
-> 1486   K.batch_set_value(weight_value_tuples)
   1487 
   1488 

/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in batch_set_value(tuples)
   2404       assign_ops.append(assign_op)
   2405       feed_dict[assign_placeholder] = value
-> 2406     get_session().run(assign_ops, feed_dict=feed_dict)
   2407 
   2408 

/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    887     try:
    888       result = self._run(None, fetches, feed_dict, options_ptr,
--> 889                          run_metadata_ptr)
    890       if run_metadata:
    891         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1065           except Exception as e:
   1066             raise TypeError('Cannot interpret feed_dict key as Tensor: '
-> 1067                             + e.args[0])
   1068 
   1069           if isinstance(subfeed_val, ops.Tensor):

TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

@bignamehyp I had assumed this was a bug as it seems to be occurring with variables set up inside tf.keras.applications.VGG16 rather than any I had set up myself. Thoughts?

@bignamehyp Someone already asked a similar question on Stack Overflow.

The solution is to call tf.keras.backend.clear_session() after the call to train(). However, this won't work if the user wants to use train_and_evaluate() since there is no place to call clear_session().
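
For reference, a minimal sketch of that workaround when train and evaluate are issued as separate calls (train_input_fn and eval_input_fn are placeholders for the data_input_fn calls above):

import tensorflow as tf

dog_cat_estimator.train(input_fn=train_input_fn, max_steps=10)
tf.keras.backend.clear_session()  # reset Keras's cached session/graph between runs
dog_cat_estimator.evaluate(input_fn=eval_input_fn)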

@bignamehyp does this information from @hsm207 provide any further insight? If I have to call clear_session() between runs, that would seem to be unexpected behaviour and a bug?

I'm just still not sure why it's happening, so I can't offer insight towards a potential solution.

<tf.Tensor 'shuffle_batch:0' shape=(64, 256, 256, 1) dtype=float32> cannot be interpreted as a Tensor

If you hit this problem, try calling K.clear_session() before the second use of the function that builds your graph. You should also reload the model and warm it up with a prediction on a simple input. I fixed my code like this:

uncerts_normal = get_mc_predictions(model, X_test, Y_label,
                                    batch_size=args.batch_size).var(axis=0)  # .mean(axis=1)
print(uncerts_normal.shape)
uncerts_normal1 = l2_normalize(a, axis=-1)
K.clear_session()
# Reload the model and warm it up with a dummy prediction
model = load_model('../data/model_%s.h5' % args.dataset)
print('testing model1:', model.predict(np.zeros((1, 28, 28, 1))))
uncerts_noisy = get_mc_predictions(model, X_test_noisy, Y_label,
                                   batch_size=args.batch_size).var(axis=0)

K.clear_session() did not work for me

however, what worked was:

import tensorflow as tf
from keras.applications.resnet50 import ResNet50

def load_model():
    global model
    model = ResNet50(weights="imagenet")
    # this is key: save the graph after loading the model
    global graph
    graph = tf.get_default_graph()

While predicting, use the same graph

with graph.as_default():
    preds = model.predict(image)
    # ... etc

This worked for me
from keras import backend as K
and after predicting my data i inserted this part of code
K.clear_session()

The solution given by @anujgupta82 worked for me. Thanks a lot !

Same problem here when trying to make an inference using a keras pre-trained model from a flask application. Thanks @anujgupta82 !

The solution from @anujgupta82 worked for me too. But, can someone help me to understand what is going on?

The solution given by @Qmoliang and @MohammedYunus worked for me. Thanks :)

The solution by @anujgupta82 also worked for me. Saved me a lot of stress!

Wow, thanks @anujgupta82 a lot ! Really a nice answer :-)

clear_session()

In my case, load_model() works the first time but not afterward. If you are experiencing the same issue, you need to call clear_session() after each time you load the model!
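
For example, a minimal sketch of that pattern (the model path and input shape are placeholders):

from keras import backend as K
from keras.models import load_model
import numpy as np

model = load_model('model.h5')                   # hypothetical path
preds = model.predict(np.zeros((1, 28, 28, 1)))  # hypothetical input shape
K.clear_session()                                # reset before the next load_model()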

Thanks @anujgupta82, works for me too!

Thanks a lot, worked for me!

(quoting @Qmoliang's K.clear_session() suggestion above)

What if the model we trained has already been saved and we are in the loading-then-predicting phase when this error occurs? Any other thoughts?

(quoting @anujgupta82's save-the-graph workaround above)

god among men. Worked.

The reason why the code from @anujgupta82 works is given in this Stack Overflow answer.

Flask uses multiple threads. The problem you are running into is that the TensorFlow model is not loaded and used in the same thread. One workaround is to force TensorFlow to use the global default graph.
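
Putting the pieces together, a minimal sketch of that workaround in a Flask app (the /predict route and the zero-filled input are hypothetical stand-ins; the key lines are capturing the graph at load time and re-entering it per request):

import flask
import numpy as np
import tensorflow as tf
from keras.applications.resnet50 import ResNet50

app = flask.Flask(__name__)
model = None
graph = None

def load_model():
    global model, graph
    model = ResNet50(weights="imagenet")
    graph = tf.get_default_graph()  # capture the graph the weights live in

@app.route("/predict", methods=["POST"])
def predict():
    image = np.zeros((1, 224, 224, 3))  # stand-in for real image preprocessing
    with graph.as_default():            # the request thread reuses that graph
        preds = model.predict(image)
    return flask.jsonify(preds.tolist())

if __name__ == "__main__":
    load_model()
    app.run()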

(quoting @anujgupta82's save-the-graph workaround above)

Thanks. I struggled with the same problem for half a day and solved it following your suggestion.

The solution by @anujgupta82 worked for me. Thanks!

The approach provided by mohamedadaly is described here with an example; check this link:
https://interviewbubble.com/typeerror-cannot-interpret-feed_dict-key-as-tensor-tensor-tensor-is-not-an-element-of-this-graph/

Hi. I used your function, but it does not work for me: there is no error message and no response at all.

# load_keras_model.py
class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        self.keras_resource()
        self.init_model()

    def init_model(self):
        self.graph = tf.get_default_graph()
        self.model = load_model(file_path)
        self.model.predict(np.ones((1, 1, 1, 1)))

    def keras_resource(self):
        num_cores = 4

        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()

        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores, allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True

        session = tf.Session(config=config)
        K.set_session(session)

    def predict_target(self, img_generator):
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict

load_keras_model = LoadKerasModel()

my environment

python 3.5
keras 2.2.4
tensorflow 1.12

my uwsgi launch command

uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 4 --threads 2 --stats 0.0.0.0:5002

When I use flask run to start my application it works very well, but it does not work under uwsgi.
The Flask app is created with a factory function, and I import load_keras_model while initializing the app.
I am not sure where I went wrong, because there is no error message at all; I hope somebody can help me, thanks.

this works for me,
@shaoeChen how is this working for you? It turns out this approach does not need a clear_session() call and is configuration-friendly at the same time:

from keras.backend.tensorflow_backend import set_session
# load_keras_model.py
class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        config = self.keras_resource()
        self.init_model(config)

    def init_model(self, _config, *args):
        session = tf.Session(config=_config)
        self.graph = session.graph
        set_session(session)
        self.model = load_model(file_path)

    def keras_resource(self):
        num_cores = 4

        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()

        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores, allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True
        
        return config

    def predict_target(self, img_generator):
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict

load_keras_model = LoadKerasModel()
load_keras_model.predict_target(np.ones((1, 1, 1, 1))) #img_generator

@ArashHosseini
Hi. I tried it and get the same result: no response and no error; the browser just keeps loading.
Even if I set uwsgi to one process and one thread as below:
even i set uwsgi one process one thread as below:

uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 1 --threads 1 --stats 0.0.0.0:5002

Now I have tried gunicorn as below, and the predict_generator response comes back within five seconds:

gunicorn --thread=2 --workers=1 wsgi:app -b 0.0.0.0:5001

It works well for me; I think I need to study how to use uwsgi correctly.
Thanks for your guidance.

@shaoeChen, thanks for the reply. I edited the code; the set_session in __init__ was missing. Now the GPU consumption should be significantly lower. Let me know if that (gpu_config) worked in your case, thanks.

@ArashHosseini, sorry for the late reply.
Now I notice that the GPU resource is not under my control.
Originally it used 1355 MB, but now it uses the full 1888 MB (screenshot omitted).

original gpu memory: (screenshot omitted)

Hi @ArashHosseini, I am sorry; I think I missed some setting. Now I am sure the GPU memory usage is the same (screenshot omitted).

Thanks for your advice.

(quoting the K.clear_session() after predicting suggestion above)

Thank you!

I encountered this error in some code I was working with, and none of the above answers worked for me.

What I found was that the code mixed uses of keras and tensorflow.keras, and calling keras.backend.clear_session() instead of tensorflow.keras.backend.clear_session() broke everything after the network was trained for the first time.
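
In other words, keep the backend call in the same namespace as the model code; a minimal sketch of the two consistent pairings:

# If the model is built with tf.keras:
import tensorflow as tf
tf.keras.backend.clear_session()

# If the model is built with standalone Keras:
from keras import backend as K
K.clear_session()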

@anujgupta82 you saved my day

(quoting the keras vs. tensorflow.keras mix-up comment above)

Thanks, I had the same problem as you, and following your answer I fixed it.

(quoting @anujgupta82's save-the-graph workaround above)

Can you please help me with this code that you have written?

(quoting @anujgupta82's save-the-graph workaround above)

I had the same issue and the solution helped me, but with a small improvement:

import ktrain
import tensorflow as tf
import flask

app = flask.Flask(__name__)
predictor = None
graph = None

def load_predictor():
    global predictor

    predictor = ktrain.load_predictor('saved_model')

    if hasattr(predictor.model, '_make_predict_function'):
        predictor.model._make_predict_function()

    global graph
    graph = tf.get_default_graph()

@app.route("/analyze/<text>")
def predict(text):
    with graph.as_default():
        prediction = predictor.predict(text)
    return prediction, 200

if __name__ == "__main__":
    load_predictor()
    app.run()

tensorflow==1.15.0rc2
ktrain==0.5.2
flask==0.12.2

Use:
import keras
keras.backend.clear_session()

before initializing the model.

I reverted my TF to 1.13.1 and Keras to 2.2.4 and this error disappeared.

I have tried all of the above, but to no avail:
tensorflow/models#8448

I use train_and_evaluate() and hit the same error. @damienpontifex since this issue is continuously referenced by similar errors, could you kindly upload the fixed code please?