Keras application - Tensor is not an element of this graph on eval after train
damienpontifex opened this issue · comments
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.1
- TensorFlow installed from (source or binary): pip
- TensorFlow version (use command below): v1.4.0-rc1-11-g130a514 1.4.0
- Python version: 3.6.3
- CUDA/cuDNN version: N/A CPU only
- Exact command to reproduce:
Describe the problem
Using the Estimator API with tf.keras.applications.VGG16 and its output for transfer learning, I get TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph. whenever the model function is run a second time. It is raised when the eval step runs after training inside tf.estimator.train_and_evaluate, and it also occurs if I re-run train_and_evaluate itself a second time. See the source code for the model and the estimator output below. I am running in a Jupyter notebook: after a Kernel ➝ Restart, a single training run completes without the error, but two executions in the same kernel always fail.
See https://github.com/damienpontifex/fastai-course/blob/master/deeplearning1/lesson1%2B3/DogsVsCats.ipynb for the full notebook; the main parts for the estimator model and output are below:
Source code / logs
Estimator Model
def vgg16_model_fn(features, mode, params):
    is_training = mode == tf.estimator.ModeKeys.TRAIN

    with tf.variable_scope('vgg_base'):
        # Use a pre-trained VGG16 model and drop the top layers, as we will
        # retrain with our own dense output for our custom classes
        vgg16_base = tf.keras.applications.VGG16(
            include_top=False,
            input_shape=(224, 224, 3),
            input_tensor=features['image'],
            pooling='avg')

        # Disable training for all layers to increase speed for transfer learning.
        # If the new classes are significantly different from ImageNet, it may be
        # worth leaving trainable = True
        for layer in vgg16_base.layers:
            layer.trainable = False

    x = vgg16_base.output

    with tf.variable_scope("fc"):
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc1')
        x = tf.layers.dense(x, units=4096, activation=tf.nn.relu, trainable=is_training, name='fc2')
        x = tf.layers.dropout(x, rate=0.5, training=is_training)

    # Finally, add a dense layer for the class predictions
    with tf.variable_scope("Prediction"):
        x = tf.layers.dense(x, units=NUM_CLASSES, trainable=is_training)

    return x
Estimator setup
dog_cat_estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params=params
)

train_spec = tf.estimator.TrainSpec(
    input_fn=data_input_fn(train_record_filenames, num_epochs=None, batch_size=10, shuffle=True),
    max_steps=10)

eval_spec = tf.estimator.EvalSpec(
    input_fn=data_input_fn(validation_record_filenames)
)

tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)
train_and_evaluate output
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /tmp/DogsVsCats/model.ckpt-1
INFO:tensorflow:Saving checkpoints for 2 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:loss = 0.0, step = 2
INFO:tensorflow:Saving checkpoints for 10 into /tmp/DogsVsCats/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1063 subfeed_t = self.graph.as_graph_element(subfeed, allow_tensor=True,
-> 1064 allow_operation=False)
1065 except Exception as e:
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in as_graph_element(self, obj, allow_tensor, allow_operation)
3034 with self._lock:
-> 3035 return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
3036
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in _as_graph_element_locked(self, obj, allow_tensor, allow_operation)
3113 if obj.graph is not self:
-> 3114 raise ValueError("Tensor %s is not an element of this graph." % obj)
3115 return obj
ValueError: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-12-67c818ea66c5> in <module>()
----> 1 tf.estimator.train_and_evaluate(dog_cat_estimator, train_spec, eval_spec)
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in train_and_evaluate(estimator, train_spec, eval_spec)
428 config.task_type != run_config_lib.TaskType.EVALUATOR):
429 logging.info('Running training and evaluation locally (non-distributed).')
--> 430 executor.run_local()
431 return
432
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in run_local(self)
614 # condition is satisfied (both checks use the same global_step value,
615 # i.e., no race condition)
--> 616 metrics = evaluator.evaluate_and_export()
617
618 if not metrics:
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py in evaluate_and_export(self)
749 name=self._eval_spec.name,
750 checkpoint_path=latest_ckpt_path,
--> 751 hooks=self._eval_spec.hooks)
752
753 if not eval_result:
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in evaluate(self, input_fn, steps, hooks, checkpoint_path, name)
353 hooks=hooks,
354 checkpoint_path=checkpoint_path,
--> 355 name=name)
356
357 def _convert_eval_steps_to_hooks(self, steps):
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _evaluate_model(self, input_fn, hooks, checkpoint_path, name)
808 input_fn, model_fn_lib.ModeKeys.EVAL)
809 estimator_spec = self._call_model_fn(
--> 810 features, labels, model_fn_lib.ModeKeys.EVAL, self.config)
811
812 if model_fn_lib.LOSS_METRIC_KEY in estimator_spec.eval_metric_ops:
/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py in _call_model_fn(self, features, labels, mode, config)
692 if 'config' in model_fn_args:
693 kwargs['config'] = config
--> 694 model_fn_results = self._model_fn(features=features, **kwargs)
695
696 if not isinstance(model_fn_results, model_fn_lib.EstimatorSpec):
<ipython-input-8-e251e8b8fccf> in model_fn(features, labels, mode, params)
3 tf.summary.image('images', features['image'], max_outputs=6)
4
----> 5 logits = vgg16_model_fn(features, mode, params)
6
7 # Dictionary with label as outcome with greatest probability
<ipython-input-7-93330b8a5aa6> in vgg16_model_fn(features, mode, params)
10 input_shape=(224, 224, 3),
11 input_tensor=features['image'],
---> 12 pooling='avg')
13
14 # Disable training for all layers to increase speed for transfer learning
/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/applications/vgg16.py in VGG16(include_top, weights, input_tensor, input_shape, pooling, classes)
199 WEIGHTS_PATH_NO_TOP,
200 cache_subdir='models')
--> 201 model.load_weights(weights_path)
202 if K.backend() == 'theano':
203 layer_utils.convert_all_kernels_in_model(model)
/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights(self, filepath, by_name)
1097 load_weights_from_hdf5_group_by_name(f, self.layers)
1098 else:
-> 1099 load_weights_from_hdf5_group(f, self.layers)
1100
1101 if hasattr(f, 'close'):
/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/engine/topology.py in load_weights_from_hdf5_group(f, layers)
1484 str(len(weight_values)) + ' elements.')
1485 weight_value_tuples += zip(symbolic_weights, weight_values)
-> 1486 K.batch_set_value(weight_value_tuples)
1487
1488
/usr/local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py in batch_set_value(tuples)
2404 assign_ops.append(assign_op)
2405 feed_dict[assign_placeholder] = value
-> 2406 get_session().run(assign_ops, feed_dict=feed_dict)
2407
2408
/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1065 except Exception as e:
1066 raise TypeError('Cannot interpret feed_dict key as Tensor: '
-> 1067 + e.args[0])
1068
1069 if isinstance(subfeed_val, ops.Tensor):
TypeError: Cannot interpret feed_dict key as Tensor: Tensor Tensor("vgg_base/Placeholder:0", shape=(3, 3, 3, 64), dtype=float32) is not an element of this graph.
This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!
@bignamehyp I had assumed this was a bug, as it seems to occur with variables set up inside tf.keras.applications.VGG16 rather than any I had set up myself. Thoughts?
@bignamehyp Someone already asked a similar question on Stack Overflow. The solution is to call tf.keras.backend.clear_session() after the call to train(). However, this won't work if the user wants to use train_and_evaluate(), since there is no place to call clear_session().
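In code, that workaround might look like this (a sketch reusing the estimator and input functions from the original post; separate train() and evaluate() calls replace train_and_evaluate() so there is a place to call clear_session()):

# Train first; this builds the training graph and loads the VGG16 weights.
dog_cat_estimator.train(
    input_fn=data_input_fn(train_record_filenames, num_epochs=None,
                           batch_size=10, shuffle=True),
    max_steps=10)

# The Keras backend session still points at the (now finalized) training
# graph; drop it so the next weight-loading run happens in the eval graph.
tf.keras.backend.clear_session()

dog_cat_estimator.evaluate(
    input_fn=data_input_fn(validation_record_filenames))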
@bignamehyp Does this information from @hsm207 provide any further insight? If I have to call clear_session() between runs, that seems like unexpected behaviour, i.e. a bug? I'm still not sure why it's happening, so I can't offer much insight into a potential solution.
<tf.Tensor 'shuffle_batch:0' shape=(64, 256, 256, 1) dtype=float32> cannot be interpreted as a Tensor
If you hit this problem, call K.clear_session() before using your graph-building function a second time. Then reload the model and warm it up with a simple dummy input. I fixed my code like this:

uncerts_normal = get_mc_predictions(model, X_test, Y_label,
                                    batch_size=args.batch_size).var(axis=0)  # .mean(axis=1)
print(uncerts_normal.shape)
uncerts_normal1 = l2_normalize(a, axis=-1)

K.clear_session()
model = load_model('../data/model_%s.h5' % args.dataset)
print('testing model1:', model.predict(np.zeros((1, 28, 28, 1))))

uncerts_noisy = get_mc_predictions(model, X_test_noisy, Y_label,
                                   batch_size=args.batch_size).var(axis=0)
K.clear_session() did not work for me; however, what worked was:

def load_model():
    global model
    model = ResNet50(weights="imagenet")
    # this is key: save the graph after loading the model
    global graph
    graph = tf.get_default_graph()

While predicting, use the same graph:

with graph.as_default():
    preds = model.predict(image)
    # ... etc
This worked for me:

from keras import backend as K

and after predicting my data I inserted this part of code:

K.clear_session()
The solution given by @anujgupta82 worked for me. Thanks a lot!
Same problem here when trying to make an inference using a keras pre-trained model from a flask application. Thanks @anujgupta82 !
The solution from @anujgupta82 worked for me too. But can someone help me understand what is going on?
The solution given by @Qmoliang and @MohammedYunus worked for me. Thanks :)
The solution by @anujgupta82 also worked for me. Saved me a lot of stress!
Wow, thanks @anujgupta82 a lot! Really a nice answer :-)
clear_session(): in my case, load_model() works the first time but not afterwards. If you are experiencing the same issue, you need to call clear_session() after each time you load the model!
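A minimal sketch of that pattern (the model file names and input shape are placeholders):

from keras import backend as K
from keras.models import load_model
import numpy as np

for path in ['model_a.h5', 'model_b.h5']:  # hypothetical model files
    model = load_model(path)
    preds = model.predict(np.zeros((1, 28, 28, 1)))  # dummy input
    K.clear_session()  # reset the backend session before the next load_model()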
Thanks @anujgupta82, works for me too!
Thanks a lot, worked for me!
> If you hit this problem, call K.clear_session() before using your graph-building function a second time. […]
What if the model we trained has already been saved, and this error occurs in the loading-then-predicting phase? Any other thoughts?
> K.clear_session() did not work for me; however, what worked was: [the global graph = tf.get_default_graph() / with graph.as_default() snippet above]
God among men. Worked.
The reason why the code from @anujgupta82 works is given in this Stack Overflow answer:

Flask uses multiple threads. The problem you are running into is because the TensorFlow model is not loaded and used in the same thread. One workaround is to force TensorFlow to use the global default graph.
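To make that concrete, a minimal Flask sketch of the workaround (the route, dummy input, and model choice are illustrative only):

import flask
import numpy as np
import tensorflow as tf
from keras.applications.resnet50 import ResNet50

app = flask.Flask(__name__)

# Load the model once at startup and remember which graph it was built in,
# because Flask may serve each request on a different thread.
model = ResNet50(weights="imagenet")
graph = tf.get_default_graph()

@app.route("/predict")
def predict():
    image = np.zeros((1, 224, 224, 3))  # stand-in for real image preprocessing
    with graph.as_default():            # run inside the graph the model lives in
        preds = model.predict(image)
    return str(preds.argmax()), 200

if __name__ == "__main__":
    app.run()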
> K.clear_session() did not work for me; however, what worked was: [the global graph = tf.get_default_graph() / with graph.as_default() snippet above]
Thanks. I struggled with the same problem for half a day and solved it with your suggestion.
The solution by @anujgupta82 worked for me. Thanks!
The approach provided by mohamedadaly is described here with an example; check this link:
https://interviewbubble.com/typeerror-cannot-interpret-feed_dict-key-as-tensor-tensor-tensor-is-not-an-element-of-this-graph/
Hi. I used your function, but it does not work for me: there is no error message and no response.
# load_keras_model.py
class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        self.keras_resource()
        self.init_model()

    def init_model(self):
        self.graph = tf.get_default_graph()
        self.model = load_model(file_path)
        self.model.predict(np.ones((1, 1, 1, 1)))

    def keras_resource(self):
        num_cores = 4
        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()
        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores,
                                allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True
        session = tf.Session(config=config)
        K.set_session(session)

    def predict_target(self, img_generator):  # fixed typo: was `selfl`
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict

load_keras_model = LoadKerasModel()
My environment:
- Python 3.5
- Keras 2.2.4
- TensorFlow 1.12
My uwsgi launch command:

uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 4 --threads 2 --stats 0.0.0.0:5002
When I start my application with flask run it works very well, but it does not work under uwsgi. The Flask app is created via a factory method, and I import load_keras_model while initializing the app. I'm not sure where I went wrong, because there is no error message at all; I hope somebody can help me, thanks.
This works for me. @shaoeChen, does this work for you? It turns out this way does not need a clear_session() call and is configuration-friendly at the same time:
from keras.backend.tensorflow_backend import set_session

# load_keras_model.py
class LoadKerasModel:
    model = None
    graph = None

    def __init__(self):
        config = self.keras_resource()
        self.init_model(config)

    def init_model(self, _config, *args):
        session = tf.Session(config=_config)
        self.graph = session.graph
        set_session(session)
        self.model = load_model(file_path)

    def keras_resource(self):
        num_cores = 4
        if os.getenv('TENSORFLOW_VERSION') == 'GPU':
            num_gpu = 1
            num_cpu = 1
        elif os.getenv('TENSORFLOW_VERSION') == 'CPU':
            num_gpu = 0
            num_cpu = 1
        else:
            raise NonResourceException()
        config = tf.ConfigProto(intra_op_parallelism_threads=num_cores,
                                inter_op_parallelism_threads=num_cores,
                                allow_soft_placement=True,
                                device_count={'CPU': num_cpu, 'GPU': num_gpu})
        config.gpu_options.allow_growth = True
        return config

    def predict_target(self, img_generator):
        with self.graph.as_default():
            predict = self.model.predict_generator(
                img_generator,
                steps=len(img_generator),
                verbose=1
            )
        return predict

load_keras_model = LoadKerasModel()
load_keras_model.predict_target(np.ones((1, 1, 1, 1)))  # img_generator
@ArashHosseini Hi, I tried it and got the same result: no response and no error; the browser just keeps loading. Even when I set uwsgi to one process and one thread as below:

uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 1 --threads 1 --stats 0.0.0.0:5002
Now I have tried gunicorn as below, and the predict_generator response arrives within about five seconds:

gunicorn --thread=2 --workers=1 wsgi:app -b 0.0.0.0:5001

It works well for me; I think I need to study how to use uwsgi correctly. Thanks for your guidance.
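(For reference, one possible explanation: uwsgi by default loads the app, and with it the TensorFlow session, in the master process and then forks the workers, and a session created before the fork can hang in the children. uwsgi's lazy-apps option loads the app in each worker after the fork instead, so it may be worth trying; this is an assumption, not a confirmed fix for this setup:)

uwsgi --http-socket 0.0.0.0:5001 --wsgi-file wsgi.py --callable app --http-enable-proxy-protocol --processes 4 --threads 2 --lazy-apps --stats 0.0.0.0:5002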
@shaoeChen, thanks for the reply. I edited the code; set_session in __init__ was missing. Now the GPU consumption should be significantly lower. Let me know if that (gpu_config) worked in your case, thanks.
@ArashHosseini, sorry for the late reply. I now notice that the GPU resource is not under my control: originally it used 1355 MB, but now it uses the full 1888 MB, as shown below:
Hi @ArashHosseini, I am sorry, I think I missed some setting. Now I am sure the GPU memory usage is the same, as shown below. Thanks for your advice.
> This worked for me: from keras import backend as K … and after predicting my data I inserted K.clear_session() […]
Thank you!
I encountered this error in code I was working on, and none of the above answers worked for me. The problem turned out to be that the code mixed uses of keras and tensorflow.keras: calling keras.backend.clear_session() instead of tensorflow.keras.backend.clear_session() broke everything after the network was trained for the first time.
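A minimal illustration of the mismatch (the model and calls are examples, not the original code):

# Broken: the model is built with tf.keras, but the session being cleared
# belongs to standalone Keras, so tf.keras's session is left untouched.
import tensorflow as tf
from keras import backend as K

model = tf.keras.applications.VGG16(weights=None)
K.clear_session()  # wrong namespace: does not reset the session tf.keras uses

# Consistent: stay within one namespace throughout.
tf.keras.backend.clear_session()
model = tf.keras.applications.VGG16(weights=None)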
@anujgupta82 you saved my day
> I encountered this error … the code mixed uses of keras and tensorflow.keras, and calling keras.backend.clear_session() instead of tensorflow.keras.backend.clear_session() broke everything […]
Thanks, I had the same problem as you, and by following your answer I fixed it.
> K.clear_session() did not work for me; however, what worked was: [the global graph = tf.get_default_graph() / with graph.as_default() snippet above]
Can you please help with this code that you have written?
> K.clear_session() did not work for me; however, what worked was: [the global graph = tf.get_default_graph() / with graph.as_default() snippet above]
Had the same issue and the solution helped me, but with a small improvement:
import ktrain
import tensorflow as tf
import flask

app = flask.Flask(__name__)

predictor = None
graph = None

def load_predictor():
    global predictor
    predictor = ktrain.load_predictor('saved_model')
    if hasattr(predictor.model, '_make_predict_function'):
        predictor.model._make_predict_function()
    global graph
    graph = tf.get_default_graph()

@app.route("/analyze/<text>")
def predict(text):
    with graph.as_default():
        prediction = predictor.predict(text)
    return prediction, 200

if __name__ == "__main__":
    load_predictor()
    app.run()
tensorflow==1.15.0rc2
ktrain==0.5.2
flask==0.12.2
Use:

import keras
keras.backend.clear_session()

before initializing the model.
I reverted my TF to 1.13.1 and Keras to 2.2.4 and this error disappeared.
I have tried all of the above, but to no avail.
tensorflow/models#8448
I use train_and_evaluate() and hit the same error. @damienpontifex, since this issue is continuously referenced by similar errors, could you kindly upload the fixed code, please?