bourdakos1 / Custom-Object-Detection

Custom Object Detection with TensorFlow

Home Page: https://medium.freecodecamp.org/tracking-the-millenium-falcon-with-tensorflow-c8c86419225e

training error

Tejeshwarabm opened this issue

2018-03-04 13:19:20.290777: W tensorflow/core/framework/op_kernel.cc:1192] Resource exhausted: OOM when allocating tensor with shape[1,1024,51,38]
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[1,1024,51,38]
[[Node: FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/Conv2D, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/gamma/read/_683, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/beta/read/_685, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_mean/read/_687, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_variance/read/_689)]]
[[Node: Reshape_24/_1255 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4073_Reshape_24", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/FusedBatchNorm', defined at:
  File "object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/Documents/Custom-Object-Detection-master/object_detection/trainer.py", line 192, in train
    clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
  File "/Documents/Custom-Object-Detection-master/slim/deployment/model_deploy.py", line 193, in create_clones
    outputs = model_fn(*args, **kwargs)
  File "/Documents/Custom-Object-Detection-master/object_detection/trainer.py", line 131, in _create_losses
    prediction_dict = detection_model.predict(images)
  File "/Documents/Custom-Object-Detection-master/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 513, in predict
    image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)
  File "/Documents/Custom-Object-Detection-master/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 652, in _extract_rpn_feature_maps
    preprocessed_inputs, scope=self.first_stage_feature_extractor_scope)
  File "/Documents/Custom-Object-Detection-master/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 131, in extract_proposal_features
    return self._extract_proposal_features(preprocessed_inputs, scope)
  File "/Documents/Custom-Object-Detection-master/object_detection/models/faster_rcnn_resnet_v1_feature_extractor.py", line 126, in _extract_proposal_features
    scope=var_scope)
  File "/Documents/Custom-Object-Detection-master/slim/nets/resnet_v1.py", line 298, in resnet_v1_101
    reuse=reuse, scope=scope)
  File "/Documents/Custom-Object-Detection-master/slim/nets/resnet_v1.py", line 216, in resnet_v1
    net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
  File "/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/Documents/Custom-Object-Detection-master/slim/nets/resnet_utils.py", line 185, in stack_blocks_dense
    net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
  File "/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/Documents/Custom-Object-Detection-master/slim/nets/resnet_v1.py", line 118, in bottleneck
    activation_fn=None, scope='conv3')
  File "/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1042, in convolution
    outputs = normalizer_fn(outputs, **normalizer_params)
  File "/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 643, in batch_norm
    outputs = layer.apply(inputs, training=is_training)
  File "/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 671, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 575, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/lib/python2.7/site-packages/tensorflow/python/layers/normalization.py", line 395, in call
    return self._fused_batch_norm(inputs, training=training)
  File "/lib/python2.7/site-packages/tensorflow/python/layers/normalization.py", line 302, in _fused_batch_norm
    training, _fused_batch_norm_training, _fused_batch_norm_inference)
  File "/lib/python2.7/site-packages/tensorflow/python/layers/utils.py", line 208, in smart_cond
    return fn2()
  File "/lib/python2.7/site-packages/tensorflow/python/layers/normalization.py", line 299, in _fused_batch_norm_inference
    data_format=self._data_format)
  File "/lib/python2.7/site-packages/tensorflow/python/ops/nn_impl.py", line 831, in fused_batch_norm
    name=name)
  File "/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 2034, in _fused_batch_norm
    is_training=is_training, name=name)
  File "/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "~/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,1024,51,38]
[[Node: FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/Conv2D, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/gamma/read/_683, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/beta/read/_685, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_mean/read/_687, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_variance/read/_689)]]
[[Node: Reshape_24/_1255 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4073_Reshape_24", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Traceback (most recent call last):
  File "object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/Documents/Custom-Object-Detection-master/object_detection/trainer.py", line 296, in train
    saver=saver)
  File "/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 775, in train
    sv.stop(threads, close_summary_writer=True)
  File "/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 296, in stop_on_exception
    yield
  File "/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 494, in run
    self.run_loop()
  File "/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 994, in run_loop
    self._sv.global_step])
  File "/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,1024,51,38]
[[Node: FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NHWC", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FirstStageFeatureExtractor/resnet_v1_101/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/Conv2D, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/gamma/read/_683, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/beta/read/_685, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_mean/read/_687, FirstStageFeatureExtractor/resnet_v1_101/block3/unit_12/bottleneck_v1/conv3/BatchNorm/moving_variance/read/_689)]]
[[Node: Reshape_24/_1255 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4073_Reshape_24", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

What might cause this error, and why does it happen?

Whenever I have seen a "ResourceExhaustedError", restarting my computer helped. It usually means memory is full (in this log the failed allocation is on GPU:0, so that would be GPU memory) and nothing more can be allocated. A restart frees whatever was still holding that memory. Try that.
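
As a rough sanity check on the "memory is full" explanation, here is a minimal sketch that works out how large the tensor in the error message actually is. The helper name `oom_tensor_bytes` and the `DTYPE_BYTES` table are made up for illustration and are not part of TensorFlow; the 4-byte element size assumes the `T=DT_FLOAT` attribute shown in the log.

```python
import re
from functools import reduce
from operator import mul

# Bytes per element for a few dtypes (illustrative table, not a TensorFlow API).
# DT_FLOAT is the dtype reported as "T=DT_FLOAT" in the log.
DTYPE_BYTES = {"DT_HALF": 2, "DT_FLOAT": 4, "DT_INT32": 4, "DT_DOUBLE": 8}


def oom_tensor_bytes(message, dtype="DT_FLOAT"):
    """Return the size in bytes of the tensor named in an OOM message."""
    match = re.search(r"shape\[([\d,]+)\]", message)
    if match is None:
        raise ValueError("no shape[...] found in message")
    dims = [int(d) for d in match.group(1).split(",")]
    # Number of elements times bytes per element.
    return reduce(mul, dims, 1) * DTYPE_BYTES[dtype]


message = "OOM when allocating tensor with shape[1,1024,51,38]"
size = oom_tensor_bytes(message)
print("%d bytes (%.2f MiB)" % (size, size / (1024.0 * 1024.0)))
# prints: 7938048 bytes (7.57 MiB)
```

The tensor that failed is only about 7.6 MiB, which suggests the GPU was already nearly full when this allocation was attempted, whether from the rest of the model or from other processes still holding GPU memory. That would be consistent with a restart helping.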