problem when running train.py

dddson opened this issue 6 years ago · comments

Hey,
I'm getting this error when I run train.py. Can you help me?

WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 686, in _call_cpp_shape_fn_impl
input_tensors_as_shapes, status)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 3 from 1 for 'InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D' (op: 'Conv2D') with input shapes: [227,1,1,32], [3,3,32,32].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\davidson\Desktop\face\train.py", line 202, in
tf.app.run()
File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:\Users\davidson\Desktop\face\train.py", line 130, in main
logits = model_fn(md['nlabels'], images, 1-FLAGS.pdrop, True)
File "C:\Users\davidson\Desktop\face\model.py", line 89, in inception_v3
net, end_points = inception_v3_base(images, scope=scope)
File "C:\Python36\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\inception_v3.py", line 117, in inception_v3_base
net = layers.conv2d(net, depth(32), [3, 3], scope=end_point)
File "C:\Python36\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "C:\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1033, in convolution
outputs = layer.apply(inputs)
File "C:\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 671, in apply
return self.call(inputs, *args, **kwargs)
File "C:\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "C:\Python36\lib\site-packages\tensorflow\python\layers\convolutional.py", line 167, in call
outputs = self._convolution_op(inputs, self.kernel)
File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 835, in call
return self.conv_op(inp, filter)
File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 499, in call
return self.call(inp, filter)
File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 187, in call
name=self.name)
File "C:\Python36\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 630, in conv2d
data_format=data_format, name=name)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2958, in create_op
set_shapes_for_outputs(ret)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2209, in set_shapes_for_outputs
shapes = shape_func(op)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2159, in call_with_requiring
return call_cpp_shape_fn(op, require_shape_fn=True)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 627, in call_cpp_shape_fn
require_shape_fn)
File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 691, in _call_cpp_shape_fn_impl
raise ValueError(err.message)
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D' (op: 'Conv2D') with input shapes: [227,1,1,32], [3,3,32,32].

Thank you! @dpressel

Daniel Pressel · Answer 1 · Fri May 04 2018 11:29:39 GMT+0800 (China Standard Time)

I am not sure how you are using it, but it's working for me as documented in the README.md (I am running TF 1.5):

$ python train.py --train_dir ~/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0 --max_steps 15000 --model_type inception --batch_size 32 --eta 0.001 --dropout 0.5 --pre_model /data/pre-trained/inception_v3.ckpt

Did you run preproc on your images as documented?

dddson · Answer 2 · Sat May 05 2018 01:21:12 GMT+0800 (China Standard Time)

I downgraded to your version of TF just now, but it's still showing the same error.
I basically adapted my convert_to_tf to produce the same output as yours, but I'm using IMDB and WIKI instead of Adience.

This is my JSON file for age:
{"num_valid_shards": 4,
"num_train_shards": 20,
"valid_counts": 116,
"train_counts": 115456,
"timestamp": "2018-05-02 03:15:49.088705",
"nlabels": 100}

And this is the record output, which matches yours:
features {
  feature {
    key: "image/class/label"
    value {
      int64_list {
        value: 54
      }
    }
  }
  feature {
    key: "image/encoded"
    value {
      bytes_list {
        value: "\377\330\377\340\000\020.................\273r\305\247~U\265\225\357\177\231\377\331"
      }
    }
  }
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "1839578_1955-12-16_2010.jpg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 256
      }
    }
  }
  feature {
    key: "image/width"
    value {
      int64_list {
        value: 256
      }
    }
  }
}

Daniel Pressel · Answer 3 · Mon May 07 2018 21:56:17 GMT+0800 (China Standard Time)

At first glance, it looks like the bands from your exporter might be messed up (i.e., not in the order the trainer expects), but I can try to replicate this. It might take me a while as I am very busy, but it seems like it should not be too hard. Let me know if there are any details I will need to recreate it.

明荣 · Answer 4 · Tue May 21 2019 14:05:20 GMT+0800 (China Standard Time)

I also encountered the same problem, please tell me how to solve it.

Daniel Pressel · Answer 5 · Tue May 21 2019 19:25:34 GMT+0800 (China Standard Time)

This appears to be an issue with the dataset. It looks like by the time it hits the trainer it has only a single channel, but the trainer expects 3-band data. Please check the input data carefully and make sure you are passing the right thing.
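
For anyone trying to read the error message itself, here is a rough sketch of the arithmetic behind "subtracting 3 from 1". It assumes the shapes in the message are in NHWC order and that the layer uses VALID padding; neither is confirmed in this thread. The helper name valid_conv_output_size is made up for illustration, and the 299 input size with stride 2 is only the conventional Inception v3 setup, assumed here for comparison.

```python
def valid_conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Spatial output size of a convolution with VALID (no) padding.

    Raises ValueError when the kernel does not fit in the input, which
    mirrors the 'Negative dimension size' check seen in the traceback.
    """
    if input_size < kernel_size:
        raise ValueError(
            "Negative dimension size caused by subtracting "
            f"{kernel_size} from {input_size}"
        )
    return (input_size - kernel_size) // stride + 1


# Shapes copied from the error message, read as NHWC (an assumption).
batch, height, width, channels = [227, 1, 1, 32]
kernel_h, kernel_w, in_channels, out_channels = [3, 3, 32, 32]

# At this op the channel counts agree (32 and 32); the spatial size is
# what is too small for a 3x3 kernel.
assert channels == in_channels

try:
    valid_conv_output_size(height, kernel_h)
except ValueError as err:
    print(err)

# For comparison: a 299-pixel side through a 3x3, stride-2 VALID convolution
# (assumed values, not taken from this thread).
print(valid_conv_output_size(299, 3, stride=2))
```

This only shows that the tensor reaching Conv2d_2a_3x3 is 1x1 spatially. It does not say by itself what is wrong with the records, so checking the decoded image shape right after the input pipeline is still the useful next step.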

Also, I'm open to merging support for this dataset if somebody gets it running and sends a PR.
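
As a possible starting point for checking the input data, the sketch below counts the color components declared in a JPEG's frame header using only the standard library, so single-channel (grayscale) files can be spotted before they are written into records. It is an illustration under assumptions, not part of this repository: the function name jpeg_num_components and the synthetic test header are hypothetical, and whether any of the IMDB/WIKI files are actually grayscale is not established in this thread.

```python
import struct

# Start-of-frame markers SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8), DAC (0xCC).
_SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_num_components(data: bytes) -> int:
    """Return the component count from a JPEG's start-of-frame segment.

    1 usually means grayscale; 3 usually means a color image.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        if marker in (0xDA, 0xD9):
            break  # start of scan / end of image: no frame header seen
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in _SOF_MARKERS:
            # After the length: precision (1), height (2), width (2), count (1).
            if pos + 9 >= len(data):
                raise ValueError("truncated start-of-frame segment")
            return data[pos + 9]
        pos += 2 + length
    raise ValueError("no start-of-frame segment found")


def synthetic_header(num_components: int) -> bytes:
    """Build a minimal SOI + APP0 + SOF0 header for demonstration only."""
    app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    sof0 = b"\xff\xc0" + struct.pack(
        ">HBHHB", 8 + 3 * num_components, 8, 256, 256, num_components
    ) + bytes(3 * num_components)
    return b"\xff\xd8" + app0 + sof0


for n in (1, 3):
    print(n, jpeg_num_components(synthetic_header(n)))
```

In practice the bytes would come from the image file (or the "image/encoded" feature) rather than a synthetic header. Files that report one component could then be converted to three channels or skipped by the exporter, depending on what the trainer's preprocessing expects.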