Batch normalization --training parameter
galinator9000 opened this issue
Hi, I wanted to use the YOLOv3-tiny model. I downloaded the cfg and weights from the official website.
With the command below I successfully built the .pb and .meta files.
python main.py --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"
With the script below I could load the graph and weights.
When I tried to get the output of the last layer, convolutional13, I got an array full of nan values:
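import cv2
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, import_meta_graph)

# Rebuild the graph from the .meta file and restore the converted weights.
saver = tf.train.import_meta_graph("yolov3-tiny/yolov3-tiny.meta")
sess = tf.Session()
saver.restore(sess, "yolov3-tiny/yolov3-tiny.ckpt")

# Load the image as RGB, scale it to [0, 1], and add a batch dimension.
image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB) / 255.0
image = np.expand_dims(image, axis=0)

# Fetch the output of the last convolutional layer.
print(sess.run("YOLO/convolutional13/BiasAdd:0", feed_dict={"YOLO/net1:0": image}))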
Outputs:
[[[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]
[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]]]
However, when I ran the same conversion with the --training parameter:
python main.py --training --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"
the same script outputs:
[[[[-0.5312634 0.23449755 -0.22042923 ... -0.99058443 -0.75764066
0.05638865]
[-0.1264087 -0.06148954 -0.13978335 ... -0.57391363 -0.65091616
-0.34988856]
[-0.27005857 0.18064664 -0.1842366 ... -0.7720764 -0.63676864
-0.22235665]
...
[-0.14108022 0.12593661 0.040429 ... -0.51453155 -0.8112872
-0.2482701 ]
[-0.14169356 0.05826963 0.04545707 ... -0.36210614 -0.6568373
-0.17424914]
[-0.24074644 0.49974358 -0.17072684 ... -1.1237179 -0.8400626
-0.20994306]]
[[-0.37883073 0.06569445 0.07646853 ... -0.72665095 -0.5669313
0.23495841]
[-0.11390454 0.00512573 0.09839267 ... 0.02260823 -0.31830767
0.00776402]
[-0.18927872 0.14090516 0.06336813 ... -0.17192174 -0.3423958
0.07134365]
...
[-0.5374908 0.17205149 0.30092606 ... -1.299513 -0.50735444
-0.45372528]
[-0.44234592 0.17717186 0.11988509 ... -0.9887123 -0.25854525
-0.40106654]
[-0.30651295 0.32414198 0.01627261 ... -1.7556211 -0.55981153
-0.5505434 ]]]]
I believe this is related to batch normalization and the --training parameter. I want to use this model for transfer learning.
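Why a training flag could hide the nan values can be sketched with a NumPy-only batch normalization. This is an illustration under assumptions, not the converter's code: it assumes that --training makes batch normalization use the statistics of the current batch instead of the stored moving mean and variance, and the "corrupted" variance below is a made-up value.

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training, eps=1e-5):
    """Per-channel batch normalization for x of shape (N, H, W, C)."""
    if training:
        # Training mode: statistics come from the current batch.
        mean = x.mean(axis=(0, 1, 2))
        var = x.var(axis=(0, 1, 2))
    else:
        # Inference mode: statistics come from the stored (loaded) values.
        mean, var = moving_mean, moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 4, 3)).astype(np.float32)
gamma = np.ones(3, dtype=np.float32)
beta = np.zeros(3, dtype=np.float32)

# Hypothetical corrupted statistics: a variance can never be negative,
# but a misread weights file could put any float here.
bad_mean = np.zeros(3, dtype=np.float32)
bad_var = np.array([1.0, -4.0, 1.0], dtype=np.float32)

with np.errstate(invalid="ignore"):
    inference_out = batch_norm(x, gamma, beta, bad_mean, bad_var, training=False)
training_out = batch_norm(x, gamma, beta, bad_mean, bad_var, training=True)

print("inference has nan:", bool(np.isnan(inference_out).any()))
print("training has nan:", bool(np.isnan(training_out).any()))
```

If this is what happens, finite output with --training does not by itself show that the weights were loaded correctly, only that the stored statistics were not used.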
Also, when I tried to get the output of earlier layers such as convolutional2 (without the --training parameter), the values looked like this:
[[[[nan -1.4262159e+36 -1.6400952e+36 ... -1.5521092e+36
1.1826908e+38 -1.1971094e+37]
[ nan -5.4608188e+36 -inf ... -2.9475174e+35
-2.9942158e+36 -inf]
[ nan -5.4608188e+36 -inf ... -2.9475174e+35
-2.9942158e+36 -inf]
...
[ nan -5.4608188e+36 -inf ... -2.9475174e+35
-2.9942158e+36 -inf]
[ nan -5.4608188e+36 -inf ... -2.9475174e+35
-2.9942158e+36 -inf]
[ nan -4.9901782e+36 -2.4481979e+36 ... 8.4210530e+36
-inf -1.1353102e+37]]
[[ nan -1.3676106e+36 inf ... 1.5158864e+37
inf -8.5954786e+36]
[ nan -7.9527132e+36 inf ... 2.1685821e+37
1.6828479e+37 -inf]
[ nan -7.9527132e+36 inf ... 2.1685821e+37
1.6828479e+37 -inf]
...
[ nan -3.1938362e+36 inf ... 1.5331453e+37
3.3975579e+37 -9.5892951e+36]
[ nan -3.1938362e+36 inf ... 1.5331453e+37
3.3975579e+37 -9.5892951e+36]
[ nan -5.6393693e+36 4.6983167e+37 ... 1.0347686e+37
-5.8164126e+36 -4.1906564e+36]]]]
Is this a problem in the code, or am I missing something, for example about the image input?
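Printed arrays like the ones above are hard to compare across layers. A small helper can count the nan and inf entries instead. This is only a sketch; the name `summarize_non_finite` and the sample array are made up for illustration.

```python
import numpy as np

def summarize_non_finite(name, array):
    """Count nan and inf entries in a layer output and print a one-line report."""
    array = np.asarray(array)
    counts = {
        "nan": int(np.isnan(array).sum()),
        "inf": int(np.isinf(array).sum()),  # counts both +inf and -inf
        "total": int(array.size),
    }
    print(f"{name}: {counts['nan']} nan, {counts['inf']} inf "
          f"out of {counts['total']} values")
    return counts

# Made-up sample that mimics the kind of values shown for convolutional2.
sample = np.array([[np.nan, -1.4e36, -np.inf],
                   [np.nan, 2.0, np.inf]], dtype=np.float32)
report = summarize_non_finite("convolutional2", sample)
```

Passing the array returned by sess.run for each layer in turn would show the first layer at which the values stop being finite.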
@fmehmetun Thanks for reporting this. After a little digging, this seems to be caused by different weight offsets (16 vs. 20) for different major/minor versions of the weights file. yolov2-tiny, yolov3-tiny and yolov3 seem to require an offset of 20 instead of 16. If the offset is not set properly, the converted TF weights (ckpt) can be corrupted, which likely caused the NaNs you reported.
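The 16 vs. 20 difference can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this converter or from the darkflow fix: it assumes the weights file starts with three little-endian 32-bit integers (major, minor, revision) followed by a `seen` counter that is 32-bit in older files and 64-bit in newer ones, and that the version rule below matches the one used by the Darknet loader. The name `darknet_weights_offset` is hypothetical.

```python
import struct

def darknet_weights_offset(header: bytes) -> int:
    """Return the byte offset at which the layer weights are assumed to start."""
    # First 12 bytes: major, minor, revision as 32-bit integers (assumed layout).
    major, minor, _revision = struct.unpack("<3i", header[:12])
    # Assumed rule: newer versions store `seen` as a 64-bit value.
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        return 12 + 8  # 20 bytes
    return 12 + 4      # 16 bytes

# Synthetic headers for illustration, not bytes read from a real file.
old_header = struct.pack("<3i", 0, 1, 0) + b"\x00" * 8
new_header = struct.pack("<3i", 0, 2, 0) + b"\x00" * 8

print(darknet_weights_offset(old_header))
print(darknet_weights_offset(new_header))
```

Reading a newer file with the smaller offset would shift every following value by four bytes, that is by one 32-bit float, so values would land in the wrong variables, for example a convolution weight where a batch normalization variance is expected.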
Fortunately someone fixed this for darkflow in this PR. From a quick test, it seems to resolve your issue. I'll run some more tests and push the fix shortly.
@fmehmetun - give it a try and let me know if you see any other issues.
Thanks for the fix. I tried it just now and it's working with no problems. After opening the issue I also tried darkflow, and it worked with no problems too. It's good to know I have another option for conversion. Thanks.