watsonyanghx / CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR implemented using tensorflow.

Some problems about this code

980044579 opened this issue

In fact, the "max_stepsize" in this code shouldn't be 64. It should be equal to 12, which is what the original "image_width" (180) shrinks to after being halved four times: 180/2/2/2/2 ≈ 12. Remember that the core idea of CRNN+CTC is to split the image vertically into many slices, predict the class of each slice, and finally use CTC to decode the predicted sequence into the corresponding result. For example, "aaa_bb_c_" and "a__b_ccc" both correspond to the same label "abc". You can also read the paper for more details.
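The arithmetic and the decoding rule above can be sketched in a few lines of plain Python. This is only an illustration: the function names are made up for this example, and it assumes 'SAME'-style padding so that each halving rounds the width up (which is how 180 ends at 12 rather than 11.25).

```python
import math


def time_steps_after_pooling(image_width, num_poolings, stride=2):
    """Width of the feature map after repeated pooling along the width.

    Assumption: 'SAME'-style padding, so each pooling rounds the width up.
    """
    width = image_width
    for _ in range(num_poolings):
        width = math.ceil(width / stride)
    return width


def ctc_collapse(path, blank="_"):
    """Collapse a per-slice prediction: merge repeated symbols, then drop blanks."""
    decoded = []
    previous = None
    for symbol in path:
        # Keep a symbol only when it starts a new run and is not the blank.
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


if __name__ == "__main__":
    print(time_steps_after_pooling(180, 4))  # 180 -> 90 -> 45 -> 23 -> 12
    print(ctc_collapse("aaa_bb_c_"))         # abc
    print(ctc_collapse("a__b_ccc"))          # abc
```

Note that a blank between two identical symbols keeps them apart, so "a_a" collapses to "aa" rather than "a".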

But when I ran the wrong code on the author's dataset I got 98% accuracy, while I got a bad result on the VGGWord dataset. I finally got a good result after changing the code.

So why does this code work in your situation? I am very curious about this. Thank you.

@980044579 , thanks for sharing your observations and experience.

  1. With the great source code in this project and the data provided, I was able to reproduce the author's result, getting 0.997 at the 50th epoch.
  2. I agree with you on max_stepsize. It should run in the direction of "image_width", which gives 12 in this project. I also plan to correct this and see how it affects the final result. If it's okay, can you share your code changes in this area?

Just change the code between CNN -> RNN in cnn_lstm_otc_ocr.py, and make sure the input to the RNN has the shape [batch_size, max_stepsize, num_features].
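As a rough illustration of that target shape (not the actual change made in the repo), the sketch below rearranges a CNN feature map of shape [batch, height, width, channels] so that the width becomes the time axis and each time step carries height * channels features. It uses nested lists to stay framework-free; the function name and the toy sizes are invented for this example. In TensorFlow the same rearrangement would typically be a `tf.transpose` with perm `[0, 2, 1, 3]` followed by a `tf.reshape`.

```python
def cnn_to_rnn_input(feature_map):
    """Rearrange [batch][height][width][channels] into [batch][width][height*channels].

    Each vertical slice (one column of the feature map) becomes one time step.
    """
    batch_out = []
    for sample in feature_map:
        height = len(sample)
        width = len(sample[0])
        time_steps = []
        for w in range(width):
            features = []
            for h in range(height):
                # Concatenate the channel vectors of every row in this column.
                features.extend(sample[h][w])
            time_steps.append(features)
        batch_out.append(time_steps)
    return batch_out


if __name__ == "__main__":
    # Toy feature map: batch=1, height=2, width=3, channels=2.
    fmap = [[[[h * 100 + w * 10 + c for c in range(2)]
              for w in range(3)]
             for h in range(2)]]
    rnn_in = cnn_to_rnn_input(fmap)
    # Shape is now [batch_size, max_stepsize, num_features] = [1, 3, 4].
    print(len(rnn_in), len(rnn_in[0]), len(rnn_in[0][0]))
```

With this layout the number of time steps equals the feature-map width, which is what max_stepsize should match.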

Hi @980044579, thanks a lot for your kind reply. I made the code changes yesterday too and found that the model can reach 0.999 accuracy at the 12th epoch. So the model converges faster and performs better after fixing this bug.

For those who are interested, here are my code changes.

Good job~

I am getting an error: Failed precondition: sequence_length(0) <= 12

Here is what I did for inference. I have already trained the model up to this checkpoint:

model_checkpoint_path: "ocr-model-21001"
all_model_checkpoint_paths: "ocr-model-21001"

on the set of 80000 train and 20 val images provided in the dataset. I took a few images from the val set and created a folder named infer (40 images named 1.png .. 40.png). I then tried to run inference using the command given in the readme.

INFO:tensorflow:Restoring parameters from ./checkpoint/ocr-model-20001
restore from ckpt./checkpoint/ocr-model-20001
2018-01-23 11:16:17.305360: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: sequence_length(0) <= 12
Traceback (most recent call last):
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "./main.py", line 184, in <module>
tf.app.run()
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./main.py", line 179, in main
infer(FLAGS.infer_dir, FLAGS.mode)
File "./main.py", line 155, in infer
dense_decoded_code = sess.run(model.dense_decoded, feed)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

Caused by op 'CTCBeamSearchDecoder', defined at:
File "./main.py", line 184, in <module>
tf.app.run()
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./main.py", line 179, in main
infer(FLAGS.infer_dir, FLAGS.mode)
File "./main.py", line 115, in infer
model.build_graph()
File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 24, in build_graph
self._build_train_op()
File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 158, in _build_train_op
merge_repeated=False)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py", line 269, in ctc_beam_search_decoder
merge_repeated=merge_repeated))
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 76, in _ctc_beam_search_decoder
top_paths=top_paths, merge_repeated=merge_repeated, name=name)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): sequence_length(0) <= 12
[[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

@anubhavrohatgi make sure the max length of the labels in your dataset is <= max_stepsize.
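One way to read the message `sequence_length(0) <= 12` is that the sequence length fed for batch item 0 is larger than the 12 time steps the decoder input actually has. A plausible (but unconfirmed) cause here is that the feed still uses the 64 from utils.py while the graph emits 12 steps. The helper below is hypothetical, not part of the repo; it only shows the kind of check that could be run on the values before calling `sess.run`.

```python
def oversized_sequences(seq_lens, num_time_steps):
    """Return (batch_index, seq_len) pairs that exceed the network's time axis.

    An empty result means every fed sequence length fits within the time steps.
    """
    return [(i, n) for i, n in enumerate(seq_lens) if n > num_time_steps]


if __name__ == "__main__":
    # Hypothetical feed: every item claims 64 steps, but the network emits 12.
    print(oversized_sequences([64, 64, 64, 64], 12))
    # After aligning the fed lengths with the network output, nothing is flagged.
    print(oversized_sequences([12, 12, 12, 12], 12))
```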

@980044579 Please explain a bit, I am quite new to this stuff in Python. What is the max length of a label?

Currently I am using the dataset from the link given in the repo. max_stepsize = 64, I guess, as stated in utils.py.

All images are 180x60.

The error occurs somewhere here:
dense_decoded_code = sess.run(model.dense_decoded, feed)

Below are the contents of my infer folder:
[screenshot: screen2]

Are you talking about labels.txt?

Correct me if I am wrong here: by infer we mean testing on our own real-world data, is that right?
If not, please help me: how can I use the model to predict the values for a given input image?

@anubhavrohatgi @980044579, hello, I ran into the same problem, but I inspected the labels and found that the max label length is not greater than maxT in [maxT, batch_size, num_char]. Have you solved it? I don't know how to fix it.

@anubhavrohatgi @kstys make sure you understand how the "CNN + RNN + CTC" framework works; there are some bugs in this code. You should change not only the "maxsteps" in utils.py but also the code between CNN -> RNN in cnn_lstm_otc_ocr.py.

I have a question. In cnn_lstm_otc_ocr.py, after the CNN, is x.set_shape([FLAGS.batch_size, filters[3], 24]) right? The time sequence fed to the LSTM should run along the width, but in the code it runs along the channels.

I changed the code as @LevinJ did, but I got an error: "tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found."

I set max_step to 128 and my input image is 32*192.
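A common cause of "No valid path found" is a label that cannot be aligned to the available time steps: CTC needs at least one step per character plus one extra blank step between every pair of identical adjacent characters. The sketch below, with made-up function names, checks that condition. Whether it is the cause in this case is an assumption; note also that if the network halves the width four times as discussed above, a 192-pixel-wide image would yield about 12 time steps, not 128.

```python
def min_ctc_steps(label):
    """Minimum number of time steps CTC needs to emit `label`.

    One step per symbol, plus one blank between each pair of equal neighbours.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def infeasible_labels(labels, num_time_steps):
    """Return the labels that cannot be aligned to `num_time_steps` steps."""
    return [label for label in labels if min_ctc_steps(label) > num_time_steps]


if __name__ == "__main__":
    # Hypothetical labels checked against 12 time steps.
    labels = ["hello", "bookkeeper", "aaaaaaaa"]
    for label in labels:
        print(label, min_ctc_steps(label))
    print(infeasible_labels(labels, 12))
```

Running such a check over the training labels with the real number of time steps produced by the CNN shows whether any sample is impossible to align.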