Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn

Looking forward to your reply

gds101054108 opened this issue · comments

logits = tf.transpose(logits, (1, 0, 2)): why? The original order is [batch, time, class].
self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size: why? I think seq_len should be a variable number equal to each individual target sequence's length.

That's the ordering that tf.nn.ctc_loss needs.

Looking at the documentation, you can use time_major=False and skip the transposition.

See: https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss
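
For illustration, a minimal sketch of both options using the TF 1.x API this repo targets (the shapes here are hypothetical):

import tensorflow as tf

# Hypothetical shapes: batch of 32, 25 time steps, 37 classes.
batch_size, max_time, num_classes = 32, 25, 37

logits = tf.placeholder(tf.float32, [batch_size, max_time, num_classes])
labels = tf.sparse_placeholder(tf.int32)          # ctc_loss expects a SparseTensor
seq_len = tf.placeholder(tf.int32, [batch_size])

# Option 1: transpose to time-major [time, batch, class], the default layout.
loss_a = tf.nn.ctc_loss(labels, tf.transpose(logits, (1, 0, 2)), seq_len)

# Option 2: keep batch-major [batch, time, class] and tell ctc_loss about it.
loss_b = tf.nn.ctc_loss(labels, logits, seq_len, time_major=False)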

As for the second question, I'm not quite sure anymore; I think it had to do with variable width between fonts. Sometimes fonts are wider than the windows created by the LSTM.

Usually in RNNs, you indeed pass the length of your sequence.
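
In TF 1.x that looks roughly like this (a sketch with made-up shapes, not the repo's actual code):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 25, 512])   # [batch, time, features]
lengths = tf.placeholder(tf.int32, [None])             # true length of each example

# Passing sequence_length makes the RNN stop updating state at each example's
# true end instead of running over the padding.
cell = tf.nn.rnn_cell.LSTMCell(256)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths,
                                   dtype=tf.float32)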

Thank you. I used https://github.com/gds101054108/keras/blob/master/examples/image_ocr.py to train my Chinese OCR model. I generated 16M variable-width images, each 48 pixels high, covering all 21025 GBK characters. The model converged to 99.0% and works quite well, but I have to use K.clear_session() to kill the session and reload the weights whenever the batch's image width changes, so training took one month. I want to use variable-length RNN inputs to speed up training. I hope you can help me solve this problem.
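
The workaround described above amounts to something like this sketch (build_model is a hypothetical stand-in for the model construction in image_ocr.py):

from keras import backend as K

def switch_width(weights_path, new_width):
    K.clear_session()                            # tear down the old graph/session
    model = build_model(input_width=new_width)   # hypothetical model builder
    model.load_weights(weights_path)             # restore the trained weights
    return model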

@Belval I did some experiments; I think seq_len is the real image width w//4 - 1 (after the CNN), before padding.

@gds101054108 Bingo, I get the same result: seq_len is w//4 - 1.
Besides, I found a way to make seq_len equal to w//4. In the last conv, which uses valid padding, replacing the [2, 2] kernel with a [2, 1] kernel keeps the width from decreasing, which matches the result of the paper's authors.
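
As a sanity check, here is one reading of how the width flows through the network (a sketch of the arithmetic, not code from crnn.py): only the first two 2x2/2 max-pools and conv7's valid padding change the width.

def feature_width(w, conv7_kernel_w=2):
    w = w // 2                        # pool1: 2x2, stride 2
    w = w // 2                        # pool2: 2x2, stride 2
                                      # pool3/pool4 halve only the height
    return w - (conv7_kernel_w - 1)   # conv7: valid padding, stride 1

print(feature_width(100))                     # 24 -> w//4 - 1
print(feature_width(100, conv7_kernel_w=1))   # 25 -> w//4, as in the paper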

@wangershi do you mind elaborating? The last convolutional layer (conv7) in crnn.py does use a (2, 2) kernel.

Also, how did you test that your results were the same as the original paper?

@Belval
First question:
In the original paper, Section 3.2:
For example, an image containing 10 characters is typically of size 100*32, from which a feature sequence of 25 frames can be generated.
I just wanted to reproduce the code, so this is from a coding point of view:
Because the feature width must be the same before and after conv7, the stride along the horizontal direction is 1. Since the padding of conv7 is valid, a kernel size of 2 along the horizontal direction would decrease the width by 1, which doesn't match the original paper. So the kernel size along the horizontal direction must be 1.
Actually, the original crnn is written in Lua, which I have difficulty reading, so I don't know how the author solved this (in the paper, the kernel size along the horizontal direction is 2).
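
A quick way to check the two kernel choices (a sketch over a hypothetical feature map, assuming the TF 1.x layers API):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2, 25, 512])   # [batch, height, width, channels]

y_2x2 = tf.layers.conv2d(x, 512, [2, 2], padding='valid')
y_2x1 = tf.layers.conv2d(x, 512, [2, 1], padding='valid')

print(y_2x2.shape)   # (?, 1, 24, 512) -> width shrinks by 1
print(y_2x1.shape)   # (?, 1, 25, 512) -> width preserved, seq_len = w//4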

Second question:
In crnn.py, if you replace the code
self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size
with
self.__seq_len: [26] * self.__data_manager.batch_size
TensorFlow will raise an exception:
InvalidArgumentError (see above for traceback): sequence_length(0) <= 25
That's how I determined that seq_len is w//4.
It's a pity that CTC is confusing; I can't explain this using the CTC algorithm itself.
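
The constraint itself does follow from CTC, though: the loss marginalizes over alignments that emit exactly one symbol (or blank) per frame, so a sequence_length larger than the number of frames admits no valid alignment. A tiny sketch of the check that fails above:

max_time = 25              # frames produced by the CNN in this setup
for requested in (25, 26):
    # 25 -> True; 26 -> False, which surfaces as InvalidArgumentError
    print(requested, requested <= max_time)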

@wangershi Thank you for taking the time to explain. I'll make the modifications and retrain to see if it yields better results.