Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn

Looking forward to your reply

gds101054108 opened this issue · comments

logits = tf.transpose(logits, (1, 0, 2)): why? The original order is [batch, time, class].
self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size: why? I think seq_len should be a variable number equal to each individual target sequence's length.

That's the ordering that tf.nn.ctc_loss needs.

Looking at the documentation, you can use time_major=False and skip the transposition.

See: https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss
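
For illustration, a minimal sketch of both options using the TF 1.x API this repo targets (the shapes here are hypothetical):

import tensorflow as tf

# Hypothetical shapes: batch of 32, 25 time steps, 37 classes.
batch_size, max_time, num_classes = 32, 25, 37

logits = tf.placeholder(tf.float32, [batch_size, max_time, num_classes])
labels = tf.sparse_placeholder(tf.int32)          # ctc_loss expects a SparseTensor
seq_len = tf.placeholder(tf.int32, [batch_size])

# Option 1: transpose to time-major [time, batch, class], the default layout.
loss_a = tf.nn.ctc_loss(labels, tf.transpose(logits, (1, 0, 2)), seq_len)

# Option 2: keep batch-major [batch, time, class] and tell ctc_loss about it.
loss_b = tf.nn.ctc_loss(labels, logits, seq_len, time_major=False)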

As for the second question, I'm not quite sure anymore; I think it had to do with variable width between fonts. Sometimes fonts are wider than the windows created by the LSTM.

Usually in RNNs, you indeed pass the length of your sequence.
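
In TF 1.x that looks roughly like this (a sketch with made-up shapes, not the repo's actual code):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 25, 512])   # [batch, time, features]
lengths = tf.placeholder(tf.int32, [None])             # true length of each example

# Passing sequence_length makes the RNN stop updating state at each example's
# true end instead of running over the padding.
cell = tf.nn.rnn_cell.LSTMCell(256)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, sequence_length=lengths,
                                   dtype=tf.float32)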

Thank you. I used https://github.com/gds101054108/keras/blob/master/examples/image_ocr.py to train my Chinese OCR model. I generated 16M variable-width images, each 48 pixels high, covering all 21025 GBK characters. The model converged to 99.0% and works quite well, but I have to use K.clear_session() to kill the session and reload the weights whenever the batch's image width changes, so training took one month. I want to use variable-length RNN inputs to speed up training. I hope you can help me solve this problem.
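
The workaround described above amounts to something like this sketch (build_model is a hypothetical stand-in for the model construction in image_ocr.py):

from keras import backend as K

def switch_width(weights_path, new_width):
    K.clear_session()                            # tear down the old graph/session
    model = build_model(input_width=new_width)   # hypothetical model builder
    model.load_weights(weights_path)             # restore the trained weights
    return model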

@Belval I did some experiments; I think seq_len is the real image width w//4 - 1 (after the CNN), before padding.

@gds101054108 Bingo, I get the same result: seq_len is w//4 - 1.
Besides, I found a way to make seq_len equal to w//4. In the last conv, which uses valid padding, replacing the [2, 2] kernel with a [2, 1] kernel keeps the width from decreasing, which matches the result of the paper's authors.
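
As a sanity check, here is one reading of how the width flows through the network (a sketch of the arithmetic, not code from crnn.py): only the first two 2x2/2 max-pools and conv7's valid padding change the width.

def feature_width(w, conv7_kernel_w=2):
    w = w // 2                        # pool1: 2x2, stride 2
    w = w // 2                        # pool2: 2x2, stride 2
                                      # pool3/pool4 halve only the height
    return w - (conv7_kernel_w - 1)   # conv7: valid padding, stride 1

print(feature_width(100))                     # 24 -> w//4 - 1
print(feature_width(100, conv7_kernel_w=1))   # 25 -> w//4, as in the paper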

@wangershi do you mind elaborating? The last convolutional layer (conv7) in crnn.py does use a (2, 2) kernel.

Also, how did you test that your results were the same as the original paper?

@Belval
First question:
In the original paper, Section 3.2:
For example, an image containing 10 characters is typically of size 100*32, from which a feature sequence of 25 frames can be generated.
I just wanted to reproduce the code, so this is from a coding point of view:
Because the feature width must be the same before and after conv7, the stride along the horizontal direction is 1. Since the padding of conv7 is valid, a kernel size of 2 along the horizontal direction would decrease the width by 1, which doesn't match the original paper. So the kernel size along the horizontal direction must be 1.
Actually, the original crnn is written in Lua, which I have difficulty reading, so I don't know how the author solved this (in the paper, the kernel size along the horizontal direction is 2).
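
A quick way to check the two kernel choices (a sketch over a hypothetical feature map, assuming the TF 1.x layers API):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2, 25, 512])   # [batch, height, width, channels]

y_2x2 = tf.layers.conv2d(x, 512, [2, 2], padding='valid')
y_2x1 = tf.layers.conv2d(x, 512, [2, 1], padding='valid')

print(y_2x2.shape)   # (?, 1, 24, 512) -> width shrinks by 1
print(y_2x1.shape)   # (?, 1, 25, 512) -> width preserved, seq_len = w//4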

Second question:
In crnn.py, if you replace the code
self.__seq_len: [self.__max_char_count] * self.__data_manager.batch_size
with
self.__seq_len: [26] * self.__data_manager.batch_size
TensorFlow will raise an exception:
InvalidArgumentError (see above for traceback): sequence_length(0) <= 25
That's how I determined that seq_len is w//4.
It's a pity that CTC is confusing; I can't explain this using the CTC algorithm itself.
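
The constraint itself does follow from CTC, though: the loss marginalizes over alignments that emit exactly one symbol (or blank) per frame, so a sequence_length larger than the number of frames admits no valid alignment. A tiny sketch of the check that fails above:

max_time = 25              # frames produced by the CNN in this setup
for requested in (25, 26):
    # 25 -> True; 26 -> False, which surfaces as InvalidArgumentError
    print(requested, requested <= max_time)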

@wangershi Thank you for taking the time to explain. I'll make the modifications and retrain to see if it yields better results.