question about feedMat

Question

hxk11111 opened this issue 6 years ago · comments

Hi @githubharald, thanks for your project. I have a question about the matrix fed into the TF session. I am training a CRNN+CTC model. Take, for example, an image that contains the text "x181208022". Before the CTC layer I have the RNN output; if I apply greedy decoding to it, I get "--x-11-8-1-2-0-8--0-2-2---", where "-" represents the CTC blank. If I want to use your project, should I just feed the RNN output matrix into the word beam search part?
I ask because I saw your testing code:

blank = len(chars)
s = ''
batch = 0
for label in res[batch]:
	if label == blank:
		break
	s += chars[label]

The for loop breaks as soon as it meets a CTC blank. But in my case the CTC blank does not mark the end of a word, so breaking there would give the wrong result.

Harald Scheidl · Answer 1 · Thu Aug 01 2019 16:50:40 GMT+0800 (China Standard Time)

Hi,

the blank is only used to mark the end of the string returned by the CTC decoder (if that string is shorter than the output of the RNN layers).
So it would return e.g. "Hello-----", where only the part before the first blank is relevant.
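
To make the padding convention concrete, here is a small sketch. It assumes, as in the testing code quoted above, that the blank label is `len(chars)`; the helper name `labels_to_text` and the toy alphabet are made up for illustration and are not part of the project.

```python
def labels_to_text(labels, chars):
    """Convert one padded label sequence to text, stopping at the first blank."""
    blank = len(chars)  # assumption: blank is the label right after the last character
    text = []
    for label in labels:
        if label == blank:
            break  # everything from here on is padding
        text.append(chars[label])
    return ''.join(text)


chars = 'Helo'                               # toy alphabet, illustration only
blank = len(chars)
padded = [0, 1, 2, 2, 3] + [blank] * 5       # corresponds to "Hello-----"
print(labels_to_text(padded, chars))         # Hello
```

Here the repeated "l" survives because the sequence is already decoded text padded with blanks, not a raw per-time-step path.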

P.S.: the output of your greedy decoder should not contain blanks between characters. It seems that it only applies step (1) of greedy decoding: computing the list of characters with the highest score along the x-axis of the image (for more details see "Best path decoding" in this article).
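
For reference, a minimal sketch of complete best path decoding, in which step (1) is followed by merging repeated labels and then removing blanks. The function names `collapse_path` and `best_path_decode` are hypothetical helpers, not part of the project, and the matrix is assumed to have one row per time step with the blank as the last column (matching `blank = len(chars)` in the testing code).

```python
def collapse_path(path, blank):
    """Steps (2) and (3): merge repeated labels, then drop blanks."""
    out = []
    prev = None
    for label in path:
        # keep a label only if it differs from its predecessor and is not the blank
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def best_path_decode(mat, chars):
    """mat: sequence of rows, one per time step, each with len(chars) + 1 scores."""
    blank = len(chars)  # assumption: blank is the last column
    # step (1): most likely label at every time step
    path = [max(range(len(row)), key=row.__getitem__) for row in mat]
    return ''.join(chars[label] for label in collapse_path(path, blank))


# Applied to the raw per-time-step string from the question:
raw = '--x-11-8-1-2-0-8--0-2-2---'
print(''.join(collapse_path(raw, '-')))  # x181208022
```

The two "2"s at the end are kept because a blank separates them, while the "11" without a blank in between is merged into a single "1".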

hxk11111 · Answer 2 · Fri Aug 02 2019 15:52:59 GMT+0800 (China Standard Time)

Many thanks!