githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.

Home Page: https://harald-scheidl.medium.com/b051d28f3d2e

Repository from Github https://github.com/githubharald/CTCWordBeamSearch

question about feedMat

hxk11111 opened this issue

Hi @githubharald, thanks for your project. I have a question about the matrix fed into the TF session. I am training a CRNN+CTC model. Take, for example, an image of the text "x181208022". Before the CTC layer I have the RNN output; if I use greedy decoding on it, I get "--x-11-8-1-2-0-8--0-2-2---", where "-" stands for the CTC blank. If I want to use your project, should I just feed the RNN output matrix into the word beam search part?
I ask because I saw this in your test code:

blank = len(chars)
s = ''
batch = 0
for label in res[batch]:
    if label == blank:
        break
    s += chars[label]

The for loop breaks as soon as it meets a CTC blank. But in my case a CTC blank is not the end of a word, so breaking there would give the wrong result.

Hi,

The blank is only used to mark the end of the string returned by the CTC decoder (when that string is shorter than the output of the RNN layers).
So it would return, e.g., "Hello-----", where only the part before the first blank is relevant.
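
To make the padding concrete, here is a minimal sketch of the "Hello-----" case. The alphabet `chars`, the label list `padded`, and the helper name `labels_to_string` are made up for illustration; the only convention taken from the thread is that the blank label equals `len(chars)`, as in the test code quoted in the question.

```python
def labels_to_string(labels, chars):
    """Turn a blank-padded label sequence into text, stopping at the first blank."""
    blank = len(chars)  # assumption: blank is the index right after the last character
    s = ''
    for label in labels:
        if label == blank:
            break  # everything from here on is padding
        s += chars[label]
    return s


# Hypothetical alphabet and decoder output for illustration only.
chars = 'Helo'
blank = len(chars)
padded = [0, 1, 2, 2, 3] + [blank] * 5  # "Hello" followed by five blanks

text = labels_to_string(padded, chars)
print(text)
```

Because the decoder output in this sketch contains blanks only as trailing padding, stopping at the first blank does not cut off any characters.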

P.S.: The output of your greedy decoder should not contain blanks between characters. It seems that it only applies step (1) of greedy decoding: taking the character with the highest score at each position along the x-axis of the image. The remaining steps, collapsing repeated characters and then removing blanks, are missing (for more details, see "Best path decoding" in this article).
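
As a rough sketch of full best path decoding (most likely label per time step, then collapse repeated labels, then drop blanks), the following pure-Python example rebuilds the raw path from the question as a one-hot score matrix and decodes it. The function name `best_path_decode`, the alphabet `chars`, and the one-hot matrix are assumptions for illustration; a real RNN output would hold per-class scores, and the blank is assumed to be the last class.

```python
def best_path_decode(mat, chars):
    """Greedy CTC decoding of a T x (len(chars) + 1) score matrix (list of rows)."""
    blank = len(chars)  # assumption: blank is the last class
    decoded = []
    prev = None
    for row in mat:
        # step 1: most likely label at this time step
        label = max(range(len(row)), key=row.__getitem__)
        # steps 2 and 3: skip repeats of the previous label, skip blanks
        if label != prev and label != blank:
            decoded.append(chars[label])
        prev = label
    return ''.join(decoded)


# Hypothetical alphabet covering the example text from the question.
chars = 'x0128'
raw_path = '--x-11-8-1-2-0-8--0-2-2---'  # per-time-step argmax, '-' is the blank

# Build a one-hot matrix whose argmax path equals raw_path.
mat = []
for c in raw_path:
    row = [0.0] * (len(chars) + 1)
    row[len(chars) if c == '-' else chars.index(c)] = 1.0
    mat.append(row)

result = best_path_decode(mat, chars)
print(result)
```

Note that the blank between the two final "2" labels is what keeps them from being merged into one, which is why repeats are collapsed before blanks are removed.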

Many Thanks