githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.

Home Page:https://harald-scheidl.medium.com/b051d28f3d2e

Repository from Github https://github.com/githubharald/CTCWordBeamSearch

boolean value of tensor with more than one value is ambiguous.

biscayan opened this issue

  1. Which program causes the problem
  • Python prototype
  2. Versions
  • Python version 3.7.7
  • Operating system Ubuntu 18.04
  • PyTorch version 1.6.0
  3. Issue
    Hi, I read your paper and think it is a very good algorithm, so I want to apply word beam search to my research.
    However, it is not easy to integrate into my Python project.
    My research is on speech recognition. The input data (speech -> spectrogram) goes into the model, which produces an output of shape [sequence length (T) x batch size (B) x number of characters (C)], e.g. (371, 32, 29).
    This output is then fed into the decoder.
def WordBeamSearch(mat, beamWidth, lm, useNGrams):
    "decode matrix, use given beam width and language model"
    chars = lm.getAllChars()
    blankIdx = len(chars)  # blank label is supposed to be last label in RNN output
    #mat = mat.cpu().numpy()
    print(mat.shape)
    maxT, _, _ = mat.shape  # shape of RNN output: TxBxC

    genesisBeam = Beam(lm, useNGrams)  # empty string
    last = BeamList()  # list of beams at time-step before beginning of RNN output
    last.addBeam(genesisBeam)  # start with genesis beam
    # go over all time-steps
    for t in range(maxT):
        curr = BeamList()  # list of beams at current time-step

        # go over best beams
        bestBeams = last.getBestBeams(beamWidth)  # get best beams
        .....

The error occurs when getting the best beams. The error message 'boolean value of tensor with more than one value is ambiguous.' is raised here:

def getBestBeams(self, num):
    "return best beams, specify the max. number of beams to be returned (beam width)"
    u = [v for (_, v) in self.beams.items()]
    lmWeight = 1
    return sorted(u, reverse=True, key=lambda x: x.getPrTotal() * (x.getPrTextual() ** lmWeight))[:num]

I converted the tensor into a NumPy array, but that produces another error:
'The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()'

I have tried to find a solution and have been reading your code for days, but I don't know what the problem is.
Please help me. If you need more details about the error, please let me know.
I look forward to your comment.
Thank you for your consideration.
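
One plausible cause, stated as an assumption because no stack trace is shown: if a 3-D TxBxC array is indexed as if it were 2-D (for example `mat[t, c]`), the result is a vector rather than a scalar. The beam probabilities used as the sort key then become arrays, and `sorted` cannot reduce an element-wise comparison to a single boolean. The sketch below reproduces the NumPy variant of the message with stand-in values; `scalar_scores` and `vector_scores` are hypothetical names, not part of the repository.

```python
import numpy as np

# Stand-in beam scores: plain scalars sort without any problem.
scalar_scores = [0.2, 0.7, 0.1]
best_scalar = sorted(scalar_scores, reverse=True)[:2]

# Stand-in scores that are vectors, as could happen when a TxBxC array
# is indexed as if it were TxC (an assumption about the cause).
vector_scores = [np.array([0.2, 0.3, 0.5]), np.array([0.7, 0.1, 0.2])]

try:
    sorted(vector_scores, reverse=True, key=lambda x: x)
    error_message = None
except ValueError as err:
    # Comparing two arrays yields an element-wise boolean array, which
    # Python cannot turn into the single True/False that sorting needs.
    error_message = str(err)

print(best_scalar)
print(error_message)
```

Reducing the comparison with `.any()` or `.all()` makes the error disappear but leaves the scores as vectors, so it may hide a shape mismatch rather than resolve it.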

Hi,

  1. Can you provide the stack trace that Python prints when it crashes (just copy and paste the output from the terminal)? I need to know the exact line where this happens.
  2. I never tried it with PyTorch tensors, but it should work for NumPy arrays.
  3. You're using the Python prototype. Is there a reason why you don't use the C++ implementation (which can also be used in Python code)? It is much faster and also provides more features.
  4. Is the CTC blank character the last one of the characters?
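
On point 4: the prototype code quoted in the issue takes the blank to be the last label (`blankIdx = len(chars)`), whereas PyTorch's `torch.nn.CTCLoss` uses `blank=0` by default. If the model was trained with the blank at index 0 (an assumption; the issue does not say), the class axis could be reordered so that the blank comes last. A minimal NumPy sketch, with `move_blank_to_last` as a hypothetical helper name:

```python
import numpy as np

def move_blank_to_last(mat, blank_idx=0):
    """Return a copy of mat whose last (class) axis has the blank moved to the end.

    Hypothetical helper, not part of CTCWordBeamSearch. Works for TxC or TxBxC.
    """
    mat = np.asarray(mat)
    num_classes = mat.shape[-1]
    # Keep the other classes in their original order, then append the blank.
    order = [c for c in range(num_classes) if c != blank_idx] + [blank_idx]
    return mat[..., order]

# Toy 2x3 matrix: column 0 plays the role of the blank.
toy = np.array([[0.6, 0.3, 0.1],
                [0.2, 0.5, 0.3]])
moved = move_blank_to_last(toy)
print(moved)
```

After such a reordering, the character order in chars.txt would have to match the order of the remaining classes.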

@githubharald
Thank you for your comment. I solved the problem by converting the tensors into NumPy arrays and calling .all() on the array.

However, I'm curious about the output of the decoder.

When I print the output, the decoder produces just one sentence. Can I get output for the whole batch?
For example, the decoder would return a list whose length is the batch size, so I can get all sentences at once.
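
Because the Python prototype decodes one batch element at a time, a list of per-sample results could be built by slicing the TxBxC output along the batch axis. The sketch below assumes the output is already a NumPy array (a PyTorch tensor would first need something like `.detach().cpu().numpy()`). `decode_batch` is a hypothetical helper, and `best_path` is a simple greedy stand-in used only to make the example runnable; it is not the word beam search itself.

```python
import numpy as np

def decode_batch(mat, decode_one):
    """Run a single-sample decoder on every batch element of a TxBxC array."""
    mat = np.asarray(mat)
    if mat.ndim != 3:
        raise ValueError("expected an array of shape TxBxC")
    # mat[:, b, :] is the TxC matrix for batch element b.
    return [decode_one(mat[:, b, :]) for b in range(mat.shape[1])]

def best_path(mat_tc, chars="AB"):
    """Greedy stand-in decoder: argmax per time-step, collapse repeats, drop the blank (last index)."""
    blank_idx = len(chars)
    labels = np.argmax(mat_tc, axis=1)
    out, prev = [], blank_idx
    for label in labels:
        if label != prev and label != blank_idx:
            out.append(chars[label])
        prev = label
    return "".join(out)

# Toy output with T=3, B=2, C=3 (characters "A", "B", then the blank).
toy = np.zeros((3, 2, 3))
toy[:, 0, :] = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]  # A, blank, B
toy[:, 1, :] = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1]]  # B, B, A
texts = decode_batch(toy, best_path)
print(texts)
```

With the prototype, `decode_one` could be a small wrapper such as `lambda m: WordBeamSearch(m, beamWidth, lm, useNGrams)`, assuming the function is given the 2-D TxC input it was written for.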

Second, does the decoder only output sentences that are in the language model (corpus)?
I made 'chars.txt' and 'wordchars.txt' with 28 characters (space, ' and A-Z)
and made 'corpus.txt' with some sentences.
It seems that the decoder only outputs sentences that are in 'corpus.txt'.

Thanks in advance.

  1. The prototype only works on one batch element at a time. As I said - better use the C++ implementation.
  2. To use the language model you need a large corpus. Since you're only using a small corpus, it is best to disable the language model and use the corpus only to create a dictionary. This is called "Words" mode and is enabled by setting useNGrams = False in main.py.
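
To illustrate what using the corpus only as a dictionary means: the decoder is then constrained to words that occur in the corpus, which would explain why only corpus words show up in the output. The sketch below shows one way such a word list could be extracted from a corpus. It is an illustration with hypothetical names (`build_dictionary`, `word_chars`), not the repository's implementation, and it assumes the space acts as a word separator rather than a word character.

```python
import re

def build_dictionary(corpus, word_chars):
    """Return the sorted unique words in corpus, where a word is a run of word_chars."""
    pattern = "[" + re.escape(word_chars) + "]+"
    return sorted(set(re.findall(pattern, corpus)))

# Apostrophe and A-Z; the space is treated as a separator (assumption).
word_chars = "'ABCDEFGHIJKLMNOPQRSTUVWXYZ"
corpus = "HELLO WORLD. DON'T PANIC, HELLO AGAIN"
words = build_dictionary(corpus, word_chars)
print(words)
```

A word that never appears in the corpus would be missing from this list, so a dictionary-constrained decoder could not output it.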

@githubharald
OK, I understand. Thank you for your explanation.
I will try to use the C++ implementation.
If I have another question, I will open an issue again.
Thank you.