glample / tagger

Named Entity Recognition Tool



Confidence score for the predicted entity

ishita1995 opened this issue

I cannot find a way to get the confidence score of an entity predicted by the model. Is there a way to calculate one?

The LSTM-CRF model predicts a sequence of tags for the entire sentence. As a result, there is no real notion of an entity score, only a sequence score (and the model returns the sequence with the best score). However, you can take the average of the LSTM probability scores over the words of your entity, which should give you a good proxy for a confidence score. For instance, if "Barack Obama" is in your sentence and the model tags these two words as "B_PER" and "E_PER", you can report the average (or the product) of P(B_PER|Barack) and P(E_PER|Obama) given by the model.
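To make the difference between a sequence score and an entity score concrete, here is a toy linear-chain CRF in plain Python. It is an illustration under simplifying assumptions, not the tagger's code: it ignores any start/end transitions the real model may add, and it enumerates every tag path by brute force instead of using the forward/Viterbi algorithm, so it only works for tiny inputs. The names `emissions` (per-word tag scores), `transitions` (tag-to-tag scores), `path_score` and `best_path_and_probability` are hypothetical.

```python
import itertools
import math


def path_score(emissions, transitions, path):
    """Unnormalized linear-chain CRF score of one tag path.

    emissions[i][t]: score of tag t at word i.
    transitions[a][b]: score of moving from tag a to tag b.
    """
    score = emissions[0][path[0]]
    for i in range(1, len(path)):
        score += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
    return score


def best_path_and_probability(emissions, transitions):
    """Brute-force the best tag path and its probability among all paths."""
    n_words, n_tags = len(emissions), len(emissions[0])
    paths = list(itertools.product(range(n_tags), repeat=n_words))
    scores = [path_score(emissions, transitions, p) for p in paths]
    best = max(range(len(paths)), key=scores.__getitem__)
    # log of the partition function, computed stably with log-sum-exp
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return paths[best], math.exp(scores[best] - log_z)


emissions = [[2.0, 0.5], [0.3, 1.5]]    # 2 words, 2 tags (made-up numbers)
transitions = [[0.0, 0.2], [0.1, 0.0]]
path, prob = best_path_and_probability(emissions, transitions)
print(path, round(prob, 3))
```

The probability returned here describes the whole tag sequence; it says nothing directly about any single entity, which is why a per-entity proxy such as the average described above is still needed.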

I understand what you are saying above. I am quite new to Theano, so I am sorry if I am wrong.
When calling the forward function in f_eval, the alpha variable should return the probabilities, but when I set return_best_sequence to False, the code breaks with the following error:

File "tagger.py", line 49, in classify_ner
    y_preds = np.array(f_eval(*input))[1:-1]
IndexError: too many indices for array
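One likely reading of this error (an assumption; the thread does not confirm it): with return_best_sequence set to False, forward returns a single score for the whole sentence instead of one tag per position, so np.array(...) builds a 0-dimensional array and the [1:-1] slice has no axis to act on. The NumPy behaviour itself is easy to reproduce; `strip_ends` below is a hypothetical helper that mimics the slice in tagger.py and fails with a clearer message.

```python
import numpy as np


def strip_ends(f_eval_output):
    """Mimic `np.array(f_eval(*input))[1:-1]` from tagger.py, with a clearer error."""
    arr = np.array(f_eval_output)
    if arr.ndim == 0:
        # A single score is a 0-d array with no axis to slice; slicing it
        # directly is what raises "IndexError: too many indices for array".
        raise ValueError(f"expected one tag per position, got a scalar {arr.item()!r}")
    return arr[1:-1]


print(strip_ends([7, 0, 1, 2, 8]))  # a 1-D tag sequence slices fine

try:
    np.array(-3.7)[1:-1]  # the original slice applied to a scalar score
except IndexError as err:
    print("IndexError:", err)
```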

What you probably want to look at are the tag probability scores:

tagger/model.py, line 278 (commit c735605):

    tags_scores = final_layer.link(final_output)
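A minimal sketch of turning those scores into per-entity confidences, assuming you have already compiled a Theano function (hypothetical, not shown) that returns tags_scores for one sentence as an n_words x n_tags array, and that tags follow the IOBES scheme with names like B_PER used above. If the model was built with the CRF layer, these scores are probably unnormalized, so the sketch applies a softmax per word before averaging (or multiplying) the probabilities of the predicted tags over each entity span. The helpers `softmax`, `entity_spans` and `score_entities` are illustrative, not part of the tagger.

```python
import math


def softmax(row):
    """Turn one word's raw tag scores into probabilities."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def entity_spans(tags):
    """Yield (start, end, type) for IOBES tags such as 'B_PER', 'E_PER', 'S_LOC', 'O'."""
    start = None
    for i, tag in enumerate(tags):
        prefix, etype = tag[0], tag[2:]
        if prefix == "S":
            yield i, i + 1, etype
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            yield start, i + 1, etype
            start = None
        elif prefix == "O":
            start = None


def score_entities(raw_scores, tag_names, predicted, reduce="mean"):
    """Average (or product) of the predicted tags' probabilities over each entity."""
    probs = [softmax(row) for row in raw_scores]
    col = {name: j for j, name in enumerate(tag_names)}
    results = []
    for start, end, etype in entity_spans(predicted):
        p = [probs[i][col[predicted[i]]] for i in range(start, end)]
        conf = sum(p) / len(p) if reduce == "mean" else math.prod(p)
        results.append((start, end, etype, conf))
    return results


tag_names = ["O", "B_PER", "E_PER", "S_LOC"]
raw_scores = [  # made-up scores for "Barack Obama visited Paris"
    [0.1, 3.0, 0.2, 0.0],
    [0.0, 0.3, 2.5, 0.1],
    [2.0, 0.0, 0.1, 0.5],
    [0.2, 0.0, 0.1, 2.8],
]
predicted = ["B_PER", "E_PER", "O", "S_LOC"]
for start, end, etype, conf in score_entities(raw_scores, tag_names, predicted):
    print(etype, start, end, round(conf, 3))
```

In practice the predicted tags would come from the CRF's best sequence, which is not always the per-word argmax of these probabilities; the average is only a proxy, as noted above.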

Thanks a lot, it worked.