Kyubyong / bert_ner

NER with BERT

F1, recall and precision calculation

rahul-1996 opened this issue

Hi,
I was wondering how you are actually calculating your scores.

y_true = np.array([hp.tag2idx[line.split()[1]] for line in open(f, 'r').read().splitlines() if len(line) > 0])
y_pred = np.array([hp.tag2idx[line.split()[2]] for line in open(f, 'r').read().splitlines() if len(line) > 0])

num_proposed = len(y_pred[y_pred>1])
num_correct = (np.logical_and(y_true==y_pred, y_true>1)).astype(np.int).sum()
num_gold = len(y_true[y_true>1])

precision = num_correct / num_proposed
recall = num_correct / num_gold

Can you explain what the above code means?
How does this translate to, say, recall = TP / (TP + FN)? Don't you have to use some multi-class method?

Also, why do you only take the indices where y_true > 1? Is it because you do not want the O (Other) tag to skew your results? Thanks!
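
For reference, here is a minimal sketch of how those counts line up with the usual TP/FP/FN definitions, assuming (as the y_true > 1 mask suggests) that indices 0 and 1 are reserved for the padding and O tags, so that any index greater than 1 is an entity tag; the tag indices below are made up for illustration:

import numpy as np

# Hypothetical tag indices: 0 = <PAD>, 1 = O, >1 = entity tags.
y_true = np.array([1, 2, 2, 1, 3, 1, 4])
y_pred = np.array([1, 2, 1, 1, 3, 3, 4])

# Counts pooled (micro-averaged) over every entity tag, ignoring O and padding:
tp = np.sum((y_true == y_pred) & (y_true > 1))  # predicted entity tag matches gold
fp = np.sum((y_true != y_pred) & (y_pred > 1))  # predicted an entity tag that is wrong
fn = np.sum((y_true != y_pred) & (y_true > 1))  # gold entity tag missed or mislabelled

num_proposed = np.sum(y_pred > 1)
num_gold = np.sum(y_true > 1)
assert num_proposed == tp + fp and num_gold == tp + fn

precision = tp / num_proposed   # TP / (TP + FP)
recall = tp / num_gold          # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)    # 0.75 0.75 0.75

So the snippet computes micro-averaged, token-level precision and recall pooled over all entity classes, with the O tag excluded from both the proposed and gold counts; there is no per-class (macro) averaging involved.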

I think this is rather crude, since it counts every kind of tag (including I-XXX tags) individually in the computation.

Using a standard evaluation tool such as https://github.com/sighsmile/conlleval/blob/master/conlleval.py would be preferable.
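
As a rough illustration of the difference, the sketch below (a hypothetical extract_spans helper with simplified BIO handling only) scores whole entity spans the way conlleval-style evaluation does, so a prediction only counts as correct when both the span boundaries and the entity type match:

from typing import List, Set, Tuple

def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    # Collect (start, end, type) spans from a BIO-tagged sequence.
    # Simplified sketch; conlleval itself also handles IOB1/IOBES corner cases.
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes any open span
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans

gold = ["O", "B-PER", "I-PER", "O", "B-LOC"]
pred = ["O", "B-PER", "I-PER", "O", "B-ORG"]

g, p = extract_spans(gold), extract_spans(pred)
tp = len(g & p)
precision = tp / len(p) if p else 0.0
recall = tp / len(g) if g else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(precision, recall, f1)  # 0.5 0.5 0.5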