loomchild / segment

Program used to split text into segments

Measure Precision and Recall

loomchild opened this issue

Tokenizing:

# Split each line on spaces and angle brackets; print one token per line.
BEGIN {FS = "[ <>]"}
{
    for (i = 1; i <= NF; i++)
        print $i
}
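To make the tokenizer's output concrete, here is a rough Python equivalent of the awk script above, offered only as a sketch. The function name `tokenize_line` and the sample text are hypothetical, and the `</sentence>` end tag is an assumption inferred from the `/sentence` pattern in the comparison script.

```python
import re


def tokenize_line(line):
    """Split a line on spaces, '<' and '>', mirroring awk's FS="[ <>]"."""
    if line == "":
        return []  # awk sets NF=0 for an empty record and prints nothing
    # Like awk with a regex FS, re.split keeps empty leading/trailing fields.
    return re.split(r"[ <>]", line)


# Hypothetical input: a sentence closed by an assumed </sentence> tag.
sample = "Hello world.</sentence>"
for token in tokenize_line(sample):
    print(token)
```

The closing tag ends up as a `/sentence` token on a line of its own, which is what the comparison step matches against.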

Comparison:

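BEGIN {
    # Count the sentence breaks in the tokenized reference file.
    truefile = "frek-tok.txt"
    while ((getline < truefile) > 0) {
        if ($0 ~ /\/sentence$/)
            brks++
    }
    close(truefile)
    print "Sentence breaks in reference: " brks
}
# Input is a diff of reference vs. output (assumed order):
# "-" marks a missed break, "+" a spurious one.
/^-\/sentence/  {fneg++}
/^\+\/sentence/ {fpos++}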
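END {
    # True positives are the reference breaks that the diff did not remove.
    tp = brks - fneg
    print "Precision: " tp / (tp + fpos)
    print "Recall: " tp / (tp + fneg)
    # TP / (TP + FP + FN); there are no true negatives to count here.
    print "Accuracy: " tp / (tp + fpos + fneg)
}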
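As a cross-check on the arithmetic, the same measurement can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's method: `score` and the token lists are hypothetical, true positives are taken to be the reference breaks that the diff does not remove, and `difflib` may produce a different edit script than `diff(1)` on larger inputs.

```python
import difflib


def score(reference, candidate, marker="/sentence"):
    """Return (precision, recall) for sentence breaks, counted from a unified diff."""
    # Total breaks in the reference, like the BEGIN block of the awk script.
    total = sum(1 for tok in reference if tok.endswith(marker))
    fneg = fpos = 0
    for line in difflib.unified_diff(reference, candidate, lineterm=""):
        if line.startswith("-" + marker):
            fneg += 1  # break present in the reference, missing in the output
        elif line.startswith("+" + marker):
            fpos += 1  # break present in the output only
    tp = total - fneg  # assumed definition of true positives
    precision = tp / (tp + fpos) if tp + fpos else 0.0
    recall = tp / total if total else 0.0
    return precision, recall


# Hypothetical data: one missed break and one spurious break.
reference = ["Hello", "world.", "/sentence", "How", "are", "you?", "/sentence",
             "Fine.", "/sentence"]
candidate = ["Hello", "world.", "How", "are", "/sentence", "you?", "/sentence",
             "Fine.", "/sentence"]
precision, recall = score(reference, candidate)
print("Precision:", precision)
print("Recall:", recall)
```

With three reference breaks, one missed and one spurious, both values come out at 2/3.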

Hi! This sentence splitter has already been benchmarked alongside other sentence splitters. Timing, precision, and recall results are available at: https://github.com/mbanon/benchmarks/tree/main/sentence_splitting

Just in case it helps :)

Thank you. I created this ticket a while ago and never had time to work on it :)