Rostlab / nala

Text mining of natural language mutation mentions

Home Page: https://www.tagtog.net/-corpora/IDP4+

Remove words, only use stem

abojchevski opened this issue

Baseline to compare to:

SUBCLASS    0   p:0.8453 r:0.8079 f:0.8262 strictness:exact
SUBCLASS    1   p:0.3072 r:0.3333 f:0.3197 strictness:exact
SUBCLASS    2   p:0.3684 r:0.3925 f:0.3801 strictness:exact
TOTAL           p:0.7463 r:0.7291 f:0.7376 strictness:exact

SUBCLASS    0   p:0.9368 r:0.8911 f:0.9134 strictness:overlapping
SUBCLASS    1   p:0.7543 r:0.7792 f:0.7665 strictness:overlapping
SUBCLASS    2   p:0.7153 r:0.7464 f:0.7305 strictness:overlapping
TOTAL           p:0.8940 r:0.8661 f:0.8798 strictness:overlapping
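The f column in these logs is consistent with the usual balanced F-measure, the harmonic mean of p and r. A minimal sketch for re-checking it, assuming that is how the scores were computed (the function name `f_measure` is hypothetical and not taken from nala):

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was predicted or found
    return 2 * precision * recall / (precision + recall)


# (p, r, f) triples copied from the baseline rows with strictness:exact.
baseline_exact = {
    "SUBCLASS 0": (0.8453, 0.8079, 0.8262),
    "TOTAL": (0.7463, 0.7291, 0.7376),
}

for label, (p, r, reported_f) in baseline_exact.items():
    # Print the recomputed value next to the reported one for comparison.
    print(f"{label:10s} recomputed f:{f_measure(p, r):.4f} reported f:{reported_f:.4f}")
```

The same check can be run on any other row by adding its p, r and f values to the dictionary.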

Including the change (words removed, only the stem used):

SUBCLASS    0   p:0.8342 r:0.8159 f:0.8249 strictness:exact
SUBCLASS    1   p:0.3117 r:0.3392 f:0.3249 strictness:exact
SUBCLASS    2   p:0.3551 r:0.3551 f:0.3551 strictness:exact
TOTAL           p:0.7398 r:0.7345 f:0.7371 strictness:exact

SUBCLASS    0   p:0.9203 r:0.8907 f:0.9053 strictness:overlapping
SUBCLASS    1   p:0.7585 r:0.7854 f:0.7717 strictness:overlapping
SUBCLASS    2   p:0.7338 r:0.7338 f:0.7338 strictness:overlapping
TOTAL           p:0.8835 r:0.8659 f:0.8746 strictness:overlapping

Stemmed and lowercased:

SUBCLASS    0   p:0.8371 r:0.8010 f:0.8187 strictness:exact
SUBCLASS    1   p:0.3192 r:0.3475 f:0.3328 strictness:exact
SUBCLASS    2   p:0.3596 r:0.3832 f:0.3710 strictness:exact
TOTAL           p:0.7407 r:0.7247 f:0.7326 strictness:exact

SUBCLASS    0   p:0.9390 r:0.8950 f:0.9165 strictness:overlapping
SUBCLASS    1   p:0.7646 r:0.7995 f:0.7816 strictness:overlapping
SUBCLASS    2   p:0.7103 r:0.7464 f:0.7279 strictness:overlapping
TOTAL           p:0.8971 r:0.8724 f:0.8846 strictness:overlapping
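To compare the three runs at a glance, the TOTAL f scores can be set against the baseline. This is only a small sketch over the numbers reported in this issue; the run labels and the helper `f_delta` are hypothetical names chosen for illustration.

```python
# TOTAL f scores copied from the three runs reported in this issue.
# The run labels are illustrative, not identifiers from nala.
totals = {
    "baseline": {"exact": 0.7376, "overlapping": 0.8798},
    "stem only": {"exact": 0.7371, "overlapping": 0.8746},
    "stem lower": {"exact": 0.7326, "overlapping": 0.8846},
}


def f_delta(run, strictness, reference="baseline"):
    """Difference in TOTAL f between a run and the reference run."""
    return round(totals[run][strictness] - totals[reference][strictness], 4)


for run in ("stem only", "stem lower"):
    for strictness in ("exact", "overlapping"):
        print(f"{run:10s} {strictness:11s} {f_delta(run, strictness):+.4f}")
```

On these numbers, the stemmed and lowercased run is the only one that improves the overlapping TOTAL f over the baseline, while its exact TOTAL f is slightly lower.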

Decided to go with stemmed and lowercased words.
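For illustration, a stemmed and lowercased token feature could look roughly like the sketch below. It is not nala's implementation: `toy_stem` is a deliberately crude suffix stripper standing in for whatever stemmer the project uses, and the function names and the `"stem"` feature key are assumptions.

```python
# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_lower_features(tokens):
    """Emit one feature per token: its lowercased stem, not the raw word."""
    return [{"stem": toy_stem(token.lower())} for token in tokens]


print(stem_lower_features(["Mutations", "Substituted", "Ala"]))
```

Lowercasing before stemming means capitalised and lowercase spellings of the same word map to a single feature value.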