Rostlab / nala

Text mining of natural language mutation mentions

Home Page: https://www.tagtog.net/-corpora/IDP4+

Current Performance

abojchevski opened this issue

  • Used base + all iterations up to and including iteration 51
  • 638 documents in total
  • Subclass distribution: Counter({0: 4148, 1: 740, 2: 217})
  • Stratified split 2/3 + 1/3 into: train: 397, test: 241
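A stratified split like the one described can be sketched as follows. The issue does not say which label each document was stratified on, so the per-document `labels` mapping here is an assumption (for example, a document's most frequent subclass), and `stratified_split` is a hypothetical helper, not a function from nala. The reported sizes (397 and 241) are not an exact 2:1 ratio, so the actual procedure probably differs in its details.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=1 / 3, seed=0):
    """Split document ids into train/test, keeping label proportions similar.

    labels: dict mapping doc_id -> stratum label (assumed; one label per document).
    Returns (train_ids, test_ids).
    """
    # Group document ids by their stratum label.
    by_label = defaultdict(list)
    for doc_id, label in labels.items():
        by_label[label].append(doc_id)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Toy usage: 30 hypothetical documents spread evenly over three strata.
toy_labels = {"doc%d" % i: i % 3 for i in range(30)}
train_ids, test_ids = stratified_split(toy_labels)
```

Splitting within each stratum keeps the rare subclasses represented in both parts, which matters here because subclasses 1 and 2 are much less frequent than subclass 0.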

Performance:

SUBCLASS    0   p:0.8453 r:0.8079 f:0.8262 strictness:exact
SUBCLASS    1   p:0.3072 r:0.3333 f:0.3197 strictness:exact
SUBCLASS    2   p:0.3684 r:0.3925 f:0.3801 strictness:exact
TOTAL           p:0.7463 r:0.7291 f:0.7376 strictness:exact

SUBCLASS    0   p:0.9368 r:0.8911 f:0.9134 strictness:overlapping
SUBCLASS    1   p:0.7543 r:0.7792 f:0.7665 strictness:overlapping
SUBCLASS    2   p:0.7153 r:0.7464 f:0.7305 strictness:overlapping
TOTAL           p:0.8940 r:0.8661 f:0.8798 strictness:overlapping
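The gap between the exact and overlapping numbers comes from how a predicted span is matched against a gold span. The sketch below illustrates one common way to define the two modes, assuming spans are `(start, end)` character offsets with an exclusive end and ignoring subclass labels. The function names are hypothetical and this is not necessarily how nala's evaluator is implemented.

```python
def overlaps(a, b):
    """True if half-open spans a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def evaluate(gold, pred, strictness="exact"):
    """Return (precision, recall, f) for lists of (start, end) spans."""
    if strictness == "exact":
        match = lambda x, y: x == y
    elif strictness == "overlapping":
        match = overlaps
    else:
        raise ValueError("unknown strictness: %r" % strictness)

    # A prediction counts as correct if it matches any gold span;
    # a gold span counts as found if any prediction matches it.
    correct_pred = sum(any(match(p, g) for g in gold) for p in pred)
    found_gold = sum(any(match(g, p) for p in pred) for g in gold)

    precision = correct_pred / len(pred) if pred else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy usage: one exact hit, one boundary error, one miss.
gold_spans = [(0, 5), (10, 20), (30, 35)]
pred_spans = [(0, 5), (12, 20), (40, 45)]
exact_scores = evaluate(gold_spans, pred_spans, "exact")
overlap_scores = evaluate(gold_spans, pred_spans, "overlapping")
```

In the toy data the prediction `(12, 20)` has the wrong left boundary, so it counts only under overlapping matching. That kind of boundary error would be consistent with subclasses 1 and 2 scoring far higher under overlapping than under exact strictness.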

Here are the top model features and transitions:
https://gist.github.com/abojchevski/5a251f6c08a3049aac2c

Including RegexNLFeatureGenerator (should be named deletion)

SUBCLASS    0   p:0.8433 r:0.8042 f:0.8233 strictness:exact
SUBCLASS    1   p:0.3010 r:0.3298 f:0.3147 strictness:exact
SUBCLASS    2   p:0.3860 r:0.4112 f:0.3982 strictness:exact
TOTAL           p:0.7439 r:0.7265 f:0.7351 strictness:exact

SUBCLASS    0   p:0.9362 r:0.8887 f:0.9118 strictness:overlapping
SUBCLASS    1   p:0.7603 r:0.7870 f:0.7734 strictness:overlapping
SUBCLASS    2   p:0.7292 r:0.7609 f:0.7447 strictness:overlapping
TOTAL           p:0.8950 r:0.8660 f:0.8803 strictness:overlapping

Training with Elastic Net (L1 + L2) regularization and the new post-processing rule:

SUBCLASS    0   p:0.8318 r:0.8047 f:0.8180 strictness:exact
SUBCLASS    1   p:0.2809 r:0.3227 f:0.3003 strictness:exact
SUBCLASS    2   p:0.3670 r:0.3738 f:0.3704 strictness:exact
TOTAL           p:0.7297 r:0.7243 f:0.7270 strictness:exact

SUBCLASS    0   p:0.9301 r:0.8932 f:0.9113 strictness:overlapping
SUBCLASS    1   p:0.7626 r:0.8127 f:0.7868 strictness:overlapping
SUBCLASS    2   p:0.7569 r:0.7730 f:0.7649 strictness:overlapping
TOTAL           p:0.8915 r:0.8739 f:0.8826 strictness:overlapping

Same Elastic Net training, but with the old post-processing rule:

SUBCLASS    0   p:0.8318 r:0.8047 f:0.8180 strictness:exact
SUBCLASS    1   p:0.2817 r:0.3227 f:0.3008 strictness:exact
SUBCLASS    2   p:0.3619 r:0.3551 f:0.3585 strictness:exact
TOTAL           p:0.7305 r:0.7234 f:0.7269 strictness:exact

SUBCLASS    0   p:0.9301 r:0.8932 f:0.9113 strictness:overlapping
SUBCLASS    1   p:0.7615 r:0.8098 f:0.7849 strictness:overlapping
SUBCLASS    2   p:0.7643 r:0.7589 f:0.7616 strictness:overlapping
TOTAL           p:0.8920 r:0.8727 f:0.8823 strictness:overlapping

Nothing to do here. We will indeed use Elastic Net.
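For reference, Elastic Net adds both an L1 and an L2 term to the training objective. A minimal sketch of the penalty in its generic form follows. The coefficient names `c1` and `c2` and the example values are assumptions for illustration, since the thread does not state which coefficients were used, and some formulations scale the L2 term by 1/2.

```python
def elastic_net_penalty(weights, c1, c2):
    """Generic Elastic Net penalty: c1 * ||w||_1 + c2 * ||w||_2^2.

    c1 and c2 are hypothetical coefficient names; the actual values used
    for training are not given in the thread.
    """
    l1_term = sum(abs(w) for w in weights)  # pushes weights to exactly zero
    l2_term = sum(w * w for w in weights)   # shrinks large weights smoothly
    return c1 * l1_term + c2 * l2_term


# Toy usage with made-up feature weights.
toy_weights = [0.5, -1.0, 0.0, 2.0]
penalty = elastic_net_penalty(toy_weights, c1=0.1, c2=0.01)
```

Setting `c2` to zero gives pure L1 regularization and setting `c1` to zero gives pure L2, so the two coefficients control how much sparsity versus smooth shrinkage the model gets.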