Dev re-make
kudkudak opened this issue
Stanislaw Jastrzebski commented
Simple re-make of dev. I am not sure if it will solve much, but we should propose and re-evaluate on something like this anyway.
- Sample dev/dev2/test at random instead of taking the top-scoring examples
- Bucket dev/dev2/test, or report some AUC measure
See how ordering of factorized/prototypical/DNN is affected.
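A minimal sketch of the random split, assuming the examples sit in one in-memory pool. The function name `random_split`, the `sizes` argument and the fixed seed are illustrative choices and do not come from the project's code.

```python
import random


def random_split(examples, sizes, seed=0):
    """Shuffle `examples` and cut disjoint splits of the requested sizes.

    `sizes` maps a split name to an example count (hypothetical interface),
    e.g. {"dev": 1000, "dev2": 1000, "test": 2000}.
    """
    if sum(sizes.values()) > len(examples):
        raise ValueError("requested more examples than are available")

    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits
```

Fixing the seed means every model is scored on the same dev/dev2/test, so any change in the ordering comes from the new split and not from resampling noise.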
Tasks:
- Script performing splitting
- Script adding negative samples
- Metrics on buckets
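A rough sketch of what the metrics on buckets could look like. It assumes each example carries a source score, a binary label and a model output. The helper names `bucket_accuracy` and `auc` and the bucket edges are hypothetical, not existing project code.

```python
from bisect import bisect_right


def bucket_accuracy(scores, labels, predictions, edges):
    """Accuracy per score bucket.

    `edges` are sorted inner boundaries: two edges give three buckets.
    Empty buckets are reported as None.
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, label, pred in zip(scores, labels, predictions):
        bucket = bisect_right(edges, score)
        totals[bucket] += 1
        hits[bucket] += int(label == pred)
    return [h / t if t else None for h, t in zip(hits, totals)]


def auc(labels, model_scores):
    """Chance that a random positive is scored above a random negative.

    Ties count as half a win. Quadratic in the number of examples,
    which is fine for a sketch but slow for large sets.
    """
    pos = [s for l, s in zip(labels, model_scores) if l == 1]
    neg = [s for l, s in zip(labels, model_scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Per-bucket accuracy shows whether a model only does well on high-score examples. AUC avoids picking a decision threshold, which may matter once negative samples are added.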