Issues with amie.rules.eval.*
falcaopetri opened this issue · comments
Hi, I'm migrating from AMIE+ to AMIE3, and just found some issues with AMIE3's amie.rules.eval.RuleHitsEvaluator.
First issue
Although the cast at L94-L95 seems to succeed, I get the following exception during the .keySet() call at L97:
Exception in thread "main" java.lang.ClassCastException: class it.unimi.dsi.fastutil.ints.IntOpenHashSet cannot be cast to class it.unimi.dsi.fastutil.ints.Int2IntMap (it.unimi.dsi.fastutil.ints.IntOpenHashSet and it.unimi.dsi.fastutil.ints.Int2IntMap are in unnamed module of loader 'app')
at amie.rules.eval.RuleHitsEvaluator.main(RuleHitsEvaluator.java:97)
Debugging
Indeed, the cast seems to be wrong. The two-variable binding case is generated by generateBindingsForTwoVariables at
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Predictor.java#L115-L122
which ends up at
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/kb/src/main/java/amie/data/KB.java#L2542
and clearly returns an Int2ObjectMap<IntSet>.
Workaround
- Int2ObjectMap<Int2IntMap> twoVarsBindings =
- (Int2ObjectMap<Int2IntMap>)bindings;
+ Int2ObjectMap<IntSet> twoVarsBindings =
+ (Int2ObjectMap<IntSet>)bindings;
for(int value1: twoVarsBindings.keySet()){
- for(int value2: twoVarsBindings.get(value1).keySet()){
+ for(int value2: twoVarsBindings.get(value1)){
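To illustrate why the cast at L94-L95 appears to succeed while the failure only shows up at .keySet(), here is a minimal sketch using java.util collections as stand-ins for the fastutil types (an assumption made only to keep the example self-contained; the class and method names are hypothetical). Generic type arguments are erased at runtime, so the cast itself only checks the outer map type; the element type is checked when an inner value is actually used.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ErasureCastDemo {

    // Stand-in for the two-variable bindings: value1 -> set of value2.
    static Object makeBindings() {
        Map<Integer, Set<Integer>> bindings = new HashMap<>();
        Set<Integer> objects = new HashSet<>();
        objects.add(7);
        bindings.put(1, objects);
        return bindings;
    }

    // Returns true if the wrong cast goes through and only fails on use.
    @SuppressWarnings("unchecked")
    static boolean failsOnlyOnUse() {
        Object bindings = makeBindings();
        // Unchecked cast: at runtime the JVM only verifies that this is a Map.
        Map<Integer, Map<Integer, Integer>> wrong =
                (Map<Integer, Map<Integer, Integer>>) bindings;
        try {
            // The inner value is a HashSet, so treating it as a Map fails here.
            Map<Integer, Integer> inner = wrong.get(1);
            inner.keySet();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Iterating with the correct element type works as in the workaround.
    @SuppressWarnings("unchecked")
    static int countPairs() {
        Map<Integer, Set<Integer>> right = (Map<Integer, Set<Integer>>) makeBindings();
        int pairs = 0;
        for (int value1 : right.keySet()) {
            for (int value2 : right.get(value1)) {
                pairs++;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("wrong cast fails only on use: " + failsOnlyOnUse());
        System.out.println("pairs with the correct element type: " + countPairs());
    }
}
```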
Second issue
After resolving the first issue, I got:
Exception in thread "main" java.lang.IllegalArgumentException: Variable ?s100 DO NOT MATCH "\?(_?)[a-z][0-9]{1,2,3}"
at amie.data.KB.parseVariable(KB.java:226)
at amie.data.KB.map(KB.java:272)
at amie.rules.Rule.fullyUnboundTriplePattern1(Rule.java:236)
at amie.rules.eval.Evaluator.evaluate(Evaluator.java:32)
at amie.rules.eval.RuleHitsEvaluator.main(RuleHitsEvaluator.java:115)
Debugging
KB.parseVariable tried to match the string ?s100 against:
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/kb/src/main/java/amie/data/KB.java#L77-L80
We got ?s100 because every triple evaluated by Evaluator (i.e., every prediction) increments a static counter (through Rule.fullyUnboundTriplePattern1()).
Previous versions of AMIE did not have these issues, since it just:
Workaround
private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign))
- + "(_)?([a-z])([0-9])?([0-9])?";
+ + "(_)?([a-z])([0-9])?([0-9])?([0-9])?([0-9])?";
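As a quick check of what the two patterns accept, the sketch below compiles both and tests a few names. It assumes VariableSign is '?' and that the name must match the whole pattern (both inferred from the error message above, not verified against KB.java); the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

final class VariableRegexDemo {
    // Assumption: the variable sign is '?', as suggested by the failing name ?s100.
    static final char VARIABLE_SIGN = '?';
    static final String PREFIX = Pattern.quote(Character.toString(VARIABLE_SIGN));

    // Pattern before the workaround: at most two digits.
    static final Pattern TWO_DIGITS =
            Pattern.compile(PREFIX + "(_)?([a-z])([0-9])?([0-9])?");
    // Pattern after the workaround: at most four digits.
    static final Pattern FOUR_DIGITS =
            Pattern.compile(PREFIX + "(_)?([a-z])([0-9])?([0-9])?([0-9])?([0-9])?");

    // Full-match semantics: the entire name has to fit the pattern.
    static boolean accepts(Pattern pattern, String name) {
        return pattern.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"?s", "?s99", "?s100", "?s9999", "?s10000"}) {
            System.out.println(name
                    + " two-digit: " + accepts(TWO_DIGITS, name)
                    + ", four-digit: " + accepts(FOUR_DIGITS, name));
        }
    }
}
```

Note that the widened pattern only postpones the failure: a counter that keeps growing eventually produces a five-digit name, which is rejected again.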
Solution
I'm not sure, but I guess Evaluator does not require a new fullyUnboundTriplePattern1 for each triple. Can't we just use ?s and ?o every time? The following works for me:
- int[] head = Rule.fullyUnboundTriplePattern1();
+ int[] head = new int[3];
+ head[0] = KB.map("?s");
head[1] = triple[1];
+ head[2] = KB.map("?o");
+
After fixing these two issues, I got the same results as AMIE+'s amie.rules.eval.RuleHitsEvaluator.
Third issue
RuleHitsEvaluator calculates the wrong number of hits for one-var rules.
Note how by L91, t is either ?a, head[1], binding or binding, head[1], ?b.
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/RuleHitsEvaluator.java#L81-L92
This triple will then be used here:
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Evaluator.java#L46
As we still have a variable in the triple, it will probably match something in target, and the prediction will be counted as a hit.
The correct behavior would be to generate the triple binding, head[1], head[2] or head[0], head[1], binding.
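The difference can be sketched with a toy matcher. Everything here is hypothetical (the ids, the convention that negative integers are variables, and the matching logic); it does not reproduce the code at Evaluator.java#L46 and only shows why a leftover variable turns a wrong prediction into a hit.

```java
import java.util.ArrayList;
import java.util.List;

final class OneVarHitDemo {
    // Hypothetical ids, for illustration only.
    static final int LIVES_IN = 1, ALICE = 10, PARIS = 20, LONDON = 21;
    static final int VAR_A = -1, VAR_B = -2;

    // Assumption for this sketch: negative terms are variables.
    static boolean isVariable(int term) {
        return term < 0;
    }

    // A variable matches anything; a constant must be equal.
    static boolean matches(int[] pattern, int[] fact) {
        for (int i = 0; i < 3; i++) {
            if (!isVariable(pattern[i]) && pattern[i] != fact[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean isHit(int[] triple, List<int[]> target) {
        for (int[] fact : target) {
            if (matches(triple, fact)) {
                return true;
            }
        }
        return false;
    }

    // Target KB: Alice lives in London, not in Paris.
    static List<int[]> target() {
        List<int[]> facts = new ArrayList<>();
        facts.add(new int[] {ALICE, LIVES_IN, LONDON});
        return facts;
    }

    // Buggy construction: the head's constant is replaced by a fresh variable.
    static int[] buggyTriple(int[] head, int binding) {
        return new int[] {binding, head[1], VAR_B};
    }

    // Fixed construction: keep the head's constant, substitute only the variable.
    static int[] fixedTriple(int[] head, int binding) {
        return new int[] {binding, head[1], head[2]};
    }

    public static void main(String[] args) {
        int[] head = {VAR_A, LIVES_IN, PARIS}; // one-var head: livesIn(?a, Paris)
        System.out.println("buggy triple is a hit: " + isHit(buggyTriple(head, ALICE), target()));
        System.out.println("fixed triple is a hit: " + isHit(fixedTriple(head, ALICE), target()));
    }
}
```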
Fourth issue
Predictor raises a NullPointerException when we have rules with a one-var head. Well, I guess it was never used/required before...
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Predictor.java#L232-L235
Also, Predictor has the same problem as RuleHitsEvaluator (related to Int2ObjectMap<Int2IntMap> vs Int2ObjectMap<IntSet>), described before (#8 (comment)).
Fifth issue
Predictor and Evaluator should unmap the triples during output. E.g., at:
The problem is, e.g., that Predictor generates the input for Evaluator, which will call
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Evaluator.java#L147-L149
and then we end up with a mapping of a mapping.
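A toy dictionary makes the "mapping of a mapping" concrete. This is only a sketch under the assumption that the pipeline writes integer ids as text and the next stage maps that text again; the class below is hypothetical and is not AMIE's KB.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MappingDemo {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Assigns the next free id to an unseen string, or returns the existing id.
    int map(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size() + 1;
            ids.put(name, id);
            names.add(name);
        }
        return id;
    }

    // Recovers the original string for an id handed out by map().
    String unmap(int id) {
        return names.get(id - 1);
    }

    public static void main(String[] args) {
        MappingDemo kb = new MappingDemo();
        int alice = kb.map("<Alice>");

        // Output written with unmap(): the reader recovers the same id.
        System.out.println("round trip via unmap: " + (kb.map(kb.unmap(alice)) == alice));

        // Output written as the raw id: the reader maps the text of the id,
        // which creates a new, unrelated entry.
        int mappedTwice = kb.map(Integer.toString(alice));
        System.out.println("raw id mapped again: " + mappedTwice
                + " -> " + kb.unmap(mappedTwice));
    }
}
```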
Note that these issues are arising while I build my own experiments. So there are many amie.rules.eval classes that I've not used yet (and probably won't use).
PR #9 implements the modifications I've done in order to fix all the issues discussed above.
I agree with these bug fixes. Issues 4 and 5 appeared because AMIE3 has not been evaluated in terms of prediction quality. Issue 3 seems like an old bug that survived because we always evaluated the predictions of rules with two variables in the head. @falcaopetri thank you very much for these bug reports.
private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign))
- + "(_)?([a-z])([0-9])?([0-9])?";
+ + "(_)?([a-z])([0-9])?([0-9])?([0-9])?([0-9])?";
For Issue 2 I wonder if it is worth being more general when matching variable names:
private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign))
+ + "(_)?([a-z])([0-9])*";
What do you think @lajus ?
Hi,
It is not so much about whether AMIE3 was evaluated in terms of rule quality; it is more that I did not put effort into making all the satellite scripts compatible with the new data structures.
The migration to fastutil is not backward compatible, and I focused exclusively on making AMIE work with it. As such, all the other scripts should be considered untested.
As for the variable regex, the best solution would be to restrict it even more, to be sure AMIE performs as expected, e.g.:
private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign)) + "(_)?([a-z])([0-9])?";
You should understand that from now on everything is internally mapped to an integer: every entity, relation, or variable. This also means that we have a limited pool of variable symbols. If I allow more patterns, two different variable symbols will inevitably end up mapped to the same integer internally, and from there all hell breaks loose.
This is clearly not backward-compatible, but I believe 260 (or 520) variable symbols should be enough to perform most of the tasks AMIE deals with.
As such, the usage of Rule.fullyUnboundTriplePattern1(); should be prohibited in the near future.
So except for the variableRegex, I will probably accept the other pull requests soon (I need to quickly check them first).
Cheers,
Jonathan
Hi Jonathan,
Thanks for shedding light on this issue. I see the problem now: we operate in a small space, as suggested by the implementation of the mapVariable method in the KB class.
Cheers,
Luis