dig-team / amie

Mavenized AMIE+Typing

Issues with amie.rules.eval.*

falcaopetri opened this issue

Hi, I'm migrating from AMIE+ to AMIE3 and have just found some issues with AMIE3's amie.rules.eval.RuleHitsEvaluator.

First issue

https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/RuleHitsEvaluator.java#L94-L97

Although the cast at L94-L95 succeeds, I get the following exception at the .keySet() call on L97:

Exception in thread "main" java.lang.ClassCastException: class it.unimi.dsi.fastutil.ints.IntOpenHashSet cannot be cast to class it.unimi.dsi.fastutil.ints.Int2IntMap (it.unimi.dsi.fastutil.ints.IntOpenHashSet and it.unimi.dsi.fastutil.ints.Int2IntMap are in unnamed module of loader 'app')
	at amie.rules.eval.RuleHitsEvaluator.main(RuleHitsEvaluator.java:97)

Debugging

Indeed, the cast is wrong; it only appears to succeed because generic casts are unchecked at runtime (type erasure). The two-variable bindings are generated by generateBindingsForTwoVariables at
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Predictor.java#L115-L122

which ends up at
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/kb/src/main/java/amie/data/KB.java#L2542

and that method clearly returns an Int2ObjectMap<IntSet>, not an Int2ObjectMap<Int2IntMap>.

Workaround

-                               Int2ObjectMap<Int2IntMap> twoVarsBindings = 
-                                               (Int2ObjectMap<Int2IntMap>)bindings;
+                               Int2ObjectMap<IntSet> twoVarsBindings =
+                                               (Int2ObjectMap<IntSet>)bindings;
                                for(int value1: twoVarsBindings.keySet()){
-                                       for(int value2: twoVarsBindings.get(value1).keySet()){
+                                       for(int value2: twoVarsBindings.get(value1)){
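The failure mode above can be reproduced with plain java.util collections. This is a minimal sketch, not AMIE code: Map<Integer, Set<Integer>> stands in for fastutil's Int2ObjectMap<IntSet>, Map<Integer, Map<Integer, Integer>> stands in for Int2ObjectMap<Int2IntMap>, and the names ErasureDemo, twoVariableBindings, wrongCastFails and countPairs are hypothetical. It shows that the unchecked cast itself never fails, while the first keySet() call on an inner value does, and that iterating the inner sets directly, as in the workaround, works.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ErasureDemo {

    // Hypothetical stand-in for the two-variable bindings returned by KB:
    // each value of the first variable maps to the SET of values of the second.
    static Object twoVariableBindings() {
        Map<Integer, Set<Integer>> bindings = new HashMap<>();
        bindings.computeIfAbsent(1, k -> new HashSet<>()).add(10);
        bindings.computeIfAbsent(1, k -> new HashSet<>()).add(11);
        bindings.computeIfAbsent(2, k -> new HashSet<>()).add(20);
        return bindings;
    }

    // Mirrors the original code: the unchecked cast succeeds (type erasure),
    // but treating an inner value as a map throws ClassCastException.
    @SuppressWarnings("unchecked")
    static boolean wrongCastFails() {
        Map<Integer, Map<Integer, Integer>> wrong =
                (Map<Integer, Map<Integer, Integer>>) twoVariableBindings(); // no error here
        try {
            for (int value1 : wrong.keySet()) {
                for (int value2 : wrong.get(value1).keySet()) { // ClassCastException here
                    // never reached: get(value1) returns a HashSet, not a Map
                }
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Mirrors the workaround: cast to map-of-sets and iterate each inner set directly.
    @SuppressWarnings("unchecked")
    static int countPairs() {
        Map<Integer, Set<Integer>> twoVarsBindings =
                (Map<Integer, Set<Integer>>) twoVariableBindings();
        int pairs = 0;
        for (int value1 : twoVarsBindings.keySet()) {
            for (int value2 : twoVarsBindings.get(value1)) {
                pairs++; // each (value1, value2) pair is one prediction
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(wrongCastFails()); // true
        System.out.println(countPairs());     // 3
    }
}
```

Running main should print true and then 3, one per (value1, value2) pair.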

Second issue

After resolving the first issue, I got:

Exception in thread "main" java.lang.IllegalArgumentException: Variable ?s100 DO NOT MATCH "\?(_?)[a-z][0-9]{1,2,3}"
	at amie.data.KB.parseVariable(KB.java:226)
	at amie.data.KB.map(KB.java:272)
	at amie.rules.Rule.fullyUnboundTriplePattern1(Rule.java:236)
	at amie.rules.eval.Evaluator.evaluate(Evaluator.java:32)
	at amie.rules.eval.RuleHitsEvaluator.main(RuleHitsEvaluator.java:115)

Debugging

KB.parseVariable tries to match the string ?s100 against:
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/kb/src/main/java/amie/data/KB.java#L77-L80

We get ?s100 because every triple evaluated by Evaluator (i.e., every prediction) increments a static counter via Rule.fullyUnboundTriplePattern1(), so the variable suffix eventually outgrows the two digits the regex allows.
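To see exactly when the name breaks the regex, here is a hypothetical stand-in for the static counter behind Rule.fullyUnboundTriplePattern1(). It assumes names of the form ?s<n>/?o<n> with the counter starting at 0, that VariableSign is '?', and that the regex is applied with Matcher.matches(); the real method returns mapped integers rather than strings, and FreshVariableDemo is an illustrative name. The pattern is the original VariableRegex from KB.java.

```java
import java.util.regex.Pattern;

final class FreshVariableDemo {
    // Assumption: VariableSign is '?', as in the error message above.
    static final char VARIABLE_SIGN = '?';

    // Original KB.VariableRegex: a letter followed by at most two digits.
    static final Pattern ORIGINAL = Pattern.compile(
            Pattern.quote(Character.toString(VARIABLE_SIGN)) + "(_)?([a-z])([0-9])?([0-9])?");

    // Hypothetical stand-in for the static counter: every call yields fresh ?s<n>, ?o<n>.
    private static int counter = 0;

    static String[] freshSubjectObject() {
        int n = counter++;
        return new String[] {VARIABLE_SIGN + "s" + n, VARIABLE_SIGN + "o" + n};
    }

    // Returns the index of the first prediction whose subject variable is rejected,
    // or -1 if none of the first maxPredictions names is rejected.
    static int firstRejectedPrediction(int maxPredictions) {
        counter = 0;
        for (int i = 0; i < maxPredictions; i++) {
            String subject = freshSubjectObject()[0];
            if (!ORIGINAL.matcher(subject).matches()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Prediction index 100 produces ?s100, the first three-digit suffix.
        System.out.println(firstRejectedPrediction(1000)); // 100
    }
}
```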

Previous versions of AMIE did not have this issue, since the older code simply did:

https://github.com/lajus/amie/blob/38155a5632b78f061889e9ad6e734a180801d83b/rules/src/main/java/amie/rules/Rule.java#L225-L232

Workaround

         private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign)) 
-                + "(_)?([a-z])([0-9])?([0-9])?";
+                + "(_)?([a-z])([0-9])?([0-9])?([0-9])?([0-9])?";
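Note that the workaround only raises the ceiling: with four optional digits the pattern accepts suffixes up to 9999, so a counter-based naming scheme would eventually hit the same exception at ?s10000. The sketch below checks a few names against both patterns, assuming VariableSign is '?' and that the whole name must match (Matcher.matches()); VariableRegexDemo and accepts are illustrative names.

```java
import java.util.regex.Pattern;

final class VariableRegexDemo {
    // Assumption: VariableSign is '?'.
    static final String SIGN = Pattern.quote("?");

    // Original regex: a letter followed by at most two digits.
    static final Pattern ORIGINAL =
            Pattern.compile(SIGN + "(_)?([a-z])([0-9])?([0-9])?");

    // Workaround regex: a letter followed by at most four digits.
    static final Pattern WORKAROUND =
            Pattern.compile(SIGN + "(_)?([a-z])([0-9])?([0-9])?([0-9])?([0-9])?");

    static boolean accepts(Pattern p, String name) {
        return p.matcher(name).matches();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"?s99", "?s100", "?s9999", "?s10000"}) {
            System.out.println(name + " original=" + accepts(ORIGINAL, name)
                    + " workaround=" + accepts(WORKAROUND, name));
        }
    }
}
```

Expected output: ?s99 is accepted by both; ?s100 and ?s9999 only by the workaround; ?s10000 by neither.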

Solution

I'm not sure, but I don't think Evaluator needs a fresh fullyUnboundTriplePattern1() for every triple. Can't we just use ?s and ?o every time? The following works for me:

-               int[] head = Rule.fullyUnboundTriplePattern1();
+               int[] head = new int[3];
+               head[0] = KB.map("?s");
                head[1] = triple[1];
+               head[2] = KB.map("?o");
+
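The effect of this change can be sketched with a toy symbol table. FixedHeadDemo, its map method and the use of negative ids for variables are hypothetical stand-ins for KB.map(), not AMIE's implementation. The point is that reusing ?s and ?o returns the same two variable ids for every prediction, so no new variable names are ever created.

```java
import java.util.HashMap;
import java.util.Map;

final class FixedHeadDemo {
    // Hypothetical stand-in for KB.map(): interns each string to an int.
    // Variables (strings starting with '?') get negative ids, purely for illustration.
    private static final Map<String, Integer> ids = new HashMap<>();

    static int map(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            int next = ids.size() + 1;
            id = s.startsWith("?") ? -next : next;
            ids.put(s, id);
        }
        return id;
    }

    // Mirrors the proposed change: fixed ?s/?o variables plus the triple's relation.
    static int[] headFor(int[] triple) {
        int[] head = new int[3];
        head[0] = map("?s");
        head[1] = triple[1];
        head[2] = map("?o");
        return head;
    }

    // Number of distinct variable symbols interned so far.
    static long variableCount() {
        return ids.keySet().stream().filter(k -> k.startsWith("?")).count();
    }

    public static void main(String[] args) {
        int relation = map("<livesIn>");
        for (int i = 0; i < 1000; i++) {
            headFor(new int[] {map("<Alice>"), relation, map("<Paris>")});
        }
        System.out.println(variableCount()); // 2: only ?s and ?o were ever interned
    }
}
```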

After fixing these two issues, I got the same results as AMIE+'s amie.rules.eval.RuleHitsEvaluator.

Third issue

RuleHitsEvaluator computes the wrong number of hits for rules with only one variable in the head.

Note how, by L91, t is either (?a, head[1], binding) or (binding, head[1], ?b):
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/RuleHitsEvaluator.java#L81-L92

This triple will then be used here:
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Evaluator.java#L46

Since the triple still contains a variable, it will probably match some fact in the target, and the prediction is counted as a hit.

The correct behavior is to generate the triple (binding, head[1], head[2]) or (head[0], head[1], binding), i.e., to substitute the binding for the head variable and keep the head's constant.
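A minimal sketch of the difference for a one-variable head (?a, r, C). It assumes, for this example only, that variables are negative ints and constants are positive ints, and it models a hit as "some target fact agrees on every constant position", with variables acting as wildcards. OneVariableHitsDemo and its methods are hypothetical, not the Evaluator code.

```java
import java.util.Arrays;
import java.util.List;

final class OneVariableHitsDemo {
    // Assumption for this sketch only: variables are negative, constants positive.
    static boolean isVariable(int x) {
        return x < 0;
    }

    // Buggy behavior: the head's constant is replaced by a fresh variable.
    static int[] buggyTriple(int[] head, int binding, int freshVariable) {
        return isVariable(head[0])
                ? new int[] {binding, head[1], freshVariable}
                : new int[] {freshVariable, head[1], binding};
    }

    // Corrected behavior: (binding, head[1], head[2]) or (head[0], head[1], binding).
    static int[] correctTriple(int[] head, int binding) {
        return isVariable(head[0])
                ? new int[] {binding, head[1], head[2]}
                : new int[] {head[0], head[1], binding};
    }

    // Hit if some target fact agrees on every non-variable position of the query.
    static boolean isHit(int[] query, List<int[]> target) {
        for (int[] fact : target) {
            boolean agrees = true;
            for (int i = 0; i < 3; i++) {
                if (!isVariable(query[i]) && query[i] != fact[i]) {
                    agrees = false;
                    break;
                }
            }
            if (agrees) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int relation = 5, constant = 7, binding = 42;
        int[] head = {-1, relation, constant};                            // (?a, r, C)
        List<int[]> target = List.of(new int[] {binding, relation, 99});  // only (42, r, 99)

        int[] buggy = buggyTriple(head, binding, -2);  // (42, r, ?b)
        int[] correct = correctTriple(head, binding);  // (42, r, C)
        System.out.println(Arrays.toString(buggy) + " hit=" + isHit(buggy, target));
        System.out.println(Arrays.toString(correct) + " hit=" + isHit(correct, target));
    }
}
```

Here main prints [42, 5, -2] hit=true for the buggy triple and [42, 5, 7] hit=false for the corrected one: the leftover variable matches (42, r, 99) even though (42, r, C) is not in the target.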

Fourth issue

Predictor raises a NullPointerException for rules with a one-variable head. I guess this case was never used or required before:
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Predictor.java#L232-L235

Predictor also has the same Int2ObjectMap<Int2IntMap> vs. Int2ObjectMap<IntSet> problem as RuleHitsEvaluator, described before (#8 (comment)).

Fifth issue

Predictor and Evaluator should unmap the triples when writing output, e.g., at:

https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Evaluator.java#L154

The problem is that, e.g., Predictor generates the input for Evaluator, which then calls
https://github.com/lajus/amie/blob/b0f84cc052089bdcc991881886a0e9c80b83e0bc/rules/src/main/java/amie/rules/eval/Evaluator.java#L147-L149
on it, so we end up with a mapping of a mapping (already-mapped IDs get mapped again).
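The "mapping of a mapping" can be illustrated with a toy bidirectional dictionary. UnmapOnOutputDemo, map, unmap and write are hypothetical stand-ins, not KB.map()/KB.unmap(); the sketch assumes the writer emits raw integer ids as text and the reader maps every string it reads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UnmapOnOutputDemo {
    // Hypothetical bidirectional dictionary: name -> id and id -> name.
    private final Map<String, Integer> toId = new HashMap<>();
    private final List<String> toName = new ArrayList<>();

    int map(String name) {
        Integer id = toId.get(name);
        if (id == null) {
            id = toName.size();
            toName.add(name);
            toId.put(name, id);
        }
        return id;
    }

    String unmap(int id) {
        return toName.get(id);
    }

    // What the writer (e.g. Predictor) emits for one entity: the raw id
    // (current behavior) or the unmapped name (suggested behavior).
    String write(int id, boolean unmapOnOutput) {
        return unmapOnOutput ? unmap(id) : Integer.toString(id);
    }

    public static void main(String[] args) {
        UnmapOnOutputDemo kb = new UnmapOnOutputDemo();
        int alice = kb.map("<Alice>");
        kb.map("<Bob>");

        // The reader (e.g. Evaluator) maps whatever string it reads back.
        int readRaw = kb.map(kb.write(alice, false));      // maps "0" as a new symbol
        int readUnmapped = kb.map(kb.write(alice, true));  // maps "<Alice>" again

        System.out.println(readRaw == alice);       // false: a mapping of a mapping
        System.out.println(readUnmapped == alice);  // true: round trip preserved
    }
}
```

Writing the raw id 0 makes the reader intern the string "0" as a brand-new symbol, whereas unmapping first lets the round trip recover the original id.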

These issues arose while I was building my own experiments, so there are many amie.rules.eval classes that I have not used yet (and probably won't).
PR #9 implements the modifications I made to fix all the issues discussed above.

I agree with these bug fixes. Issues 4 and 5 appeared because AMIE3 has not been evaluated in terms of prediction quality. Issue 3 seems like an old bug that survived because we only ever evaluated predictions of rules with two variables in the head. @falcaopetri, thank you very much for these bug reports.

For Issue 2, I wonder whether it is worth being more general when matching variable names:

private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign)) 
+                + "(_)?([a-z])([0-9])*";

What do you think, @lajus?

Hi,

It is not so much about whether AMIE3 has been evaluated in terms of rule quality; rather, I did not put effort into making all the satellite scripts compatible with the new data structures.

The migration to fastutil is not backward compatible, and I focused exclusively on making AMIE work with it. As such, all the other scripts should be considered untested.

As for the variable regex, the best solution would be to restrict it even further, to be sure AMIE performs as expected, e.g.:
private static final String VariableRegex = Pattern.quote(Character.toString(VariableSign)) + "(_)?([a-z])([0-9])?";

You should understand that from now on everything (every entity, relation, and variable) is internally mapped to an integer. This also means that we have a limited pool of variable symbols: if I allow more patterns, two different variable symbols may end up mapped to the same integer internally, and from there all hell breaks loose.
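One way to see how a larger name space can cause collisions is a fixed-size packing of variable names into integers. The encoding below is purely hypothetical (it is not AMIE's KB.mapVariable): ?<letter><n> becomes -(1 + letterIndex * 10 + n), which is collision-free only while n is a single digit. Once multi-digit suffixes are accepted, distinct names such as ?a10 and ?b0 land on the same integer. VariableEncodingDemo and encode are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableEncodingDemo {
    // Hypothetical names: '?', one letter, then a numeric suffix of any length.
    private static final Pattern NAME = Pattern.compile("\\?([a-z])([0-9]+)");

    // Hypothetical fixed-slot encoding, NOT AMIE's KB.mapVariable():
    // 26 letters x 10 slots, so it is only injective for single-digit suffixes.
    static int encode(String variable) {
        Matcher m = NAME.matcher(variable);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a variable: " + variable);
        }
        int letterIndex = m.group(1).charAt(0) - 'a';
        int n = Integer.parseInt(m.group(2));
        return -(1 + letterIndex * 10 + n);
    }

    public static void main(String[] args) {
        System.out.println(encode("?a9"));   // -10
        System.out.println(encode("?b0"));   // -11
        System.out.println(encode("?a10"));  // -11: collides with ?b0
    }
}
```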

This is clearly not backward compatible, but I believe 260 (or 520) variable symbols should be enough for most tasks AMIE deals with.
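The 260/520 figures follow from 26 letters times 10 digits, doubled by the optional underscore prefix. The sketch below enumerates letter-plus-one-digit names and checks them against the restricted regex proposed above, assuming VariableSign is '?'; bare-letter names such as ?a also match that regex but are not counted here. VariablePoolDemo is an illustrative name.

```java
import java.util.regex.Pattern;

final class VariablePoolDemo {
    // Restricted regex proposed above, assuming VariableSign is '?'.
    static final Pattern RESTRICTED =
            Pattern.compile(Pattern.quote("?") + "(_)?([a-z])([0-9])?");

    // Counts accepted names of the form ?<letter><digit>, optionally
    // also the underscore-prefixed forms ?_<letter><digit>.
    static int countLetterDigitNames(boolean withUnderscore) {
        String[] prefixes = withUnderscore ? new String[] {"", "_"} : new String[] {""};
        int count = 0;
        for (String prefix : prefixes) {
            for (char letter = 'a'; letter <= 'z'; letter++) {
                for (char digit = '0'; digit <= '9'; digit++) {
                    if (RESTRICTED.matcher("?" + prefix + letter + digit).matches()) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLetterDigitNames(false)); // 260 = 26 letters x 10 digits
        System.out.println(countLetterDigitNames(true));  // 520 with the optional "_" prefix
    }
}
```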

As such, the use of Rule.fullyUnboundTriplePattern1() should be prohibited in the near future.

So, except for the VariableRegex change, I will probably accept the other pull requests soon (I need to quickly check them first).

Cheers,
Jonathan

Hi Jonathan,
Thanks for shedding light on this issue. I see the problem now: we operate in a small space, as the implementation of the mapVariable method in the KB class suggests.

Cheers,
Luis