Problem parsing rules in Freebase
lgalarra opened this issue · comments
Hi,
AMIE cannot parse rules such as:
?a /film/actor/film./film/performance/film /m/0340hj => ?a neg_/award/award_nominee/award_nominations./award/award_nomination/award
I have boiled down the problem to the method rules in KB.java. I am proposing a new regex for parsing. On the other hand, we do not support typed literals. My fix is available in #19.
Cheers,
Luis
Hi,
PR #19 broke some experiments of mine (nothing too bad), so I started debugging.
@lgalarra, could you confirm that the new regex is able to parse your example:
?a /film/actor/film./film/performance/film /m/0340hj =>
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award
It seems that the numbers are not being parsed correctly (0340hj, in this case). The same thing happened with some of my triples. What I get here is:
?a /film/actor/film./film/performance/film /m/ =>
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award
I've included some tests in PR #22. I hope they help spot these things in the future. I'm not so good with regex, so I'd be glad if you checked this issue with numbers.
Also, AMIEParser is recognizing the following "rule" from AMIE's output: Lossless (query refinement => ) heuristics enabled. Should I just remove the "header" from AMIE's output, or is AMIEParser supposed to work with it? Apparently, the previous regex didn't capture it as a rule pattern.
Regards,
Antonio.
Hi,
The regexp should be fixed now (according to @falcaopetri's test cases, thanks a lot for those).
The problem was in the URI pattern: @lgalarra removed the "\w" that was necessary to match numbers (\p{L} only matches letters).
We may want to consider:
- Use \p{Nd} instead of \w to match numbers (pure Unicode numbers)?
- Unicode punctuation (’ U+2019)?
- Make triplePattern consistent with amieTriplePattern?
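To make the difference between the character classes concrete, here is a minimal sketch. The three patterns and the class name UriPatternDemo are made up for illustration and are not AMIE's actual regex; they only show how \p{L}, \w and \p{Nd} behave on the /m/0340hj example under Java's default regex flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UriPatternDemo {
    // Hypothetical patterns for illustration only; NOT the regex used in KB.java.
    static final Pattern LETTERS_ONLY = Pattern.compile("[\\p{L}/]+");
    static final Pattern WITH_WORD = Pattern.compile("[\\p{L}\\w/]+");
    static final Pattern WITH_ND = Pattern.compile("[\\p{L}\\p{Nd}_/]+");

    /** Returns the prefix of input matched by the pattern, or "" if nothing matches at the start. */
    static String matchedPrefix(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        String uri = "/m/0340hj";
        // \p{L} alone stops at the first digit, leaving only "/m/".
        System.out.println(matchedPrefix(LETTERS_ONLY, uri));
        // Adding \w (ASCII letters, digits and underscore by default) matches the whole URI.
        System.out.println(matchedPrefix(WITH_WORD, uri));
        // \p{Nd} also matches the ASCII digits here.
        System.out.println(matchedPrefix(WITH_ND, uri));
        // A non-ASCII digit (U+0663) is matched by \p{Nd} but not by the default \w.
        String arabicDigit = "/m/\u0663";
        System.out.println(matchedPrefix(WITH_WORD, arabicDigit).length());
        System.out.println(matchedPrefix(WITH_ND, arabicDigit).length());
    }
}
```

The trade-off in this sketch is that \p{Nd} covers decimal digits from any script, while \w without the UNICODE_CHARACTER_CLASS flag only covers ASCII digits but also includes the underscore.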
Hi,
I just would like to point out that KB.triples still parses Lossless (query refinement) heuristics enabled as the triples: { Lossless (query refinement, ) heuristics enabled }.
KB.rule returns null though, since the string does not contain :- or =>.
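A separator check of the kind described above could look like the following sketch. The helper splitRule and the class RuleSeparatorDemo are hypothetical and are not the code of KB.rule; they only illustrate rejecting a line that has neither separator before any triple parsing is attempted.

```java
public class RuleSeparatorDemo {
    /**
     * Hypothetical helper (not AMIE's KB.rule): splits a line on the first
     * "=>" or ":-" and returns the two trimmed sides, or null if the line
     * contains neither separator.
     */
    static String[] splitRule(String line) {
        for (String sep : new String[] {"=>", ":-"}) {
            int idx = line.indexOf(sep);
            if (idx >= 0) {
                return new String[] {
                    line.substring(0, idx).trim(),
                    line.substring(idx + sep.length()).trim()
                };
            }
        }
        return null; // no separator: not a rule, so skip triple parsing
    }

    public static void main(String[] args) {
        // The header line has no separator, so it is rejected.
        System.out.println(splitRule("Lossless (query refinement) heuristics enabled") == null);
        // A line with "=>" is split into its two sides.
        String[] sides = splitRule("?a /film/actor/film./film/performance/film /m/0340hj => ?a /p ?b");
        System.out.println(sides[0]);
        System.out.println(sides[1]);
    }
}
```

A header variant that itself contains => would still pass this check, so the sketch does not replace stripping the header from AMIE's output.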