dig-team / amie

Problem parsing rules in Freebase

lgalarra opened this issue

Hi,

AMIE cannot parse rules such as:

?a /film/actor/film./film/performance/film /m/0340hj => ?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

I have narrowed the problem down to the rules method in KB.java, and I am proposing a new regex for parsing. Typed literals, on the other hand, are not supported. My fix is available in #19.
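For illustration, here is a minimal Java sketch of an atom pattern that accepts Freebase-style names (slashes, dots, underscores, digits and letters). The class name, the TERM pattern and the firstAtom helper are hypothetical. This is not the regex from KB.java or PR #19, and, like the proposed fix, it does not handle typed literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: a hypothetical pattern for one subject-relation-object atom. */
final class FreebaseAtomSketch {
    // A term is either a variable such as ?a, or a Freebase-style name made of
    // Unicode letters (\p{L}), ASCII word characters (\w: letters, digits, '_'),
    // plus '/', '.' and '-'. Typed literals are deliberately not covered.
    static final String TERM = "(\\?\\w+|[\\p{L}\\w/.\\-]+)";
    static final Pattern ATOM = Pattern.compile(TERM + "\\s+" + TERM + "\\s+" + TERM);

    /** Returns {subject, relation, object} for the first atom found, or null if none. */
    static String[] firstAtom(String text) {
        Matcher m = ATOM.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] atom = firstAtom("?a /film/actor/film./film/performance/film /m/0340hj");
        // Prints: ?a | /film/actor/film./film/performance/film | /m/0340hj
        System.out.println(String.join(" | ", atom));
    }
}
```

Only the body atom of the example rule is used here, since the head as quoted shows a subject and a relation but no third term.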

Cheers,
Luis

Hi,

PR #19 broke some experiments of mine (nothing too bad), so I started debugging.
@lgalarra, could you confirm that the new regex is able to parse your example:

?a /film/actor/film./film/performance/film /m/0340hj => 
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

It seems that digits are not being parsed correctly (0340hj, in this case). The same thing happened with some of my triples. Here is what I get:

?a /film/actor/film./film/performance/film /m/ => 
?a neg_/award/award_nominee/award_nominations./award/award_nomination/award

I've included some tests in PR #22. I hope they help catch this kind of problem in the future. I'm not very good with regexes, so I'd be glad if you could look into this issue with digits.
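A table-driven check in the same spirit might look like the following minimal sketch. It is not the content of PR #22: the class, the countFailures helper and both character classes are hypothetical, with lettersOnly standing in for a URI pattern that matches letters but not digits.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: checks that a term pattern consumes each sample input completely. */
final class TermRegressionSketch {
    /** Returns how many inputs produce a leading match different from the expected string. */
    static int countFailures(Pattern term, Map<String, String> cases) {
        int failures = 0;
        for (Map.Entry<String, String> c : cases.entrySet()) {
            Matcher m = term.matcher(c.getKey());
            // lookingAt() anchors at the start of the input but not at the end.
            String actual = m.lookingAt() ? m.group() : "";
            if (!actual.equals(c.getValue())) {
                System.out.println("FAIL: " + c.getKey() + " -> " + actual);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("/m/0340hj", "/m/0340hj");
        cases.put("/film/actor/film./film/performance/film",
                "/film/actor/film./film/performance/film");

        Pattern lettersOnly = Pattern.compile("[\\p{L}/.]+");   // stops at the first digit
        Pattern withDigits = Pattern.compile("[\\p{L}\\w/.]+");  // \w adds ASCII digits and '_'

        // Prints "FAIL: /m/0340hj -> /m/" and then 1.
        System.out.println(countFailures(lettersOnly, cases));
        // Prints 0.
        System.out.println(countFailures(withDigits, cases));
    }
}
```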

Also, AMIEParser recognizes the following line of AMIE's output as a "rule": Lossless (query refinement => ) heuristics enabled. Should I just remove the "header" from AMIE's output, or is AMIEParser supposed to handle it? Apparently, the previous regex did not match it as a rule.

Regards,
Antonio.

Hi,

The regexp should now be fixed (according to @falcaopetri's test cases; thanks a lot for those).

The problem was in the URI pattern: @lgalarra removed the "\w", which was needed to match digits (\p{L} only matches letters).
We may want to consider:
- using \p{Nd} instead of \w to match digits (Unicode decimal digits only)?
- supporting Unicode punctuation (’, U+2019)?
- making triplePattern consistent with amieTriplePattern?
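To make these trade-offs concrete, the sketch below prints, for a few sample characters, which of \p{L}, \w, \p{Nd} and \p{P} match in java.util.regex with default flags (so \w is ASCII-only, equivalent to [a-zA-Z_0-9]). The class name and the sample set are illustrative only.

```java
import java.util.regex.Pattern;

/** Sketch only: compares how a few regex character classes treat sample characters. */
final class CharClassSketch {
    /** True if the whole input matches the given regex. */
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String[] classes = {"\\p{L}", "\\w", "\\p{Nd}", "\\p{P}"};
        // Samples: ASCII letter, accented letter, ASCII digit, Arabic-Indic digit three,
        // underscore, and the right single quotation mark.
        String[] samples = {"a", "\u00e9", "0", "\u0663", "_", "\u2019"};
        for (String s : samples) {
            StringBuilder line = new StringBuilder(String.format("U+%04X", s.codePointAt(0)));
            for (String c : classes) {
                line.append(' ').append(c).append('=').append(matches(c, s));
            }
            System.out.println(line);
        }
    }
}
```

Two observations follow from this. First, \w also covers the underscore, which appears in names such as award_nominee and in the neg_ prefix; if the URI pattern relies on \w for '_', replacing \w with \p{Nd} would require adding '_' explicitly. Second, \w alone misses non-ASCII letters and digits unless Pattern.UNICODE_CHARACTER_CLASS is set, while ’ (U+2019) falls under \p{P}, not \w.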

Hi,

I would just like to point out that KB.triples still parses "Lossless (query refinement) heuristics enabled" as the triples { Lossless (query refinement , ) heuristics enabled }.

KB.rule returns null, though, since the string contains neither :- nor =>.
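One way to handle the header question raised earlier in the thread is to drop lines that contain neither separator before handing them to the rule parser. The sketch below assumes that every rule line contains => or :- (as KB.rule's behaviour suggests); looksLikeRule and ruleLines are hypothetical helpers, not AMIE APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: keeps lines that could be rules and drops header lines. */
final class RuleLineFilterSketch {
    /** True if the line contains one of the two rule separators, => or :-. */
    static boolean looksLikeRule(String line) {
        return line.contains("=>") || line.contains(":-");
    }

    /** Returns only the candidate rule lines, preserving their order. */
    static List<String> ruleLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (looksLikeRule(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> output = List.of(
                "Lossless (query refinement) heuristics enabled",
                "?a /film/actor/film./film/performance/film /m/0340hj => "
                        + "?a neg_/award/award_nominee/award_nominations./award/award_nomination/award");
        // Prints 1: the header line is dropped, the rule line is kept.
        System.out.println(ruleLines(output).size());
    }
}
```

Such a filter only guards the rule path; KB.triples would still need the header stripped, or a comparable check, before parsing.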