Implement a way of pattern/rule tracing

Question

ftyers opened this issue 7 years ago

It would be useful to be able to trace the program's output, e.g. to see which patterns are matched and, for each token, what form/text/lemma/child/agree is set to.

Amir Zeldes · Answer 1 · Sat Nov 18 2017 01:51:08 GMT+0800 (China Standard Time)

If you hover over the mentions in the HTML output, you'll see a tooltip with most of these (maybe more could be shown; I'm not sure what you mean by child).

Francis Tyers · Answer 2 · Sat Nov 18 2017 01:56:16 GMT+0800 (China Standard Time)

I mean on the command line. Sometimes I write a rule and it doesn't work (e.g. nothing appears in the HTML), and it would be good to be able to trace why it might not be working. It could work by, e.g., printing the line of the dependency tree in CoNLL format, followed by a list of matched variables:

2       Пушкин  Пушкин  PROPN   _       Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing   3       nsubj   _       _
form = proper
text = Пушкин
lemma = Пушкин
agree = 3sg,male

etc.
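
The output format proposed above could be sketched roughly as follows. This only illustrates the requested trace and is not an existing xrenner feature: `format_trace` and the `matched` dictionary are hypothetical names, and the values are copied from the example.

```python
def format_trace(conll_line, matched):
    """Return the CoNLL line followed by one 'key = value' line per variable."""
    lines = [conll_line.rstrip("\n")]
    for key, value in matched.items():
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


# Token line from the example above (tab-separated CoNLL columns).
conll = "2\tПушкин\tПушкин\tPROPN\t_\tAnimacy=Anim|Case=Nom|Gender=Masc|Number=Sing\t3\tnsubj\t_\t_"

# Hypothetical per-token attributes that a trace mode might report.
matched = {"form": "proper", "text": "Пушкин", "lemma": "Пушкин", "agree": "3sg,male"}

print(format_trace(conll, matched))
```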

Amir Zeldes · Answer 3 · Sat Nov 18 2017 02:11:52 GMT+0800 (China Standard Time)

What do you mean by not appearing in the HTML? Unless you have singleton detection switched off, all mentions that are not ruled out by a stop list should show up in HTML. If singletons are on and something doesn't show up, it means the system rejected it as a mention very early. Or are you looking to get 'currently attested categories' on all tokens?

Francis Tyers · Answer 4 · Sat Nov 18 2017 02:20:50 GMT+0800 (China Standard Time)

Aha! Ok, that was the problem. I had remove_singletons=True in the config.ini.

But even so:

form="proper";form="proper"&lemma=$1;100;nopropagate

I have this rule in coref_rules.tab, and here is what I'm getting from the HTML:

Amir Zeldes · Answer 5 · Sat Nov 18 2017 03:58:04 GMT+0800 (China Standard Time)

If I had to guess, I'd guess that the agreement information is shooting down the match. Note how one is 'male' and the other is 'Animacy=Anim|....'. As far as xrenner is concerned, the latter is a monolithic value.

There are two main ways of dealing with this - one is to use DepEdit rules to collapse annoying classes, which can be good because you can use syntactic conditions. Another is to fiddle with the 'Agreement Class Detection' section of config.ini, especially morph_rules. Here's an example from my German model, which relies on RFTagger morphological features:

# Edit morphology information - cascade of string replace rules to use on the morph field in conll data if available
morph_rules=.*([12]).*(Sg|Pl).*/\1\2;([12])Sg/\1;^[^0-9].*(Pl).*/\1;^[^0-9].*(Fem|Masc|Neut).*/\1;.*\.\*$/_

This takes tags like this:

PRO.Pers.Subst.3.Nom.Pl.*
N.Reg.Dat.Sg.Fem

And makes them like this:

Pl
Fem
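
To see how such a cascade plays out, here is a minimal Python sketch. It assumes the rules are separated by `;`, that each rule splits into pattern and replacement at its last `/`, and that the substitutions are applied in order with `re.sub`. xrenner's actual implementation may differ, and `apply_morph_rules` is a hypothetical helper.

```python
import re

# The morph_rules value from the German model quoted above.
MORPH_RULES = r".*([12]).*(Sg|Pl).*/\1\2;([12])Sg/\1;^[^0-9].*(Pl).*/\1;^[^0-9].*(Fem|Masc|Neut).*/\1;.*\.\*$/_"


def apply_morph_rules(tag, rules=MORPH_RULES):
    """Apply each 'pattern/replacement' rule in order (assumed semantics)."""
    for rule in rules.split(";"):
        pattern, replacement = rule.rsplit("/", 1)
        tag = re.sub(pattern, replacement, tag)
    return tag


for tag in ("PRO.Pers.Subst.3.Nom.Pl.*", "N.Reg.Dat.Sg.Fem"):
    print(tag, "->", apply_morph_rules(tag))
```

Under these assumptions the two tags reduce to `Pl` and `Fem`, matching the result described above.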

Francis Tyers · Answer 6 · Sat Nov 18 2017 04:08:06 GMT+0800 (China Standard Time)

Aha, ok, I added:

morph_rules=[^|]+|Gender=Masc|[^|]+/male

Now I get:

And the only two rules I have in the coref_rules.tab are:

$ cat models/rus/coref_rules.tab  | grep -v '^#'
form="proper";form="proper"&text=$1;100;nopropagate
form="proper";form="proper"&lemma=$1;100;nopropagate

They both seem to have the same lemma and the agreement features are the same too.
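
One thing that may be worth checking in the morph_rules value above: if the patterns are interpreted as Python regular expressions (an assumption, though the German example does use regex groups), an unescaped `|` means alternation rather than a literal pipe. The sketch below only shows how `re.sub` treats the two variants. The second pattern is a suggested alternative, not something confirmed in this thread.

```python
import re

morph = "Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing"

# Unescaped '|' splits the pattern into three alternatives,
# so every run of non-pipe characters is replaced separately.
as_written = re.sub(r"[^|]+|Gender=Masc|[^|]+", "male", morph)

# Matching the whole field around the feature collapses it to one value.
whole_field = re.sub(r".*Gender=Masc.*", "male", morph)

print(as_written)
print(whole_field)
```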

Amir Zeldes · Answer 7 · Sat Nov 18 2017 04:38:52 GMT+0800 (China Standard Time)

OK, that's definitely weird. Did you put proper nouns in lemma_match_pos, or maybe turn on proper_mod_must_match?

If it's not one of those, could you send me the model and the parse?

Francis Tyers · Answer 8 · Sat Nov 18 2017 04:45:49 GMT+0800 (China Standard Time)

# What POS categories should allow lemma matching of heads for coreference? e.g. /^NNS?$/ to allow singular and plural nouns to match based on lemma
lemma_match_pos=/none/
...
# Do proper noun modifiers have to match exactly across mentions? (NB: this may include proper modifiers such as Mr.!! Often leaving this False is better)
proper_mod_must_match=False
...

I'll send over the zip file with the model and the conllu file :)