Create TypePatternTagger to ease tagging types
schmmd opened this issue · comments
Hi John, how about we move your contribution into Taggers? Often I just need to think of a good way for it to fit in; any help is appreciated ;-)
Maybe we can create a new tagger called TypePatternTagger. Maybe you can think of a better name. This tagger would perform a substitution for the type matching syntax. Do you have any suggestions? I thought of <<TypeName>> but I only somewhat like it. I think it would need to create the sequence:
<typeStart='TypeName'> <typeCont='TypeName'>*
With this new tagger, we could have patterns such as:
<<VerbPhrase>> <<NounPhrase>> <pos='JJ'>
What do you think? Any chance you could look at this on Monday? I think it would be pretty straightforward and it would get you used to my changes.
Nope, didn't get an e-mail when it was opened.
It seems to me that the replacement would need to be:
( <typeStart='x' & typeEnd='x'> | ( <typeStart='x'> <typeCont='x'>* <typeEnd='x'> ) )
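A minimal sketch of how that replacement could be built for an arbitrary type name. The object and method names (`TypePatternExpansion`, `expandType`) are made up for illustration and are not part of Taggers; only the attribute names come from this thread.

```scala
object TypePatternExpansion {
  // Hypothetical helper: builds the token-level pattern for one type name.
  def expandType(typeName: String): String = {
    // A type that covers exactly one token starts and ends on that token.
    val single = s"<typeStart='$typeName' & typeEnd='$typeName'>"
    // A longer type: start token, any number of continuing tokens, end token.
    val multi = s"(<typeStart='$typeName'> <typeCont='$typeName'>* <typeEnd='$typeName'>)"
    s"($single | $multi)"
  }
}
```

The single-token alternative is listed first so that a one-token type does not require a separate end token.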
I'll look at this in the afternoon; I'm trying to get Dan some Entity Linking results on different data.
John
I think they are the same because it's greedy. Note typeCont means not typeStart (but it could be typeEnd too).
lazy val typesContinuingAtToken = types -- typesBeginningAtToken -- typesEndingAtToken
but I guess we could change that.
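To make the two readings of typeCont concrete, here is a small sketch using plain sets. `TokenTypes` and `typesContinuingOrEndingAtToken` are hypothetical names invented for this example; only `types`, `typesBeginningAtToken`, `typesEndingAtToken` and `typesContinuingAtToken` appear in the thread.

```scala
object TypeSets {
  // Hypothetical stand-in for whatever holds the per-token type sets.
  final case class TokenTypes(
      types: Set[String],
      typesBeginningAtToken: Set[String],
      typesEndingAtToken: Set[String]) {

    // The definition quoted above: neither a start token nor an end token.
    lazy val typesContinuingAtToken: Set[String] =
      types -- typesBeginningAtToken -- typesEndingAtToken

    // Assumed alternative reading: "not typeStart", so an end token still counts.
    lazy val typesContinuingOrEndingAtToken: Set[String] =
      types -- typesBeginningAtToken
  }

  // A three-token NounPhrase: first, middle, and last token.
  val first  = TokenTypes(Set("NounPhrase"), Set("NounPhrase"), Set.empty)
  val middle = TokenTypes(Set("NounPhrase"), Set.empty, Set.empty)
  val last   = TokenTypes(Set("NounPhrase"), Set.empty, Set("NounPhrase"))
}
```

Under the quoted definition the last token is not a continuing token, so a pattern needs an explicit typeEnd; under the alternative reading a greedy `<typeCont='x'>*` would already reach the last token.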
OH, woops, I must have an older version.
I agree the replacement pattern you suggested should work.
Yeah, I changed it and it's confusing. Do you think the current definition is OK? It seems better than the old one to me (typeCont just means that we're on a token where the type is continuing).
On Mon, Sep 30, 2013 at 9:09 AM, John Gilmer notifications@github.com wrote:
OH, woops, I must have an older version.
The definition seems fine.
I think <> is ok, but I've come to think of "<>" as meaning a token. What other characters are at our disposal?
{VerbPhrase}
'VerbPhrase'
^VerbPhrase^
Let's do {VerbPhrase}, but you will want to be careful because braces are also regular expression syntax. I think you will need to:
- Split by whitespace.
- See if a token matches '{.*}'.r and perform the substitution if there is a match.
- Join back together on space.
Example pattern (to make sure we still like it!):
{VerbPhrase} {NounPhrase} <postag='JJ'>
Fyi backticks put your text in code mode. Argh, but they don't work when sent as an email!
Argh... this was a horrible suggestion. You can't split by whitespace!
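One plausible reason splitting fails is that a type reference can be glued to other syntax, as in `({NounPhrase})+`, so it never shows up as a whitespace-delimited token of its own. A rough sketch of an alternative that rewrites references in place with a regex instead. Everything here is an assumption for illustration: the names `BraceTypeSyntax`, `expandType` and `expand` are invented, and the expansion is simply the start/continue sequence proposed at the top of this issue, wrapped in parentheses.

```scala
import scala.util.matching.Regex

object BraceTypeSyntax {
  // Braces are escaped because `{` is a regex metacharacter. Requiring a
  // leading letter leaves quantifiers such as {2} or {2,3} untouched.
  private val TypeRef: Regex = """\{([A-Za-z]\w*)\}""".r

  // Hypothetical expansion; parenthesised so a trailing quantifier applies
  // to the whole type.
  def expandType(typeName: String): String =
    s"(<typeStart='$typeName'> <typeCont='$typeName'>*)"

  // Rewrites every {TypeName} in place; no splitting or joining involved.
  def expand(pattern: String): String =
    TypeRef.replaceAllIn(pattern, m => Regex.quoteReplacement(expandType(m.group(1))))
}
```

A known limitation of this sketch: a `{Name}` that appears inside a quoted attribute value would be rewritten too, so a real implementation would probably need to skip quoted strings.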