spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

[Issue]: Various common nouns tagged as proper noun.

MarketingPip opened this issue · comments

It appears that words carrying the Actor tag are being tagged as NNP when Penn tags are computed.

i.e. "NNP": Proper Noun, Singular

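import nlp from "https://esm.sh/compromise"

// Return the Penn tag of the first term in the input
function CompromiseTagger(word) {
    const doc = nlp(word);
    doc.compute('penn');
    const terms = doc.out('json')[0].terms[0];
    return terms.penn;
}
console.log(CompromiseTagger("author")); // reported: "NNP", expected: "NN"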

As far as I know these should be tagged as "Noun" / "NN"...

I would assume the list needs to be cleaned up and a rule set needs to be implemented for some things. For example, "bishop" is in the list.

"Bishop" could be tagged as a proper noun if referring to a specific person's title, such as "Bishop John." or [#Actor] (#FirstName|#Person+) etc...
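To make the suggested rule concrete, here is a minimal sketch in plain JavaScript that does not use Compromise itself. The helper name `pennForActor` and the capitalization check are assumptions for illustration only; a real rule would presumably match on Compromise's own tags (such as #FirstName or #Person) rather than a regex.

```javascript
// Hypothetical helper, not part of Compromise's API: choose a Penn tag for an
// actor word (e.g. "bishop") based on the token that follows it.
function pennForActor(tokens, i) {
  const next = tokens[i + 1];
  // Assumption: a capitalized following token stands in for a #FirstName/#Person match.
  const followedByName = typeof next === 'string' && /^[A-Z][a-z]+/.test(next);
  return followedByName ? 'NNP' : 'NN';
}

console.log(pennForActor(['Bishop', 'John', 'spoke'], 0)); // NNP
console.log(pennForActor(['the', 'bishop', 'spoke'], 1)); // NN
```

The point of the sketch is only that the title is promoted to a proper noun when it introduces a name, and stays a common noun otherwise.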

hey Jared, this works for me -

nlp('author').debug() //Noun, Actor, Singular

cheers

@spencermountain - I'm assuming you're using the latest build? And have you applied the Penn tags? I also noticed a few other things that look wrong. I'd have to check some PDFs to confirm, but I think ordinal numbers are supposed to be tagged as JJ.

PS: I pulled the build from esm, so I'll check whether I somehow didn't get the latest version and update you shortly. 🤷‍♂️

@spencermountain - update. Compromise (without Penn tags) tags/chunks it as a Noun. But once the Penn tags are computed, it turns into an NNP tag.

This example should return NNP for you as well.

import nlp from "https://esm.sh/compromise"

function CompromiseTagger(word) {
    const doc = nlp(word);
    doc.compute('penn');
    const terms = doc.out('json')[0].terms[0];
    return terms.penn;
}
console.log(CompromiseTagger("bishop"));
console.log(CompromiseTagger("doctor"));

agh, my apologies Jared, you're right.
found the errant NNP tag in the mapping. Thank you for your help.
will release a fix for this, this week.
thanks
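For readers wondering how a single mapping entry can cause this, here is a small sketch of an ordered tag-to-Penn lookup. The table contents, the name `pennMapping`, and the first-match-wins order are all assumptions for illustration; this is not Compromise's actual mapping or code.

```javascript
// Hypothetical tag-to-Penn lookup; NOT Compromise's actual mapping.
// Entries are checked in order, so more specific tags come first.
const pennMapping = [
  ['ProperNoun', 'NNP'],
  ['Actor', 'NN'], // an errant 'NNP' here would mislabel "author", "bishop", "doctor"
  ['Noun', 'NN'],
];

// Return the Penn tag of the first mapping entry whose tag is present.
function toPenn(tags) {
  const hit = pennMapping.find(([tag]) => tags.includes(tag));
  return hit ? hit[1] : null;
}

console.log(toPenn(['Noun', 'Actor', 'Singular'])); // NN
console.log(toPenn(['Noun', 'ProperNoun', 'Singular'])); // NNP
```

Under these assumptions, the base tagger can be correct (Noun, Actor, Singular) while the Penn output is still wrong, which matches what the thread describes.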

No worries! I thought I was going crazy (I was trying to set up some demos of Compromise tagging things for the HMM model I was showing you) until I started doing some digging hahahah!

That said, I'm hoping you'll be pumped about the HMM model (and maybe consider taking Compromise in that direction), with some rules on top. I'm seeing some weirdly good accuracy on tags without rules, and with some basic rules plus the help of Compromise (and my half-ass brain lol), even crazier results. Applying Compromise's own rule set after predicting tags should blow things out of the water.

PS: ordinal numbers should be tagged as adjectives (JJ) - see here for more guidelines.

fixed in 14.12.0 - thank you!