spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

Feature Request: Proper Tagging of Names with Possessive Apostrophes

MarketingPip opened this issue · comments

Description:
I would like to propose a new feature or rule for Compromise.js that handles names with possessive apostrophes (e.g., "Steve's") by tagging only the name before the apostrophe as a person / proper noun. Currently, Compromise.js does not provide a straightforward way to handle such cases (that I know of).

Why is this feature valuable?
Many texts and documents contain names in possessive form, and it's essential to extract and process these names correctly. With this feature, Compromise.js could improve the tagging process and extract more names properly.

And as far as I know, we as humans only add the possessive at the very end of a name. Correct me if I am wrong.

Example:

Consider the following text:

"George Lucas's Lucasfilm"

should become:

George Lucas

Proposed Implementation:

  • Update .people() to remove the "'s" from names, or split the match at the first instance of "'s". (The second solution, a rule-based one, would be ideal.)
  • Identify possessive forms (e.g., "'s") of tagged people.
  • Keep the original text up to the split, i.e. the first possessive found for the person (e.g. "Bill's"), in the stored value (JSON).
  • Split each possessive form (of the person tag) to isolate the name.
  • Provide a way for users to access the extracted names with the possessive removed (i.e. in doc.people().json()).
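
The steps above can be sketched in plain JavaScript, independent of Compromise. This is only an illustration under stated assumptions: `splitPossessiveName` is a hypothetical helper, not part of the library, and it assumes the input span has already been identified as starting with a person's name.

```javascript
// Hypothetical helper, not part of Compromise: given a span that a tagger
// has already marked as a person, cut it at the first possessive marker.
// Handles both the straight (') and curly (’) apostrophe.
function splitPossessiveName(span) {
  // Shortest prefix ending in 's, or in a bare apostrophe after an "s"
  // (as in "James' book"), followed by whitespace or the end of the string.
  const match = span.match(/^(.*?)(['’]s|(?<=s)['’])(?=\s|$)/);
  if (!match) {
    return { text: span, name: span, possessive: false };
  }
  return {
    text: match[1] + match[2], // original form, e.g. "George Lucas's"
    name: match[1],            // possessive stripped, e.g. "George Lucas"
    possessive: true,
  };
}

const result = splitPossessiveName("George Lucas's Lucasfilm");
console.log(result.name); // George Lucas
```

Keeping both `text` and `name` in the returned object mirrors the idea of storing the original possessive form alongside the isolated name.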

Hey Jared, I believe that already exists:

.possessives().strip() - "Spencer's" -> "Spencer"

Cheers

I think I might have phrased this issue poorly.

When calling .people() on "George Lucas's Lucasfilm" it will return

 ["George Lucas's Lucasfilm"] 

when the expected output should be

 ["George Lucas"] or  ["George Lucas's"]

There should be a rule for ALL people, so that when people are tagged, the tagger stops right after the possessive, like this:

George Lucas's film club
              ^ stop tagger here
Spencer's awesome library
         ^ stop tagger here

Again, we as humans (as far as I know) never have a possessive anywhere but at the end of our names, so it indicates a stop / split point for the tagger. That should also help the tagging process, so that the words that follow, such as "Lucasfilm", can be tagged properly.
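
The stopping rule in the diagrams above can be sketched over a list of tagged tokens. This is an illustration only: the `{ text, tags }` token shape and the `splitAfterPossessive` function are assumptions made for the example, not Compromise's internal representation or API.

```javascript
// Hypothetical token shape for illustration: { text: string, tags: string[] }.
// Splits a run of tokens into segments, closing a segment right after any
// token that is tagged as a person and ends in a possessive suffix.
function splitAfterPossessive(tokens) {
  const isPossessive = (t) => /['’]s?$/.test(t.text);
  const segments = [];
  let current = [];
  for (const token of tokens) {
    current.push(token);
    if (token.tags.includes('Person') && isPossessive(token)) {
      segments.push(current); // stop the person span here
      current = [];
    }
  }
  if (current.length > 0) segments.push(current);
  return segments;
}

const tokens = [
  { text: 'George', tags: ['Person'] },
  { text: "Lucas's", tags: ['Person'] },
  { text: 'film', tags: ['Noun'] },
  { text: 'club', tags: ['Noun'] },
];
// Join each segment back into text: "George Lucas's" and "film club".
const segments = splitAfterPossessive(tokens).map((seg) =>
  seg.map((t) => t.text).join(' ')
);
console.log(segments);
```

Because the split only fires on tokens tagged as a person, possessives on other nouns are left alone and the words after the name are free to be tagged on their own.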

Hoping that made more sense & hoping you're having an awesome weekend. :cheers:

PS: look at that compression issue you closed on me - still think I am onto something!

@spencermountain I was trying to think of any names that this rule might not work with - unless Elon Musk spawns more kids, I think it should work. 😂

PS: I want to add their names / weird names like those to the people lexicon. But I have some cool stuff coming up for Compromise.js - plus a much better way for you to get data to populate other versions. Nouns, verbs etc - in the same format you like them / need them. 👌

ah, yeah of course.
yeah - George Lucas's Lucasfilm is definitely mis-tagged, and a .splitAfter a possessive would work great.
Oh man, I didn't know the .people() match logic was this bad. You're welcome to improve it.

I actually have been working on the same thing - I've added a couple hundred missing names on the dev branch, from a Wikipedia analysis, which I think is similar to what you're doing.

I hope to get dev stable for a release this week. There are a few tests failing. If you make a pr before then, please do it off the dev branch. cheers

@spencermountain - will do! And that's exactly what I was looking for, but I was not sure how to properly write that in tag rules. If you wanna drop an example for reference - feel free.

And jeez, this looked like my half-baked idea on determining names from locations.

Food for thought too - save yourself some time by using Wikidata via query! And as a last resort - start making data pulls from Wikipedia (I was foolishly doing this before).

Tho I will make a gist with a preview of a basic tool I made for Compromise, and you can see where you wanna shove it. I am hoping to package it as a separate library under MP, so you can use it via import & build etc. I was thinking it would be better for the library than a plugin - but that choice will be yours!

It ironically has to do with names too, i.e. diminutives: it checks if a human name is known as something else, such as Steve / Steven. So I got you covered with lots of first names etc. Plus I have a huge DB of inmates' names (Hispanic names etc.) that I was planning on making a PR for, to add to this and other versions of compromise. I have just been waiting to make PRs slowly so as not to piss you off. lol

So look for that notification soon enough & drop me some CONTACT info soon enough lol 👍

PS: apologies - didn't mean to close on comment lol

this should be fixed in 14.10.1
cheers