spencermountain / compromise

modest natural-language processing

Home Page: http://compromise.cool

[Improvement]: Corporation Rule

MarketingPip opened this issue

Another rule set (it needs major improvement and some tests, but I'm sure we can figure something out).

doc.match('the? (#Noun #Noun|#Noun|#Verb #Noun|#Organization|#Acronym|#Adjective #Noun+|#Adjective #Noun|#ProperNoun+|#ProperNoun) (and #Noun)? (corporation|corp|incorporated|inc|limited|ltd)')
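For anyone who wants to experiment with a rule like this in their own code rather than in compromise itself, here is a minimal sketch. It assumes `doc` is a compromise document (the result of `nlp(text)`) and relies only on its `match()` and `tag()` methods. The helper name `tagCorporations` and the `'corporation-rule'` reason label are made up for illustration. The pattern follows the one above, with `#Acronym` carrying its `#` and redundant alternatives dropped; it has not been checked against real text, so expect false positives.

```javascript
// Pattern adapted from the proposal in this issue; it is not part of compromise.
const CORP_NAME =
  '(#Noun #Noun|#Noun|#Verb #Noun|#Organization|#Acronym|#Adjective #Noun+|#ProperNoun+)';
const CORP_SUFFIX = '(corporation|corp|incorporated|inc|limited|ltd)';
const CORP_RULE = `the? ${CORP_NAME} (and #Noun)? ${CORP_SUFFIX}`;

// Hypothetical helper: tags every match of the rule as an Organization.
// `doc` is assumed to be a compromise document, e.g. the result of nlp(text).
// The second argument to tag() is a free-text reason, useful when debugging.
function tagCorporations(doc) {
  doc.match(CORP_RULE).tag('Organization', 'corporation-rule');
  return doc;
}
```

In a project with compromise installed, this could be called as `tagCorporations(nlp(text))` and inspected with `.match('#Organization+').out('array')`. Note that the optional leading "the" is part of the matched span, so it would be tagged as well.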

Please do not create so many issues Jared. This is not an issue with compromise, or something that requires my time.

@spencermountain - my apologies, Spencer. I just want to run by you anything that I could see being beneficial, not just for my project but for the Compromise project. But again - a second pair of eyes is more than useful, and I'd prefer you hating me for blowing up your notifications over creating a dumpster fire by submitting a PR that makes ground-breaking changes to the project.

Again - don't mean to be majorly annoying by blowing your notifications up! lol

Yeah, this is becoming a problem. I'm glad you're excited about the project, and you're encouraged to work on these things in your own projects. Creating 100 issues is not productive or helpful. Maintaining an open source project is hard, and I don't have a lot of time in my day.

I guess a better question is how to solve this. How can I propose solutions / changes / updates to the lexicon without wasting your time or opening issues, while still helping with maintenance and letting the community as a whole benefit from my changes...?

@MarketingPip Sorry to intrude on your conversation, but I feel a bit compelled to jump in.

I have been using Compromise.js for over 3 years, am a huge fan, and have previously had the pleasure of working with @spencermountain. I'm subscribed to this feed and I make sure to read every issue raised, every release note, etc., as Compromise is a critical component in our projects.

So, I have read all of the (many) tickets you are raising. It's becoming a frustration for me as it's getting really spammy.

I have to agree with Spencer; a lot of your suggestions probably belong in your own project, not in Compromise itself. The whole point of Compromise is to be (very) lightweight, run (very) fast, and help developers solve NLP problems by providing a generic toolkit that can be used as a foundation for many, many problem domains.

Our own company has a large implementation with Compromise at its core and we would not dream of polluting the library with our domain-specific patterns. When we find a bug in Compromise that is blocking us from proceeding, we raise an issue. Or when we have developed a Compromise plugin, verified it works through tests and UAT... then sometimes we consider integrating it with the core library and raising a PR for the benefit of others.

As a matter of GitHub etiquette, if you have something to contribute, you should read the documentation in depth, investigate the source code in depth, learn how the unit testing library works... With all this in mind you could simply add features to the library (complete with unit test updates, running all existing unit tests, documentation updates, etc.). If the features are rejected, no worries, you're free to use them in your own fork and/or projects.

I think you should build on top of it. That's what compromise was made for.
Sounds like you've got a lot of ambitious ideas for Named-Entity disambiguation that exceed the scope of this project. I can add fixes for some of the holes you've found in the #organization tag, like schools and banks, in an upcoming release.

I say, put a neural net on top, do really aggressive classification of topics, and open-source it. That sounds like a good time. I think you may get frustrated by the slow and increasingly tedious parts of maintaining a generic library for everybody.
cheers

Thank you for the comments. That said, I guess I should be opening a discussion for potential rules instead.

Most of the rules I suggested will help keep compromise lightweight and help with POS tagging. So I don't understand how that is not beneficial...?

You know that something like "bank of #Country" is going to be an organization, as is "the #word corporation". So I don't understand how these rules aren't critical to keeping things lightweight, nor why this should just be for "my project", as it will benefit your company's POS tagger as well. The alternative is to populate the lexicon with data, which makes it less lightweight.
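The lexicon-versus-rule trade-off argued here can be illustrated without compromise at all. The sketch below is plain JavaScript: the country list, the `lexicon` entries, and both function names are invented for illustration and say nothing about how compromise actually stores or applies its data.

```javascript
// Illustrative only: a tiny hand-picked country list (an assumption, not compromise data).
const countries = new Set(['canada', 'england', 'japan']);

// Approach 1: lexicon. Every organization needs its own entry.
const lexicon = {
  'bank of canada': 'Organization',
  'bank of england': 'Organization',
  'bank of japan': 'Organization',
};

function tagByLexicon(text) {
  return lexicon[text.trim().toLowerCase()] || null;
}

// Approach 2: one rule that reuses a country list kept for other purposes.
// Limitation of this sketch: only single-word country names are handled.
function tagByRule(text) {
  const m = /^bank of (\w+)$/i.exec(text.trim());
  return m && countries.has(m[1].toLowerCase()) ? 'Organization' : null;
}

console.log(tagByLexicon('Bank of Japan')); // Organization
console.log(tagByRule('Bank of Japan'));    // Organization
console.log(tagByRule('bank of mom'));      // null
```

The rule covers every listed country with one line, while the lexicon grows by one entry per organization. The cost is that a rule can over-match, which is why each one needs positive and negative test cases before it is proposed anywhere.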

Though some of the match syntax obviously needs improving.

PS: if your company has patterns that can help with POS tagging / identifying places etc., I don't know why you're not contributing them, or why you think that wouldn't help the Compromise project. 🤷‍♂️

Also, not trying to sh*t on your party, but your PR here literally puts false positives in the lexicon here, and the function you added didn't work properly and needed to be fixed and improved here. These things DO/CAN happen.... Maybe you made an older commit, though, with a feature that you added and verified through tests that didn't need fixing....?