dpalmasan / TRUNAJOD2.0

An easy-to-use library to extract indices from texts.

Home Page: https://trunajod20.readthedocs.io/en/latest/

Migrate models built in Spacy to use Stanford models

dpalmasan opened this issue

With the new release of stanza:

https://stanfordnlp.github.io/stanza/

This may be a good opportunity to improve accuracy. This issue is about investigating whether migrating to stanza would improve our accuracy, and estimating the cost of the migration.

Thank you for open-sourcing this repo! It's helping a lot with my research.

Regarding the Stanza migration, unless you have a tight deadline, I could help. However, I doubt the accuracy would improve by much: spaCy had a major improvement quite recently (https://spacy.io/usage/v3). But, of course, Stanza would look much better for research papers.

Hello Bruce! Sure, I don't have a tight deadline, so your contribution is more than welcome! There are some differences between the stanza pre-trained models and the spaCy ones, so I am not sure about migrating completely, but having the option of using stanza models instead of spaCy ones might improve performance in some cases!

Oh, so do you mean adding an option to use Stanza? Hmm, I'm familiar with both Stanza and spaCy, but the biggest trouble for me would be dealing with Spanish texts. I only know Spanish at a very introductory level.

Anyway, I looked through the Entity Grid and TTR features, which both seem to require minimal Spanish skills. I'll first create a pull request (in a few days) for these files. I'll try to add an option to use Stanza rather than fully migrating to it. One could then choose which to use.

I mean, initially I wanted to completely replace spaCy, but as you mentioned, spaCy improved over time, so maybe removing all the spaCy references will not be as good as having options for both stanza and spaCy. No worries regarding the Spanish-related features. I can update them. BTW thanks for your desire to contribute!
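
For reference, the "options for both" idea could look roughly like the sketch below. This is only an illustration, not code from TRUNAJOD: the names `BACKENDS` and `load_pipeline` are hypothetical, and the only library calls assumed are `spacy.load(...)` and `stanza.Pipeline(lang=...)`. Imports are done lazily so that a user of one backend does not need the other installed.

```python
from typing import Any, Callable, Dict


def _load_spacy(model: str) -> Any:
    # Imported lazily so stanza-only users do not need spaCy installed.
    import spacy

    return spacy.load(model)


def _load_stanza(lang: str) -> Any:
    # Imported lazily so spaCy-only users do not need stanza installed.
    import stanza

    return stanza.Pipeline(lang=lang)


# Hypothetical registry mapping a backend name to its loader function.
BACKENDS: Dict[str, Callable[[str], Any]] = {
    "spacy": _load_spacy,
    "stanza": _load_stanza,
}


def load_pipeline(backend: str, name: str) -> Any:
    """Return an NLP pipeline for the requested backend.

    `name` is a spaCy model name (e.g. "es_core_news_sm") for the
    "spacy" backend, or a language code (e.g. "es") for "stanza".
    """
    try:
        loader = BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(BACKENDS)}"
        ) from None
    return loader(name)
```

Usage would be something like `nlp = load_pipeline("spacy", "es_core_news_sm")` or `nlp = load_pipeline("stanza", "es")`, assuming the corresponding models have already been downloaded. Note that the two backends return different document types, so the feature code would still need a thin adapter layer on top of this.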

No worries. I'm also working on a similar project so it'll help me too anyways :)