kermitt2 / entity-fishing

A machine learning tool for fishing entities

Home Page: http://nerd.readthedocs.io/

Support for Swedish language

EmilStenstrom opened this issue

Hi!

In the list of languages I don't see Swedish. It's a small language, but has a very big Wikipedia with ~2.5M articles. Can entity-fishing be trained on Swedish, or is there some deeper reason that it's not included?

Hi @EmilStenstrom !

Thank you for the request. Swedish should indeed work well given the size of its Wikipedia. I think it's the largest one not yet supported by entity-fishing, along with Dutch. I will try to include it in the next batch of supported languages.

That sounds awesome! Looking forward to testing it! :)

Nice! Happy to see it disambiguate Swedish. Looking at that specific example, the things it mentions are not entities, but they are “concepts”. Translated: “year”, “consumption”, “health”. Is that intentional?

Yes, that's the goal: every Wikidata entity is disambiguated, based on the Wikipedia anchors. Wikidata calls "entities" both the concepts and their instances. We can then refine based on the P279 and P31 statements to select what's wanted for a given task/application. One more example:

Screenshot from 2023-01-21 21-16-59
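
To make the P31/P279 refinement mentioned above a bit more concrete, here is a minimal Python sketch. It is not entity-fishing's actual API: the entity dictionaries, the `rawName`, `wikidataId` and `statements` keys, the helper names, and the `Q_...` values are all hypothetical placeholders chosen for illustration. Only the property IDs P31 ("instance of") and P279 ("subclass of") are real Wikidata properties.

```python
# Hypothetical sketch: refine disambiguated entities using their Wikidata
# statements. The data layout and Q_... identifiers are placeholders, not
# the real entity-fishing response format or real Wikidata item IDs.

P31 = "P31"    # Wikidata property "instance of"
P279 = "P279"  # Wikidata property "subclass of"


def select_instances(entities, wanted_classes):
    """Keep entities that are an instance (P31) of one of `wanted_classes`."""
    return [
        e for e in entities
        if any(v in wanted_classes for v in e.get("statements", {}).get(P31, []))
    ]


def select_concepts(entities):
    """Keep entities that have at least one P279 statement (class-like concepts)."""
    return [e for e in entities if e.get("statements", {}).get(P279)]


# Illustrative input only (placeholder identifiers).
entities = [
    {"rawName": "Stockholm", "wikidataId": "Q_STOCKHOLM",
     "statements": {P31: ["Q_CITY"]}},
    {"rawName": "year", "wikidataId": "Q_YEAR",
     "statements": {P279: ["Q_TIME_UNIT"]}},
    {"rawName": "health", "wikidataId": "Q_HEALTH",
     "statements": {P279: ["Q_STATE"]}},
]

cities = select_instances(entities, {"Q_CITY"})
concepts = select_concepts(entities)
print([e["rawName"] for e in cities])
print([e["rawName"] for e in concepts])
```

With a filter like this, an application that only wants named entities could keep the P31 matches, while one interested in general concepts such as "year" or "health" could keep the P279 ones.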

Awesome! Using wikidata statements to select what you want is super powerful. Eager to try this out when 0.0.6 is released! :)