cltl / ba-text-mining

Hands-on material for the course text-mining BA, taught at VU Amsterdam

lab4: Spotlight API breaking changes

samuelstroschein opened this issue

Hi, 

The Spotlight API endpoint "http://model.dbpedia-spotlight.org/en/disambiguate" used in lab 4b.2 no longer exists, and the new API documentation does not list a disambiguate endpoint. I tried the other endpoints, but the spotlight_disambiguate function does not work with any of them: every entity's spotlight_link value is "NIL" (or None when running the original notebook). The problem lies in the following if statement and the function calls underlying it:

for entity in article.entity_mentions:
    start = entity.begin_index
    # start is never in dis_entities
    if str(start) in dis_entities:
        dis_url = dis_entities[str(start)]
    else:
        dis_url = 'NIL'
    entity.spotlight_link = dis_url

Using the "annotate" endpoint, the debugger revealed the following state:

I gave up after debugging the code for an hour. Will the lab notebook be adjusted to the new API?

commented

I found an alternative URL and fixed the code.

https://www.dbpedia-spotlight.org/api

spotlight_disambiguation_url="https://demo.dbpedia-spotlight.org/en/annotate"
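
For anyone comparing this against the lookup `str(start) in dis_entities` from the original post: that check can only succeed if the keys of dis_entities follow the same offset convention as entity.begin_index. Below is a minimal sketch of building such a mapping from an annotate response. It assumes the endpoint is asked for JSON (Accept: application/json) and returns a "Resources" list whose items carry "@offset" and "@URI" fields. The helper names fetch_annotations and build_offset_index are hypothetical and not part of the lab notebook.

```python
import json
import urllib.parse
import urllib.request

# URL taken from the comment above.
SPOTLIGHT_URL = "https://demo.dbpedia-spotlight.org/en/annotate"


def fetch_annotations(text, url=SPOTLIGHT_URL, confidence=0.5):
    """Hypothetical helper: request JSON annotations for `text`."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    request = urllib.request.Request(
        f"{url}?{query}", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def build_offset_index(response_json):
    """Map character offset (as a string) to DBpedia URI.

    Assumes each item in "Resources" has "@offset" and "@URI" keys.
    A response without "Resources" yields an empty mapping.
    """
    index = {}
    for resource in response_json.get("Resources", []):
        index[str(resource["@offset"])] = resource["@URI"]
    return index
```

If begin_index is not a character offset into exactly the text that was sent to the endpoint, none of the keys would match and every mention would fall through to 'NIL'.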

@piekvossen are you sure that the problem is fixed? With the endpoint you gave, every entity mention's spotlight_link is still NIL in my case. That is a problem because the evaluate_entity_linking function throws a division-by-zero error when it is passed a list of NIL/None/null values to calculate precision, recall and F1.
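
A division by zero would be expected if every predicted link is NIL and NIL is not counted as a prediction, since the precision denominator is then zero. A minimal sketch of a guarded computation follows, assuming the counts of true positives, false positives and false negatives are already available. safe_prf is a hypothetical helper, not the notebook's evaluate_entity_linking.

```python
def safe_prf(tp, fp, fn):
    """Return (precision, recall, f1), using 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Reporting 0.0 instead of raising is one possible convention; it keeps the evaluation running while still making the failed linking visible in the scores.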

I reran the updated lab4b.2. There, too, most spotlight_links are NIL; only a minority of the values are not null. I used the following code to inspect the entity values:

for article in processed_both:
    for entity in article.entity_mentions:
        print(entity.spotlight_link)

Can you confirm whether my approach is correct, or whether the API problem is still not fixed?
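
To put a number on the "only a minority" observation, the inspection loop above can be turned into a tally. This is a minimal sketch, assuming each article exposes entity_mentions and each mention has a spotlight_link attribute as in the loop. count_links is a hypothetical helper, and treating None and "NIL" alike as unlinked is an assumption.

```python
from collections import Counter


def count_links(articles):
    """Count linked versus unlinked entity mentions across articles."""
    counts = Counter()
    for article in articles:
        for entity in article.entity_mentions:
            link = getattr(entity, "spotlight_link", None)
            # Assumption: both None and the string "NIL" mean "no link found".
            unlinked = link is None or link == "NIL"
            counts["unlinked" if unlinked else "linked"] += 1
    return counts
```

Calling `print(count_links(processed_both))` would then show how many mentions received a link and how many did not.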