wetneb / pynif

A small Python library for NLP Interchange Format (NIF) for NER(D) systems

nif:OffsetBasedString option is missing

Ga11u opened this issue

In the current version of pynif, every URI for contexts and phrases is typed as a nif:OffsetBasedString. It would be great to also support nif:ContextHashBasedString.

I think this can be done while preserving backward compatibility by adding an optional hash_uri argument:

  • NIFContext(..., hash_uri=None)
  • NIFPhrase(..., hash_uri=None)

Their respective def triples(self) methods could then contain something like:

        if self.hash_uri:
            yield (self.uri, RDF.type, NIF.ContextHashBasedString)
        else:
            yield (self.uri, RDF.type, NIF.OffsetBasedString)
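A minimal, self-contained sketch of the type selection proposed above. It assumes the NIF 2.0 core namespace and writes the URIs as plain strings instead of the rdflib-style RDF and NIF terms used in the snippet. The function name uri_scheme_triple is hypothetical and not part of pynif:

```python
# Plain-string URIs so the sketch runs without rdflib (assumption: NIF 2.0 core namespace).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def uri_scheme_triple(uri, hash_uri=None):
    """Return the rdf:type triple for a context or phrase URI (hypothetical helper).

    If the caller supplied a ContextHashBasedString URI, that URI becomes the
    subject and is typed accordingly; otherwise the offset-based URI is typed
    as nif:OffsetBasedString, which is the current behaviour.
    """
    if hash_uri:
        return (hash_uri, RDF_TYPE, NIF + "ContextHashBasedString")
    return (uri, RDF_TYPE, NIF + "OffsetBasedString")
```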

Both the NIFContext and NIFPhrase def __init__ methods could be changed to something like this:

    def __init__(self, ..., hash_uri=None):
        # Keep the hash URI so that triples() can check it
        self.hash_uri = hash_uri
        self.original_uri = hash_uri if hash_uri else uri
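A sketch of how the optional hash_uri argument keeps existing call sites working. ContextSketch is a hypothetical stand-in for NIFContext or NIFPhrase, not pynif's actual class, and the NIF 2.0 core namespace is assumed and written as a plain string so the example has no dependencies:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


class ContextSketch:
    """Hypothetical stand-in for NIFContext/NIFPhrase; not pynif's real class."""

    def __init__(self, uri, hash_uri=None):
        # hash_uri defaults to None, so existing ContextSketch(uri) calls still work.
        self.hash_uri = hash_uri
        # Prefer the user-supplied ContextHashBasedString URI when one is given.
        self.original_uri = hash_uri if hash_uri else uri

    @property
    def uri(self):
        return self.original_uri

    def triples(self):
        # Emit the rdf:type triple matching the URI scheme in use.
        if self.hash_uri:
            yield (self.uri, RDF_TYPE, NIF + "ContextHashBasedString")
        else:
            yield (self.uri, RDF_TYPE, NIF + "OffsetBasedString")
```

Calling ContextSketch(uri) exactly as before still yields an OffsetBasedString triple; only callers that pass hash_uri get the new type, and the URI they pass is used as the subject unchanged.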

@Ga11u I am not familiar with ContextHashBasedString, but what you are proposing sounds very sensible. Would you be open to making a PR for this? It looks like you have thought the code changes through already :)

@wetneb The ContextHashBasedString is discussed in the paper Linked-Data Aware URI Schemes for Referencing Text Fragments (https://link.springer.com/content/pdf/10.1007%2F978-3-642-33876-2_17.pdf), on page 4.

The ContextHashBasedString has a particular structure defined in the NIF documentation, which pynif does not yet provide. Creating ContextHashBasedString URIs that follow the NIF documentation is not as easy as creating OffsetBasedStrings, and it could add considerable complexity to pynif (if we want that feature to be automatic). I think the ContextHashBasedString URI should be provided as input by the user, and pynif can then assign the correct rdf:type annotation (as I suggested in the issue description).

If this is something you would like to have, then yes, I can do the PR.

Yes a PR would be fantastic! :)

@wetneb PR done 😃