OntoGene / OGER


Analyse a tsv where column A is 'id' and column B is 'text'

hrshdhgd opened this issue

Hello,
I have been using OGER for a while now. For my previous use case, passing a 'collection' of documents in a folder, with the 'id' as the filename and the text in a txt file, worked perfectly! I was wondering whether there is a feature that lets me pass a tsv in which column 'A' is an 'id' and column 'B' is the text to be analyzed. I went through the documentation and couldn't find anything similar. Please let me know.

Hi @hrshdhgd
No, there's no such input format in OGER.
I see two work-arounds:

  • Convert the data (e.g. with a small Python script) to some kind of PubTator format by replacing the tab separator with "|a|" and inserting a blank line between documents. It's a bit of a hack, though.
  • Fork OGER and add another Loader, maybe best starting from the PubTatorLoader or TXTLoader, which are probably closest to what you need.

If the latter works, I'm happy to accept a pull request.
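
For the first work-around, a minimal sketch of the conversion script might look like the following. It assumes one document per line, with the id in the first column and the text in the second, and no tabs or line breaks inside the text. The function name `tsv_to_pubtator` and the `skip_header` parameter are made up for illustration and are not part of OGER. Whether OGER's PubTator loader accepts documents that have only an "|a|" line and no title line has not been checked here.

```python
import io


def tsv_to_pubtator(tsv_lines, out, skip_header=False):
    """Write each 'id<TAB>text' line as 'id|a|text' followed by a blank line.

    Hypothetical helper, not part of OGER. `tsv_lines` is any iterable of
    lines (e.g. an open file), `out` is a writable text stream.
    Returns the number of documents written.
    """
    count = 0
    for lineno, line in enumerate(tsv_lines):
        if skip_header and lineno == 0:
            continue  # drop a header row such as "id<TAB>text"
        # Split on the first tab only, so the text column is kept verbatim.
        doc_id, sep, text = line.rstrip("\r\n").partition("\t")
        doc_id = doc_id.strip()
        if not sep or not doc_id:
            continue  # skip blank or malformed lines
        # Replace the tab with "|a|" and separate documents with a blank line.
        out.write(f"{doc_id}|a|{text}\n\n")
        count += 1
    return count
```

To convert a file, pass two open file handles, for example `tsv_to_pubtator(open("docs.tsv", encoding="utf-8"), open("docs.pubtator", "w", encoding="utf-8"))`; both file names are placeholders. The text column is left unchanged so that character offsets in the annotations still refer to the original text.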

Thanks @lfurrer! I will let you know when I come up with a solution.