sbmlteam / libCombine

a C++ library for working with the COMBINE Archive format

LibCombine doesn't seem to be able to "see" files with a `*.rdf` extension.

CiaranWelsh opened this issue

I am trying to use libcombine as the interface between libsemsim and omex archives. However, at present it doesn't look like libCombine can "see" files with a *.rdf extension. For instance, when looking through the files inside Smith2004.omex, which has the following tree:

(base) ciaran@DESKTOP-K0APGUV:/mnt/d/libsemsim/tests/Smith_2004$ tree
.
├── manifest.xml
└── model
    ├── smith_2004.cellml
    └── smith_2004.rdf

with the following code:

#include <iostream>
#include <string>

#include "combine/combinearchive.h"

int main(){
    std::string filename = "/mnt/d/libsemsim/tests/Smith_2004.omex";

    CombineArchive combineArchive;
    combineArchive.initializeFromArchive(filename);

    int num_entries = combineArchive.getNumEntries();

    std::cout << "number of entries: " << num_entries << std::endl;
    auto locs = combineArchive.getAllLocations();
    for (auto &it : locs) {
        std::cout << it << std::endl;
    }
    return 0;
}

The output I get is:

number of entries: 2
./manifest.xml
./model/smith_2004.cellml

Is there something I'm doing wrong or is this a problem with libCombine?

Hello Ciaran,

This is working as intended, though perhaps unexpectedly. Here is what happens: when parsing the manifest, the library notes that

location="./model/smith_2004.rdf" format="http://identifiers.org/combine.specifications/omex-metadata"

is one of the prescribed omex-metadata objects, as outlined in the spec, so instead of listing it as a file, it stores the information in a metadata object.

Would you like this behavior to change?

thanks
Frank

See also the example file, which shows how I thought interactions with the standardized metadata element would take place:

https://github.com/sbmlteam/libCombine/blob/master/examples/python/printExample.py

Hi Frank,

Ah okay, I understand. I won't ask for any changes just yet as I'm not familiar enough with libCombine to be able to make informed requests. I suspect I'll be able to work with what's already there. For the moment my goal is to use libcombine to check for the presence of an RDF file. If it's there I'll read the content into a triple store, and if it's not I'll check the model(s) for annotations. If they exist in the model, I'll pull them out and create the RDF file.

So, I've been able to retrieve the model level annotations using:

    std::string filename = "/mnt/d/libsemsim/tests/Smith_2004.omex";

    CombineArchive combineArchive;
    combineArchive.initializeFromArchive(filename);

    int num_entries = combineArchive.getNumEntries();

    std::cout << "number of entries: " << num_entries << std::endl;
    auto locs = combineArchive.getAllLocations();
    for (auto &it : locs) {
        std::cout << it << std::endl;
    }

    OmexDescription description = combineArchive.getMetadataForLocation(locs[1]);
    std::cout << description.toXML() << std::endl;

which outputs:

<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:dcterms='http://purl.org/dc/terms/' xmlns:vCard='http://www.w3.org/2006/vcard/ns#'>
  <rdf:Description rdf:about=''>
    <dcterms:description></dcterms:description>
    <dcterms:modified rdf:parseType='Resource'>
      <dcterms:W3CDTF>2020-04-28T09:19:47Z</dcterms:W3CDTF>
    </dcterms:modified>
    <dcterms:created rdf:parseType='Resource'>
      <dcterms:W3CDTF>2000-01-01T00:00:00Z</dcterms:W3CDTF>
    </dcterms:created>
  </rdf:Description>
</rdf:RDF>

(which actually just looks like the default information), but how do I access the semantic annotations? For instance, the first one in the rdf file is:

  <rdf:Description rdf:about="./smith_2004.cellml#source_15">
    <semsim:hasMultiplier>1.0</semsim:hasMultiplier>
    <semsim:hasPhysicalEntityReference rdf:resource="./smith_2004.cellml#entity_19"/>
  </rdf:Description>

Thanks for the quick response,

Ciaran

Hello Ciaran,

As mentioned before, this will only happen for files with the format

'http://identifiers.org/combine.specifications/omex-metadata'

which is not the semsim format, but the one that was defined before. Doh... I just saw that someone modified the information on that page. It used to describe this information (why did they have to use the same format identifier for a different format?):

http://co.mbine.org/specifications/omex.version-1.pdf

I'll make a new release of the library, dropping support for the metadata reading. But that won't happen until next week.

In the meantime, you can either give it a different format specifier, or call extractEntryToString with the path that you already know.

Hi Frank,

I understand, but the rdf file in question, ./model/smith_2004.rdf, which has the format attribute set to http://identifiers.org/combine.specifications/omex-metadata, contains all of the semantic annotations. From what I can see, the OmexDescription object returned from getMetadataForLocation("./model/smith_2004.cellml") drops all of this information (if it was meant to contain this information). Moreover, the string returned from extractEntryToString doesn't contain any annotations either, but this is probably because the new Omex metadata spec prescribes that all annotations be stored in the separate rdf file, not in the model document itself.

Anyhow, I think that giving rdf files a different format specifier could work for now. I think this means I'll have to open the *.omex files with zlib, change any files with the *.rdf extension in the manifest to a different format (perhaps http://identifiers.org/combine.specifications/semsim-metadata), close the archive, and then open it again with libcombine. It's not ideal but as a temporary measure it'll work... and in the meantime I'll read the original omex spec!
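
The manifest edit described above could be sketched roughly as follows. This is only an illustration: retagRdfEntries is a hypothetical helper, not part of libCombine, it works on the manifest text after it has been pulled out of the zip by some other means, and it assumes that location comes before format inside each element, as in the manifest line quoted earlier in this thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of libCombine): returns a copy of the
// manifest text in which every element whose location ends in ".rdf"
// has its format attribute replaced by newFormat.
// Assumes location appears before format within the same element.
std::string retagRdfEntries(const std::string& manifestXml,
                            const std::string& newFormat) {
    // Group 1: from location="....rdf" up to and including format="
    // Group 2: the closing quote of the format value.
    static const std::regex rdfEntry(
        R"rx((location="[^"]*\.rdf"[^>]*?format=")[^"]*("))rx");

    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(manifestXml.begin(), manifestXml.end(), rdfEntry), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // Copy the untouched text before this match, then the retagged entry.
        out.append(manifestXml, last, m.position(0) - last);
        out += m[1].str() + newFormat + m[2].str();
        last = m.position(0) + m.length(0);
    }
    // Copy whatever follows the last match (or everything, if none matched).
    out.append(manifestXml, last, std::string::npos);
    return out;
}
```

Entries that do not end in .rdf, such as the cellml model, are left as they are. A real implementation would probably use an XML parser rather than a regular expression.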

Thanks for the advice, as ever you've been very helpful.

Ciaran

I will create a new version of the library in a couple of days.

@CiaranWelsh, like I said, none of the metadata classes in the library will work for the semsim annotation format. They only work for the one in the combine archive specification.

However, I've verified that you can just get your rdf information from the library by calling

archive.extractEntryToString("./model/smith_2004.rdf")

see: 92f2861
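
A minimal sketch of how that call might be wrapped, assuming (this is not confirmed in the thread) that extractEntryToString returns an empty string when the entry is missing. readRdfEntry is a hypothetical helper, and it is written as a template only so that the sketch compiles without libCombine. In real use the Archive type would be CombineArchive.

```cpp
#include <string>

// Hypothetical helper (not part of libCombine). Archive is any type that
// provides extractEntryToString(const std::string&), for example
// CombineArchive with the call shown above.
template <typename Archive>
bool readRdfEntry(Archive& archive, const std::string& location,
                  std::string& rdfOut) {
    rdfOut = archive.extractEntryToString(location);
    // Assumption: a missing entry yields an empty string, so an empty
    // result is treated as "no RDF file at this location".
    return !rdfOut.empty();
}
```

When this returns false, the caller could fall back to looking for annotations inside the model itself, which is the workflow described earlier in the thread.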

Excellent, thank you!

Hello @CiaranWelsh, if you use the new version (0.2.6) available from PyPI, there is an optional argument on initializeFromArchive that will skip the metadata processing step. So I'll close this issue for now. Feel free to reopen.

Thanks Frank - I'll get on this in a couple of days. Will let you know if I get stuck, Ciaran