jacoscaz / quadstore

A LevelDB-backed graph database for JS runtimes (Node.js, Deno, browsers, ...) supporting SPARQL queries and the RDF/JS interface.

Home Page: https://github.com/jacoscaz/quadstore

store.get() and store.sparql() return different values for similar blank nodes

ehupin opened this issue

Hi,

I am quite new to the RDF world and I am currently testing node-quadstore as a solution to persist triples.
During my tests I found what seems to be weird behavior: the values returned for blank nodes differ depending on whether they are fetched with store.get() or store.sparql().

Here is a script to reproduce this:

import {Quadstore} from "quadstore";
import {newEngine} from "quadstore-comunica";
import leveldown from 'leveldown';
import rdf from 'rdf-ext';

async function test() {
    const db = leveldown('test')
    const store = new Quadstore({
        backend: db,
        comunica: newEngine(),
        dataFactory: rdf
    });

    const intermediaryNode = rdf.blankNode()
    const quads = [
        rdf.quad(
            rdf.namedNode('http://example.org/from'),
            rdf.namedNode('http://example.org/link'),
            intermediaryNode
        ),
        rdf.quad(
            intermediaryNode,
            rdf.namedNode('http://example.org/link'),
            rdf.namedNode('http://example.org/to'),
        )
    ]

    await store.open()
    await store.multiPut(quads)

    const getResult = await store.get({})
    console.log(getResult.items)
    // [
    //     QuadExt {
    //         subject: BlankNodeExt { value: 'b1' },
    //         predicate: NamedNodeExt { value: 'http://example.org/link' },
    //         object: NamedNodeExt { value: 'http://example.org/to' },
    //         graph: DefaultGraphExt { value: '' }
    //     },
    //     QuadExt {
    //         subject: NamedNodeExt { value: 'http://example.org/from' },
    //         predicate: NamedNodeExt { value: 'http://example.org/link' },
    //         object: BlankNodeExt { value: 'b1' },
    //         graph: DefaultGraphExt { value: '' }
    //     }
    // ]


    const sparqlResult = await store.sparql(`SELECT * WHERE { ?s ?p ?o}`);
    console.log(sparqlResult.items)
    // [
    //     {
    //         '?s': BlankNode { termType: 'BlankNode', value: 'b11' },
    //         '?p': NamedNodeExt { value: 'http://example.org/link' },
    //         '?o': NamedNodeExt { value: 'http://example.org/to' }
    //     },
    //     {
    //         '?s': NamedNodeExt { value: 'http://example.org/from' },
    //         '?p': NamedNodeExt { value: 'http://example.org/link' },
    //         '?o': BlankNode { termType: 'BlankNode', value: 'b12' }
    //     }
    // ]


    await store.close()
}

test()

What bugs me here is that the blank node I created is returned as a single one (b1) when I use store.get(), but it is returned as two different ones (b11 and b12) when I use store.sparql().

Am I missing something about how this works, and should I change the way I create, store, or fetch my data to prevent this behavior?

Here are the versions I use:

    "leveldown": "^5.6.0",
    "quadstore": "^8.0.0",
    "quadstore-comunica": "^0.3.1",
    "rdf-ext": "^1.3.1"

Hello @ehupin! Thank you for the code snippet; I can confirm I am able to reproduce this. It's weird: the correct behavior is the one you're getting from store.get(), and that is also the behavior I would have expected from store.sparql().

As a temporary workaround, you can skolemise blank nodes into named nodes, which I would recommend anyway, as blank nodes can be rather confusing.
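For readers unfamiliar with skolemisation, here is a minimal sketch of the workaround, assuming quads are rewritten before being passed to store.multiPut(). The term shapes are simplified stand-ins for the RDF/JS interfaces (with real data you would call your dataFactory's namedNode() instead of building plain objects), and SKOLEM_BASE is a hypothetical base IRI. Each blank node label in a batch is mapped to one freshly minted IRI, so quads that shared a blank node still share the same named node afterwards.

```typescript
import { randomUUID } from 'crypto';

// Simplified stand-ins for the RDF/JS term and quad shapes (assumption: only the fields used here).
type Term = { termType: string; value: string };
type Quad = { subject: Term; predicate: Term; object: Term; graph: Term };

// Hypothetical base IRI. RDF 1.1 Concepts suggests a "/.well-known/genid/" path for skolem IRIs.
const SKOLEM_BASE = 'https://example.org/.well-known/genid/';

// Replaces every blank node in a batch of quads with a freshly minted named node.
// All occurrences of the same blank node label within the batch map to the same IRI,
// so the link between quads that share a blank node is preserved.
function skolemize(quads: Quad[], base: string = SKOLEM_BASE): Quad[] {
  const minted = new Map<string, Term>();
  const mint = (term: Term): Term => {
    if (term.termType !== 'BlankNode') return term;
    let iri = minted.get(term.value);
    if (iri === undefined) {
      iri = { termType: 'NamedNode', value: base + randomUUID() };
      minted.set(term.value, iri);
    }
    return iri;
  };
  return quads.map((q) => ({
    subject: mint(q.subject),
    predicate: q.predicate, // predicates cannot be blank nodes in RDF
    object: mint(q.object),
    graph: mint(q.graph),
  }));
}
```

Applied to the two quads from the script above, the intermediary blank node becomes a single named node, which store.get() and store.sparql() should both return unchanged, since only blank node labels are subject to renaming.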

@rubensworks is this expected behavior in Comunica? Quadstore returns blank nodes with the same labels they had when inserted, which leads me to think that something in Comunica's handling of blank nodes might be causing this.

The client shouldn't expect stable blank nodes though. That's what URIs are for :)

Just to clarify and set expectations: although blank node labels are not guaranteed to be stable, quads and the relationships between quads are. The issue here is not that the labels change between store.get() and store.sparql() but, rather, that the way they change when using the latter breaks the relationship between those quads.
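The distinction between "labels changed" and "relationships broke" can be made concrete with a small check. This is a hypothetical helper, not part of quadstore or Comunica, and it assumes both inputs list the same triples or bindings in the same order, row by row (which happens to hold for the two outputs printed in the original report). A renaming is acceptable only if each original label maps to exactly one new label and vice versa.

```typescript
// Simplified stand-in for an RDF/JS term (assumption: literals' datatype/language are ignored).
type Term = { termType: string; value: string };

// Returns true if blank node labels in `after` are a consistent renaming of those in `before`.
// Assumption: both arrays describe the same rows (e.g. [s, p, o]) in the same order.
function isConsistentRelabeling(before: Term[][], after: Term[][]): boolean {
  if (before.length !== after.length) return false;
  const forward = new Map<string, string>();  // original label -> new label
  const backward = new Map<string, string>(); // new label -> original label
  for (let i = 0; i < before.length; i++) {
    const rowA = before[i];
    const rowB = after[i];
    if (rowA.length !== rowB.length) return false;
    for (let j = 0; j < rowA.length; j++) {
      const a = rowA[j];
      const b = rowB[j];
      if (a.termType !== b.termType) return false;
      if (a.termType !== 'BlankNode') {
        if (a.value !== b.value) return false; // non-blank terms must be identical
        continue;
      }
      const seenNew = forward.get(a.value);
      const seenOld = backward.get(b.value);
      if (seenNew !== undefined && seenNew !== b.value) return false; // one node split in two
      if (seenOld !== undefined && seenOld !== a.value) return false; // two nodes merged into one
      forward.set(a.value, b.value);
      backward.set(b.value, a.value);
    }
  }
  return true;
}

// Rows taken from the outputs in the original report.
const bn = (value: string): Term => ({ termType: 'BlankNode', value });
const nn = (value: string): Term => ({ termType: 'NamedNode', value });
const link = nn('http://example.org/link');
const viaGet = [[bn('b1'), link, nn('http://example.org/to')], [nn('http://example.org/from'), link, bn('b1')]];
const viaSparql = [[bn('b11'), link, nn('http://example.org/to')], [nn('http://example.org/from'), link, bn('b12')]];
const expected = [[bn('b11'), link, nn('http://example.org/to')], [nn('http://example.org/from'), link, bn('b11')]];

console.log(isConsistentRelabeling(viaGet, viaSparql)); // false: b1 was split into b11 and b12
console.log(isConsistentRelabeling(viaGet, expected));  // true: renamed, but the link is intact
```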

Thanks all for your answers!
First, they gave me a better understanding of the issue, but more importantly they helped me better grasp how to use blank nodes and what their limitations are. Skolemization is an interesting subject that I will definitely explore!

is this expected behavior in Comunica?

Yep, that's expected and intentional behaviour. We have to do this to ensure non-clashing bnodes when querying over multiple sources.

It's quite normal for RDF tools to modify bnode labels like this, as you can indeed never attach meaning to them when using them across different documents/contexts.

Scratch my latest reply. I didn't read the issue well enough.

Links between blank nodes are only defined within the context of a single document or query execution. However, no meaning should be attached to their concrete labels, as these can change at any time.

In that sense, the output of store.get is correct, but that of store.sparql is wrong. (Either the bnode in the first triple should have label b12, or the one in the second triple should have b11.)
Something probably is going wrong at the connection point between Quadstore and Comunica.
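To illustrate what "correct" renaming looks like under these rules, here is a minimal sketch of relabeling scoped to one query execution. BlankNodeRelabeler is a hypothetical class, not Comunica's actual implementation, and the source IDs are made up. Keying the mapping on the (source, label) pair avoids clashes between sources, which is the goal described earlier in the thread, while every occurrence of the same blank node within one source still gets the same new label.

```typescript
// Simplified stand-in for an RDF/JS term.
type Term = { termType: string; value: string };

// Hypothetical helper, NOT Comunica's implementation. Create one instance per query execution.
class BlankNodeRelabeler {
  private readonly mapping = new Map<string, string>(); // "(source, label)" -> new label
  private counter = 0;

  relabel(sourceId: string, term: Term): Term {
    if (term.termType !== 'BlankNode') return term; // only blank nodes are renamed
    const key = JSON.stringify([sourceId, term.value]); // unambiguous composite key
    let label = this.mapping.get(key);
    if (label === undefined) {
      label = `r${this.counter++}`;
      this.mapping.set(key, label);
    }
    return { termType: 'BlankNode', value: label };
  }
}

const relabeler = new BlankNodeRelabeler();
const b1: Term = { termType: 'BlankNode', value: 'b1' };
console.log(relabeler.relabel('quadstore', b1).value);    // 'r0'
console.log(relabeler.relabel('quadstore', b1).value);    // 'r0' again: the link between quads survives
console.log(relabeler.relabel('other-source', b1).value); // 'r1': same label, different source, no clash
```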

I should be able to look into this within a week from today. Apologies for the latency, I'm having a couple of very intense weeks.

@rubensworks I've managed to reproduce the problem with N3.Store.

I've modified the script provided by @ehupin so that it can easily switch between quadstore and N3.Store. It still uses a bunch of utils and types from quadstore, as I haven't had the time to make it fully agnostic, but the main parts are now implementation-independent:

  • I'm using RDF/JS interfaces to import quads (.import()) and read quads (.match());
  • I'm using the Comunica engine instance directly, passing the instantiated store to it.

Is there something in how I am packaging Comunica that might trigger this?

import {Quadstore} from "./lib/quadstore";
import {newEngine} from "quadstore-comunica";
import {BindingArrayResult, QuadArrayResult} from './lib/types';
import leveldown from 'leveldown';
import {DataFactory} from 'rdf-data-factory';
import { Store as N3Store } from 'n3';
// const rdf = require('rdf-ext')
import { ArrayIterator, wrap } from 'asynciterator';
import { streamToArray } from './lib/utils';
import {Algebra, translate} from 'sparqlalgebrajs';

async function test() {

  const rdf = new DataFactory();
  const engine = newEngine();

  const store = new N3Store();

  // const db = leveldown('test');
  // const store = new Quadstore({
  //   backend: db,
  //   comunica: newEngine(),
  //   dataFactory: rdf
  // });
  // await store.open();

  const intermediaryNode = rdf.blankNode();
  const quads = [
    rdf.quad(
      rdf.namedNode('http://example.org/from'),
      rdf.namedNode('http://example.org/link'),
      intermediaryNode
    ),
    rdf.quad(
      intermediaryNode,
      rdf.namedNode('http://example.org/link'),
      rdf.namedNode('http://example.org/to'),
    )
  ]

  await new Promise((resolve, reject) => {
    store.import(new ArrayIterator(quads))
      .on('end', resolve)
      .on('error', reject) // streams emit 'error', not 'err'
    ;
  });

  // @ts-ignore
  const storeQuads: Quad[] = await streamToArray(store.match());
  console.log(storeQuads);

  const sparqlQuery = 'SELECT * WHERE { ?s ?p ?o}';
  const sparqlOperation = translate(sparqlQuery, { quads: true, dataFactory: rdf });
  const sparqlResult = await engine.query(sparqlOperation, { source: store });
  // @ts-ignore
  const sparqlBindings = (await sparqlResult.bindings()).map(b => b.toObject());
  console.log(sparqlBindings);

}

test().catch((err) => {
  console.error(err);
  process.exit(1);
});

Hmm, that's not good.
Not sure what could cause this.
My first guess would be somewhere in here: https://github.com/comunica/comunica/blob/master/packages/actor-query-operation-quadpattern/lib/ActorQueryOperationQuadpattern.ts

Now that I think of it, this sounds similar to comunica/comunica#773, which I initially thought to be a parsing issue, but may very well have the same cause as here.

@rubensworks I'll see whether I can replicate this in a new test within Comunica's test suite and open an issue over there if so.

Opened issue upstream: comunica/comunica#795

For posterity: this has required some work in both Comunica and sparqlee, the latter being Comunica’s SPARQL expression evaluator. I think we’re relatively close to fixing this and the fix will surely be included in the next version of quadstore. Relevant issues and PRs:

Released in quadstore@9.0.0!