jacoscaz / quadstore

A LevelDB-backed graph database for JS runtimes (Node.js, Deno, browsers, ...) supporting SPARQL queries and the RDF/JS interface.

Home Page:https://github.com/jacoscaz/quadstore

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

base64Binary literals may not deserialise correctly with classic-level

gsvarovsky opened this issue · comments

e.g.:

dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary'))

does not deserialise correctly when retrieved in classic-level.

Test case (patch to existing quadstore.prototype.get.js test):

Index: test/quadstore.prototype.get.js
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/test/quadstore.prototype.get.js b/test/quadstore.prototype.get.js
--- a/test/quadstore.prototype.get.js	(revision 78fd616f7b55d70e8c89733fd7e6c223d5b1cc66)
+++ b/test/quadstore.prototype.get.js	(date 1663081452226)
@@ -54,6 +54,12 @@
           dataFactory.literal('Hello, World', 'en-us'),
           dataFactory.namedNode('ex://c4'),
         ),
+        dataFactory.quad(
+          dataFactory.namedNode('ex://s5'),
+          dataFactory.namedNode('ex://p5'),
+          dataFactory.literal('MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB', dataFactory.namedNode('http://www.w3.org/2001/XMLSchema#base64Binary')),
+          dataFactory.namedNode('ex://c5'),
+        ),
       ];
       await this.store.multiPut(this.quads);
     });
@@ -287,6 +293,14 @@
       should(quads).have.length(0);
     });
 
+    it('should retrieve base64 object', async function () {
+      const { dataFactory, store } = this;
+      const { items: quads } = await store.get({
+        subject: dataFactory.namedNode('ex://s5'),
+      });
+      should(quads).be.equalToQuadArray([this.quads[7]]);
+    });
+
   });
 
   describe('Quadstore.prototype.get() w/ order', () => {

I have a suite of "compliance" tests, like system tests, in which one sub-suite uses asymmetric cryptography to sign and verify messages between clones. That involves storing public keys in the data. All those tests fail because the public keys are corrupted when retrieved. That's what led me to write the quadstore test included in the ticket. No other tests fail – they whizz through!

In more raw terms, I write this:

Literal {
  termType: 'Literal',
  value: 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQsfCd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary'
  }
}

but get back this:

Literal {
  termType: 'Literal',
  value: 'd7eQYm+aHKj/A8wqLK5BTtX6FRFDXaH+YdEwI8KS7eOFuwZjQ+LPphK8e4ZpfmVsZQIDAQAB\x00\x00',
  language: '',
  datatype: NamedNode {
    termType: 'NamedNode',
    value: 'http://www.w3.org/2001/XMLSchema#base64Binary\x00\x00MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDFi56F11nqql9rT2OBNrYaEfBXEALnXVPjxjcwQYQjixt4Cakq37+vEzBdL8s3nMEFsPu6L59cqq6Obt3Oh1qnVZL+hY/mC9lZ33rXQs'
  }
}

I'd be more than happy to look into this myself, if you like!

Hello! Apologies for the latency, got married this weekend :). I'm still looking into this regression, for which we need specific tests. Before my break I did manage to discover that this seems to be related to any term that serializes to a string longer than a specific amount of chars. The type of the term itself appears to make no difference.

Here's two value arrays: the first one related to a breaking test case in which I'm serializing a literal having length of 128 chars, the second one related to a passing test case in which I'm serializing the same literal minus one character.

Uint16Array(17) [0, 15, 0, 15, 3, 49135, 189,  0,  15, 0, 0, 0, 0, 0, 0, 0, 0]
Uint16Array(16) [0, 15, 0, 15, 3,   127,   0, 15,   0, 0, 0, 0, 0, 0, 0, 0]

The array even changes in length (!!!). Weird.

This doesn't happen with the memory-level backend. Also, tests reproducing the serialization steps for a single term seem to pass with flying colors:

it('Should serialize and deserialize a single generic literal term having length of 2048', async function () {
      const { store: { dataFactory: factory }, prefixes } = this;
      const term = factory.literal(''.padStart(2048, 'abab'));
      const value = new Uint16Array(4);
      const serialized = termWriter.write(value, 0, term, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      // I/O to/from backend here
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = termReader.read(serialized, 0, value_16, 0, factory, prefixes);
      should(deserialized.equals(term)).be.true();
    });

This also doesn't happen when reproducing the steps for (de)serializing a single quad:

it('Should serialize and deserialize a quad having terms longer than 127 chars', async function () {
      const { store: { dataFactory: factory, separator }, prefixes } = this;
      const quad = factory.quad(
        factory.namedNode('http://ex.com/s'),
        factory.namedNode('http://ex.com/p'),
        factory.literal(''.padStart(128, 'abab')),
        factory.namedNode('http://ex.com/g'),
      );
      const value = new Uint16Array(16);
      const prefix = `SPOG${separator}`;
      const termNames = ['subject', 'predicate', 'object', 'graph'];
      const serialized = quadWriter.write(prefix, value, 0, quad, termNames, prefixes);
      const value_8 = viewUint16ArrayAsUint8Array(value, 0, value.length);
      const value_16 = viewUint8ArrayAsUint16Array(value_8);
      const deserialized = quadReader.read(serialized, prefix.length, value_16, 0, termNames, factory, prefixes);
      console.log(deserialized);
      should(deserialized.equals(quad)).be.true();
    });

@gsvarovsky could you please test again with version 11.0.3-beta.1 which I've just published to NPM?

m-ld-js Unit tests all pass
m-ld-js Compliance tests all pass (and in record time)

awesome work. 🫡

All right! Will add some more tests and publish a patch version. Thank you for flagging this, it was a major regression.

Version 11.0.3 is now on NPM!