kriszyp / msgpackr

Ultra-fast MessagePack implementation with extension for record and structural cloning / msgpack.org[JavaScript/NodeJS]

Add support for more than 32 shared structures

bobsingor opened this issue

I created a file that walks through all the possible messages in a system that I'm building.
I expected the code below to create the complete set of structures, but structures are only generated for the first 32 messages. The structures for messages from tp: 38 onward are not generated.

Here are the exampleMessages from the file: https://www.npoint.io/docs/94e158745cdd8c21f296

import { exampleMessages } from './message/examples';
import * as fs from 'fs';
import { Packr } from 'msgpackr';

const packr = new Packr(
  {
    getStructures() {
      return JSON.parse(fs.readFileSync('./message/structure.json', 'utf8'));
    },
    saveStructures(structures) {
      console.log('create structures', structures);
      fs.writeFileSync('./message/structure.json', JSON.stringify(structures));
    }
  }
);

const main = async  () => {
  for (const exampleMessage of exampleMessages) {
    packr.encode(exampleMessage)
  }

  await new Promise((resolve, _reject) => {
    setTimeout(() => {
      resolve('finish timeout')
    }, 10000)
  })
}

// Log only after main() has finished, not immediately after starting it.
main().then(() => {
  console.log('We created the structure for your messages');
});

I do see that maxSharedStructures is limited to 32 by design.
What is the reason for limiting it to 32? Can we make the limit optional?

Yes, it is by design. The idea is that data may often have commonly used structures and may also have rarer structures or objects composed of more dynamic keys. With a limit on shared structures, we can hopefully capture most of the commonly used structures while still being resilient to overloading the shared structures with rare/ad-hoc structures, which may not be reused (or may be reused infrequently).

This design is also motivated by how structures are encoded. When record structures are enabled and defined, msgpackr defines records with byte encodings that replace part of the expansive (I think excessively so) range of single-byte encodings used for positive integers (0-127). There are 128 such byte encodings available, and I didn't want to replace the most commonly used positive integers, 0-63, so 64-127 are used for records. Of these 64, half (32) are allocated to shared use and half are reserved for use within individual encodings (once you have used up the shared ones, you still want ids available for use within individual data structures).
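The byte ranges described above can be sketched as a small classifier. This is an illustration only, not code from msgpackr: the function name `classifyByte` and the constants are hypothetical, and the exact split point between shared and private record ids (assumed here to be 96, i.e. the first 32 record bytes are the shared ones) is an assumption.

```javascript
// Illustrative sketch of the byte ranges described in the comment above.
// Not msgpackr's code; names and the 0x60 split point are assumptions.
const RECORD_ID_START = 0x40;      // 64: first byte repurposed as a record id
const PRIVATE_ID_START = 0x60;     // 96: assumed start of per-encoding (private) ids
const POSITIVE_FIXINT_END = 0x7f;  // 127: last single-byte positive integer in MessagePack

function classifyByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 0xff) {
    throw new RangeError('expected a byte value in 0..255');
  }
  if (byte < RECORD_ID_START) return 'small positive integer';
  if (byte < PRIVATE_ID_START) return 'shared record id';
  if (byte <= POSITIVE_FIXINT_END) return 'private record id';
  return 'other MessagePack type';
}

// Count how many byte values fall into a given category.
function countBytes(label) {
  let count = 0;
  for (let byte = 0; byte <= 0xff; byte++) {
    if (classifyByte(byte) === label) count++;
  }
  return count;
}

console.log(countBytes('shared record id'));  // number of one-byte shared ids
console.log(countBytes('private record id')); // number of one-byte private ids
```

Under these assumptions each of the two record categories gets 32 one-byte ids, which is where the default limit of 32 shared structures comes from.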

There are certainly more sophisticated techniques that could be used for differentiating between shared and private structures, but more sophisticated tracking would likely involve more code complexity and could reduce performance. This technique also benefits from being very deterministic.

That being said, probably one of the most straightforward and biggest improvements for situations where a lot of data structures would benefit from being shared would be to simply allow two-byte (or longer) encodings of record ids, which would provide space for vastly more record ids and shared record structures.

Of course, your request to have more flexibility in specifying more shared structures and/or to use two-byte encodings for more shared structures is quite reasonable, and something I did actually intend to implement at some point. So, I guess I will get to work on adding that :).

@kriszyp Thanks for the explanation. It would be amazing to be able to use more shared structures. I should have read the README better before implementing 😳. Is this something you are planning on implementing soon (is it worth the wait haha)?

I can probably get it done next week. Out of curiosity, about how many shared structures do you think you have?

That is great! At the moment it will be around 80. It might be more in the future but not more than 200

@kriszyp I see you did this commit. Is this already a working version? Or work in progress?

@kriszyp I see that you merged to master! I will test it. Can you release it to npm and tag it as beta (npm publish --tag beta)?

Ok, it is published and ready for you to try out (the docs should explain usage). Let me know if you find any issues.

@kriszyp I started testing and ran into the following issue:

The code below runs with 32 structures but fails with 33. In the example, if you remove one structure from the array, it runs.

  • With a single Packr instance it works, but with two Packr instances it fails once there are more than 32 structures.
  • If both instances share the same array it somehow works, but with two different arrays containing the same data it fails.

import { Packr } from 'msgpackr';

const structures = [
  [ 'tp', 'pageNo', 'firstIndex', 'timestamp' ],
  [ 'tp', 'timestamp' ],
  [ 'tp', 'url', 'referrer', 'navigationStart' ],
  [ 'tp', 'width', 'height' ],
  [ 'tp', 'x', 'y' ],
  [ 'tp' ],
  [ 'tp', 'id', 'parentID', 'index', 'tag', 'svg' ],
  [ 'tp', 'id', 'parentID', 'index' ],
  [ 'tp', 'id' ],
  [ 'tp', 'id', 'name', 'value' ],
  [ 'tp', 'id', 'name' ],
  [ 'tp', 'id', 'data' ],
  [ 'tp', 'id', 'x', 'y' ],
  [ 'tp', 'id', 'label' ],
  [ 'tp', 'id', 'value', 'mask' ],
  [ 'tp', 'id', 'checked' ],
  [ 'tp', 'id', 'hesitationTime', 'label' ],
  [ 'tp', 'level', 'value' ],
  [ 'tp', 'speedIndex', 'visuallyComplete', 'timeToInteractive' ],
  [ 'tp', 'name', 'message', 'payload' ],
  [ 'tp', 'timestamp', 'source', 'name', 'message', 'payload' ],
  [ 'tp', 'name', 'payload' ],
  [ 'tp', 'key', 'value' ],
  [ 'tp', 'id', 'rule', 'index' ],
  [ 'tp', 'id', 'index' ],
  [ 'tp', 'name', 'duration', 'args', 'result' ],
  [ 'tp', 'type' ],
  [ 'tp', 'action', 'state', 'duration' ],
  [ 'tp', 'mutation', 'state' ],
  [ 'tp', 'type', 'payload' ],
  [ 'tp', 'operationKind', 'operationName', 'variables', 'response' ],
  [ 'tp', 'frames', 'ticks', 'totalJSHeapSize', 'usedJSHeapSize' ],
  [ 'tp', 'id', 'name', 'value', 'baseURL' ]
];

console.log(structures.length);

const structures2 = [...structures];

const packr = new Packr(
  {
    getStructures() {
      return structures
    },
    saveStructures(structures) {

    },
    maxSharedStructures: 100
  }
);

const packr2 = new Packr(
  {
    getStructures() {
      return structures2
    },
    saveStructures(structures) {

    },
    maxSharedStructures: 100
  }
);

const message1 = {
  tp: 60,
  id: 6,
  name: "href",
  value: "https://fonts.googleapis.com/css2?family=Open+Sans&display=swap",
  baseURL: "http://localhost:8080/"
};


const buffer = packr.pack(message1);
const message = packr2.decode(buffer);
console.log(message)
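The expectation behind this reproduction is that two instances holding separate arrays with the same contents, in the same order, should interoperate, because a record only refers to its structure by position. A minimal toy model of that idea follows. It is not msgpackr's wire format or implementation; `encodeRecord`, `decodeRecord`, and `findStructureId` are hypothetical helpers written for this sketch.

```javascript
// Toy model only: a record is encoded as [structureId, ...values].
// This is NOT msgpackr's format; all helper names are hypothetical.

// Find the index of the structure whose keys match, in order.
function findStructureId(structures, keys) {
  return structures.findIndex(
    (s) => s.length === keys.length && s.every((k, i) => k === keys[i])
  );
}

function encodeRecord(structures, obj) {
  const keys = Object.keys(obj);
  const id = findStructureId(structures, keys);
  if (id === -1) {
    throw new Error('no shared structure for keys: ' + keys.join(','));
  }
  return [id, ...keys.map((k) => obj[k])];
}

function decodeRecord(structures, [id, ...values]) {
  const keys = structures[id];
  if (!keys) throw new Error('unknown structure id ' + id);
  return Object.fromEntries(keys.map((k, i) => [k, values[i]]));
}

const writerStructures = [
  ['tp', 'id'],
  ['tp', 'id', 'name', 'value', 'baseURL'],
];
// A separate array with the same contents and order, like structures2 above.
const readerStructures = writerStructures.map((s) => [...s]);

const encoded = encodeRecord(writerStructures, {
  tp: 60,
  id: 6,
  name: 'href',
  value: 'https://fonts.googleapis.com/css2?family=Open+Sans&display=swap',
  baseURL: 'http://localhost:8080/',
});
const decoded = decodeRecord(readerStructures, encoded);
console.log(decoded);
```

In this model only the contents and order of the two tables matter, not whether they are the same array object, which is why the failure with two different arrays points to a bug rather than a usage error.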

Good catch, published a fix in v1.4.0-beta4.

@kriszyp it is working great now! I so appreciate your efforts. Can I donate to you?

Great to hear! I'll put out a v1.4.0 after I have done some more testing with our app (I think it should be helpful in our app as well).

I don't need any donations/support. But as I have mentioned on some of my other projects, I'd always be delighted to inspire donations to evidence-based organizations helping those really in need (https://github.com/DoctorEvidence/lmdb-store#license).

Thanks again for the follow up, appreciate it!