ssbc / ssb-db2

A new database for secure-scuttlebutt

Memory leak with v2.6.0

staltz opened this issue · comments

I tried to use ssb-db2 2.6.0 in Manyverse and experienced a consistent memory leak that ends up using all the memory and crashing the app.

I then pinned the version to 2.3.1 and no memory leak happened.

I'm not sure exactly where the culprit is; it could be related to the bendy-butt additions (in version 2.4.0 and above) and/or transactions, or maybe it's caused by one of the dependencies (async-append-only-log or jitdb or something else). Update: actually, when pinning ssb-db2 to 2.3.1, it also updated async-append-only-log to 3.1.0 (the latest) and jitdb to 3.4.0 (latest), and the mem leak didn't happen, so it has to be something about ssb-db2 specifically.

It can't be the transaction support, as that is a new function that you don't use yet. So it must be the bendy-butt PR. Was this related to that, or was it just normal usage of Manyverse?

I think it was unrelated to that issue, and it was just normal usage of Manyverse. I just opened the app and, without doing anything, it would crash in about 15 seconds.

Could it be something about scanning the log and rebuilding the author index? (Note the prefixOffset hack we did)

Could be yes

I went back to study this issue, and the symptoms are still the same with ssb-db2 2.8.3. The crash clearly mentions something about "out of memory" on both Android and desktop (Electron, Linux).

Here's a report from Electron DevTools:

[Screenshot: Electron DevTools profile, 2021-12-22 13:10]

  • JITDB's mergeFilters is at the very top
  • JITDB's matchAgainstPrefix is also expensive
  • The next anonymous function is ssb-friends' pull.collect, which is expected to be heavy anyway since it loads the whole social graph into memory (see the sketch after this list)
  • The other anonymous function is JITDB's loadPrefixMapFile, which is expected to be heavy too, since it does a JSON.parse on a large chunk of data
  • exports.reverse is in ssb-friends or layered-graph
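
As an aside on why a pull.collect shows up as heavy: it is a sink that buffers every item its source emits into one array before calling back, so its memory use scales with the whole stream. A generic illustration (plain pull-stream, not the actual ssb-friends code):

const pull = require('pull-stream')

pull(
  pull.values(['a', 'b', 'c']), // imagine the entire social graph here
  pull.collect((err, items) => {
    // called once, with ALL items held in memory at the same time
    if (err) throw err
    console.log(items) // ['a', 'b', 'c']
  })
)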

So the big issue is unequivocally JITDB's mergeFilters. On the other hand, the previous JITDB version in Manyverse, 3.5.0, also has mergeFilters and matchAgainstPrefix, but it doesn't have a memleak. Here is the git diff between jitdb 3.5.0 and 4.0.0: https://github.com/ssb-ngi-pointer/jitdb/compare/v.3.5.0..v4.0.0

The only thing that might have changed the logic is ssbc/jitdb@ssb-ngi-pointer:e004c1a...ssb-ngi-pointer:41a1b32#diff-e727e4bdf3657fd1d798edcd6b099d6e092f8573cba266154583a746bba0f346R850, which now uses bipf.compare.

That said, when I filed this issue (Oct 1), the latest version of jitdb was 3.4.0 which did not have bipf.compare in matchAgainstPrefix: https://github.com/ssb-ngi-pointer/jitdb/blob/e0744c187c193999e9c2f861ee8e6e9b547900a3/index.js#L856 and even back then there was a memleak somewhere.

I ran a couple of other experiments:

  • I reverted the JITDB changes to matchAgainstPrefix (see red section here)
    • Still crashed
  • I suspected it has to do with the frequent usage (not the implementation) of matchAgainstPrefix, so I looked at which JITDB queries were running often; from the debug logs I could see they were related to ssb-threads, so I turned off all Private tab queries and Mentions tab queries
    • Still crashed
  • I turned off the Public tab queries; specifically, the publicSummary pull-stream source was no longer pulled
    • No more memleak crashes

Of course, with the Public tab query turned off, the app is useless, AND there are typically other computations that start when the Public tab renders data (such as ssb-conn being kickstarted), but I currently suspect it has something to do with this ssb-db2 query: https://github.com/ssbc/ssb-threads/blob/22170c3fde66cbdd4f7615a3b94ab2871209b0c3/src/index.ts#L188-L193

(UPDATE: I turned on ssb-conn's "start" and it did not cause a memleak crash, which further confirms that it may be about ssb-threads' publicSummary)
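
To make that concrete, here is a rough sketch of the shape of the suspected query, reconstructed from the debug logs below rather than copied verbatim from ssb-threads (assume ssb and threadId are in scope; one such query runs per thread shown in the Public tab):

const pull = require('pull-stream')
const {
  where, and, or, type, hasRoot, hasFork, paginate, toPullStream,
} = require('ssb-db2/operators')

pull(
  ssb.db.query(
    where(
      and(
        // replies and forks belonging to one thread...
        or(hasRoot(threadId), hasFork(threadId)),
        // ...filtered by the allowlisted message types (this part is
        // built by makeFilterOperator, shown further below)
        or(type('post'), type('contact'))
      )
    ),
    paginate(15),
    toPullStream()
  ),
  pull.take(1),
  pull.drain((page) => {
    // render a summary from this thread's first page of messages
  })
)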

It usually logs in the terminal like this:

  jitdb:query all(AND(*,SEQS(1371356))): 6ms, total messages: 1 +15ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%Xrmwkht),value_content_fork(%Xrmwkht)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1352ms, total messages: 0 +1s

  jitdb:query all(AND(*,SEQS(1371219))): 2ms, total messages: 1 +26ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%vrwiHED),value_content_fork(%vrwiHED)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1351ms, total messages: 1 +1s

  jitdb:query all(AND(*,SEQS(1371273))): 1ms, total messages: 1 +301ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%RadDbNO),value_content_fork(%RadDbNO)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 921ms, total messages: 0 +947ms

  jitdb:query all(AND(*,SEQS(1371301))): 2ms, total messages: 1 +51ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%nRyWqKu),value_content_fork(%nRyWqKu)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 951ms, total messages: 0 +1s

  jitdb:query all(AND(*,SEQS(1371265))): 2ms, total messages: 1 +37ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%cgs6CeR),value_content_fork(%cgs6CeR)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 928ms, total messages: 0 +948ms

  jitdb:query all(AND(*,SEQS(1371264))): 1ms, total messages: 1 +108ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%0Ulq+Lh),value_content_fork(%0Ulq+Lh)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 929ms, total messages: 0 +937ms

  jitdb:query all(AND(*,SEQS(1371290))): 2ms, total messages: 1 +214ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%oZTNlUm),value_content_fork(%oZTNlUm)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 2231ms, total messages: 3 +2s

  jitdb:query all(AND(*,SEQS(1371246))): 1ms, total messages: 1 +465ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%3QsVgTQ),value_content_fork(%3QsVgTQ)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 1116ms, total messages: 0 +1s

  jitdb:query all(AND(*,SEQS(1371263))): 2ms, total messages: 1 +2s
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%Lc98Jti),value_content_fork(%Lc98Jti)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 878ms, total messages: 0 +925ms

  jitdb:query all(AND(*,SEQS(1370941))): 3ms, total messages: 1 +42ms
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%lDlTx1g),value_content_fork(%lDlTx1g)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 889ms, total messages: 2 +893ms

  jitdb:query all(AND(*,SEQS(1371057))): 3ms, total messages: 1 +3s
  jitdb:query paginate(AND(*,AND(OR(value_content_root(%XxTlFZX),value_content_fork(%XxTlFZX)),OR(value_content_type( post),value_content_type(8contact))))), seq: 0, limit: 15: 979ms, total messages: 0 +983ms

<--- Last few GCs --->

[467783:0xb880042c000]    63018 ms: Scavenge 1911.5 (1964.8) -> 1909.6 (1964.8) MB, 1.8 / 0.0 ms  (average mu = 0.265, current mu = 0.182) task 
[467783:0xb880042c000]    63038 ms: Scavenge 1911.5 (1964.8) -> 1909.6 (1964.8) MB, 1.4 / 0.0 ms  (average mu = 0.265, current mu = 0.182) allocation failure 
[467783:0xb880042c000]    63093 ms: Scavenge 1911.4 (1964.8) -> 1909.7 (1964.8) MB, 1.7 / 0.0 ms  (average mu = 0.265, current mu = 0.182) task 


<--- JS stacktrace --->

[467783:1222/162520.851176:ERROR:node_bindings.cc(143)] 
Fatal error in V8: Ineffective mark-compacts near heap limit 
Allocation failed - JavaScript heap out of memory

OMG I think I figured it out! GOSH programming is so painful sometimes.

I'm happy I did the thorough analysis above, because it helped narrow down that it may be in publicSummary and that it has something to do with the new code shown in the diff between the ssb-db2 versions.

So I took a careful look at the changes to the operators, like type:

//                   | THE OPTS ARG IS NEW STUFF
//                   |
//                   v
function type(value, opts = { dedicated: true }) {
  if (opts && opts.dedicated) {
    // dedicated index for this exact type value
    return equal(seekType, value, {
      indexType: 'value_content_type',
    })
  } else {
    // shared 32-byte prefix index, answered via matchAgainstPrefix
    return equal(seekType, value, {
      prefix: 32,
      indexType: 'value_content_type',
    })
  }
}
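
For illustration, this is how the two branches get picked when the operator is called directly (my reading of the code above):

// opts defaults to { dedicated: true }: uses the dedicated
// value_content_type index
type('post')

// explicit opt-out: uses the shared 32-byte prefix index, answered by the
// expensive matchAgainstPrefix seen in the profile above
type('post', { dedicated: false })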

And then makeFilterOperator in ssb-threads:

function makeFilterOperator(opts: FilterOpts): any {
  if (opts.allowlist) {
//                                      | THIS NASTY .map
//                                      |
//                                      v
    const allowedTypes = opts.allowlist.map(type);
    return or(...allowedTypes);
  }
  if (opts.blocklist) {
    const blockedTypes = opts.blocklist.map((x) => not(type(x)));
    return and(...blockedTypes);
  }
  return null;
}

The .map calls the type() operator as type(str, i, arr) (because Array.prototype.map passes element, index, and array to its callback), so we end up passing the index i as the operator's opts, which creates a prefix operator, which is not what we want.
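
To make the pitfall concrete, here is a standalone sketch (plain JavaScript, with the operator bodies stubbed out as strings) of how .map smuggles the array index into the optional opts parameter:

// Array.prototype.map invokes its callback with (element, index, array).
function type(value, opts = { dedicated: true }) {
  // The default only kicks in when opts is undefined. Under .map(type),
  // opts is the numeric index instead:
  //   index 0 -> 0 is falsy                 -> prefix branch
  //   index 1 -> (1).dedicated is undefined -> prefix branch
  return opts && opts.dedicated ? 'dedicated:' + value : 'prefix:' + value
}

console.log(['post', 'contact'].map(type))
// [ 'prefix:post', 'prefix:contact' ]   -- every operator is a prefix scan

console.log(['post', 'contact'].map((x) => type(x)))
// [ 'dedicated:post', 'dedicated:contact' ]   -- the intended behavior

This is the classic reason to avoid passing a multi-parameter function point-free to .map: wrapping it as x => type(x) pins the arity to one.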

Turned out I just changed

-   const allowedTypes = opts.allowlist.map(type);
+   const allowedTypes = opts.allowlist.map(x => type(x));

And no more memory leak crash.

SO PAINFUL.

Wow, nice debugging! :) That should also make things smoother overall, I would imagine.

Closing this issue because it's an ssb-threads issue and I solved it there already. Thanks for reading/following, I know it probably took some non-zero minutes of your attention.