ssbc / ssb-db2

A new database for secure-scuttlebutt

How to defrag?

staltz opened this issue

I've been thinking about getting back to implementing partial replication (finishing ssbc/ssb-replication-scheduler#5), and putting it in Manyverse.

TL;DR how can we keep the sizes of log + indexes somewhat constant/stable as M new messages come in and M older messages get overwritten by zero bytes?

Deleting messages from the log and defragmenting will be an important part, because initially the use of partial replication would add N index messages to the log, and N could be significant (it's at least the number of votes/posts/contacts/abouts), which could bring the log size from the current ~1GB to 2GB or more. Defragging is important because if we just overwrite the "deleted" records with zero bytes, this still doesn't change the total size of the log, and doesn't change the offsets in the indexes, and there's a limit of ~4.3GB for the max offset that indexes can point to (due to uint32).
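For reference, that limit is just the largest byte offset a uint32 can represent:

$$2^{32}\ \text{bytes} = 4\,294\,967\,296\ \text{bytes} \approx 4.29\ \text{GB}$$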

I thought about using sparse files for the log, or compressing the log on writes and decompressing on reads, but those would still not fix the problem which is about the uint32 limit for offsets in the indexes, not about the log.

So we would need to defrag the log itself, which means also that all indexes have to be rebuilt to match the new offsets for each record. How often to do this defrag+reindex would be tricky, and the waiting experience would be pretty bad for users.

One idea is to upgrade the offsets from uint32 to uint64, which would increase the "lifespan" of the log by 4 million times. Assuming it takes 1 year to get 1GB worth of messages on the average user's log, then we would have millions of years, so uint64 is more than enough.

Perhaps if offset variables are updated to uint64, and if we use compression so that the log file itself can shrink the space taken by those zero bytes, then this problem might be "solved". But I'm not sure about that, because index sizes could still keep growing.

Ideally (just to make sure our system works at scale), we should assume that an insane amount of new messages comes in via replication (say, +1GB on the log every month. This is much more than the previously mentioned +1GB/year, but the "lifespan" of the log with uint64 would still be ridiculous, about 350k years), and that an equal amount of older messages gets "deleted" (overwritten by zero bytes), and then figure out how to keep the user's disk usage somewhat stable. I mean, it's acceptable if the disk usage has a sawtooth pattern (like garbage collection sawtooth charts), as long as an upper bound is never crossed, i.e. no "disk" leaks happen.

Some loose thoughts (I'll need to spend more time on it).

uint64 is one option. It would be good if we didn't need to use uint64 for the arrays in jitdb, since that takes up double the space. But then again, it might be good if we didn't have to rewrite the whole index file on changes, only what has changed.

As for the delete and compaction, it might be possible to do something like keeping track of the latest point on the log where everything before it is deleted. Then you could store that as a starting point or "deleted offset", and once you recalibrate the seqs related to this you could set it to 0.

This is smart! But would it ever work? I mean, how can we guarantee that the 1st X messages would be eventually deleted? It might be that those messages are authored by the root metafeed, or are contact messages, and in that case they wouldn't ever be deleted.

Good point. Another idea would be to have multiple databases, one for things that are semi-stable and one for "fast moving" stuff. The second one, with most of the churn, might even be migrated to a fresh database once in a while to keep it small. Imagine a sliding window that can grow to 2x and then migrates to a new database at ½x or 1x. Anyway, just throwing ideas out there :)

That's a pretty good idea, to have multiple databases. It reminds me of a couple other ideas from the distant past:

  • One log per feed (Dominic ended up convincing me this was a bad idea)
  • Buffer log (I forgot the term, but Matt Mckegg came up with this idea of replicating new content into a temporary log that is not indexed or processed in any way, it's just stored on disk, and then the actual log will gently "sip" the buffer log as it sees fit)

Your idea also gave me another idea, to have a throw-away temporary log (and indexes), so that there would be three:

  • Stable db
    • Nothing is deleted, or if it is, then it's not defragged
  • Fast moving db
    • FIFO
    • Always deletes (overwrites with zeros) the first records until the log's (virtual) size is at most n MB
    • Indexes keep track of the moving deleted region
  • Temp db
    • On app startup, is always empty
    • Always stored in memory, never persisted to disk
    • Can replicate like usual, can build indexes like usual

The reason for the third, temp db, is that you could browse the profiles of acquaintances and strangers and just replicate them on the spot (first you just replicate their profiles, which is a tiny amount of data, then you can choose to scroll their posts, either on the temp db or on the other persisted DBs). It would reduce the burden on the storage needs from the other two DB types, and would support the browsing usage whenever you are currently connected to other peers.

I think the real challenge would be to weave together all these three in the same ssb-db2 API. Ideally you would want to Read the database in an agnostic way, not caring which of the three it came from, but when you Write to the database, you would want to choose where it should go. The indexing part should then weave together the indexes from all three types.
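A minimal sketch of what that weaving could look like for reads, assuming each of the three databases exposes a hypothetical query(filter, cb) that calls back with an array of messages (this is not existing ssb-db2 API, just an illustration):

// Hypothetical sketch: stableDb, fastDb and tempDb are assumed to expose
// query(filter, cb); readAll merges their answers so the caller doesn't care
// which database a message came from.
function readAll(dbs, filter, cb) {
  let pending = dbs.length
  let failed = false
  const results = []
  dbs.forEach((db) => {
    db.query(filter, (err, msgs) => {
      if (failed) return
      if (err) {
        failed = true
        return cb(err)
      }
      results.push(...msgs)
      if (--pending === 0) {
        // sort by claimed timestamp, newest first
        results.sort((a, b) => b.value.timestamp - a.value.timestamp)
        cb(null, results)
      }
    })
  })
}

// readAll([stableDb, fastDb, tempDb], { type: 'post' }, (err, msgs) => { ... })
// Writes, in contrast, would pick an explicit destination, e.g. fastDb.add(msg, cb)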

migrating to u64 is the simplest, but it has the downside of doubling the size of things that are already u32...

varint means you can't just jump to a byte offset and know you have a valid record.

but what about this: current u32 data stays.
once you hit a threshold, switch to u64-4 (u64 minus 4 bits).
a u64-4 value is two u32-2's: the first one starts with the bits 11, which means there is another u32-2 after it;
the next one starts with 10. if a u32 value does not start with 11... or 10... then it's a regular u32. if it starts with 11, combine it with the next u32 to make a u64, and if it starts with 10, combine it with the previous one. getting started on implementing this would be as simple as preventing u32 values with a leading 1 from being written (it's not too late for that, right?) although 01... is still okay.
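Here is a sketch of how decoding could work under that scheme, going by my reading of the description above (illustrative only, not existing jitdb code; little-endian and the helper name are assumptions):

// A plain offset is a u32 whose top bit is 0. A big offset occupies two u32
// slots: the first tagged with the top bits 11, the second with 10, leaving
// 2x30 = 60 usable bits ("u64 minus 4").
function readOffsetAt(buf, pos) {
  const word = buf.readUInt32LE(pos)
  const tag = word >>> 30
  if (tag === 0b11) {
    // first half of a u64-4: combine with the next u32 slot (tagged 10)
    const next = buf.readUInt32LE(pos + 4)
    // (values beyond 2^53 would need BigInt; good enough for a sketch)
    return { value: (word & 0x3fffffff) * 2 ** 30 + (next & 0x3fffffff), size: 8 }
  } else if (tag === 0b10) {
    // second half of a u64-4: the value actually starts one slot earlier
    return readOffsetAt(buf, pos - 4)
  } else {
    return { value: word, size: 4 } // regular u32 (leading 0, so 01... is fine)
  }
}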

with a varint, a similar pattern is used, but a single bit indicates whether there is another byte. so to randomly seek into a series of varints in memory, you'd need to scan backwards to the previous end byte (one starting with 0), then take the subsequent zero or more bytes that start with 1, until one with a leading zero: (1...)*(0...)
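A sketch of that backward scan, for LEB128-style varints where a set high bit means "another byte follows":

// Returns the start position of the varint that covers byte position `pos`,
// by walking backwards until the previous byte is an end byte (high bit 0).
function varintStart(buf, pos) {
  let i = pos
  while (i > 0 && (buf[i - 1] & 0x80) !== 0) i--
  return i
}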

how does jitdb use arrays? maybe there can be something like a page level where it indicates whether it's using u64 or u32 for that page?

did it use bitfields in the end? if so, is there a thing that maps sequential indexes back to byte offsets? sequential indexes are better for bitfields, but of course you need to get the actual byte offset somehow. but you only need to switch, at some point, to using u64, and it always grows, so there is a point where that change happens.

if you really want to compact the database, a simple way to do that would be to recopy the log into a new log, minus the deleted stuff, rebuild the indexes, then drop the old log. this is basically how CouchDB does it. since scanning the log and building indexes is fast, it wouldn't take that long, and you could do it slowly in the background so it wasn't noticeable.
you could guarantee that it only used between N and 2N bytes, but you'd have to wait if a write was gonna take you over 2N while the compaction is happening. Much simpler than reengineering everything
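A toy model of that copy-compaction (CouchDB-style), where an array stands in for the log and null entries stand in for zeroed-out deletions; a real version would stream from the old async-append-only-log file into a new one:

// Copies surviving records into a fresh log and remembers how record
// positions moved, which is what the rebuilt indexes would need.
function compactByCopy(records) {
  const newLog = []
  const movedTo = new Map() // old record index -> new record index
  records.forEach((record, i) => {
    if (record === null) return // skip deleted records
    movedTo.set(i, newLog.length)
    newLog.push(record)
  })
  // until the old log is dropped, both copies exist on disk: hence N..2N usage
  return { newLog, movedTo }
}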

I had another idea today that improves my above suggestion: instead of 2N, use at most N space for the log, but 2N for the indexes.

Instead of writing a separate second log, superimpose the compacted log on top of the old one, keeping track of the min uncompacted sequence number. When you do a query on the old indexes, drop results with offsets smaller than the uncompacted sequence, and mix with results from the new indexes. Merge index responses by sequence number, so that numbers smaller than the uncompacted sequence come from the compacted section (via the new indexes) and numbers greater come from the uncompacted log (via the old indexes).
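A sketch of that merging rule (names are made up; boundary stands for the min uncompacted offset being tracked):

// Old indexes are only trustworthy for the not-yet-compacted tail of the log;
// new indexes only cover the already-compacted prefix.
function mergeQueryResults(oldIndexResults, newIndexResults, boundary) {
  const fromOld = oldIndexResults.filter((r) => r.offset >= boundary)
  const fromNew = newIndexResults.filter((r) => r.offset < boundary)
  return fromNew.concat(fromOld)
}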

I'm not sure that this optimization is really necessary, but might not be that hard to pull off once you have the first part.

That's great! I was already getting used to the idea that we have to compromise with the 2N limit for logs, so getting it down to N is much better. Indexes at 2N is fine, because they are typically much smaller than the log. (Example from production: log is 960MB, indexes are in total 262MB)

UPDATE: wait, why would indexes be at 2N? Couldn't we wait for the superimposition writes to finish and then clear all indexes and rewrite them? Or were you suggesting that the compacted section of the log would also have its own indexes being written simultaneously?

What if instead of dropping results from the uncompacted section, we drop results from the compacted section? I mean, the old indexes will already be coherent with the uncompacted section, plus most queries are for recent content anyways (thus likely to be in the latter section of the log).

By the way, I still believe in having an in-memory ephemeral database, a good use case being to fetch the profile details of a feed you block. You don't want to actually "replicate" that feed (using your storage and forwarding it to others), you just want to know what the name of the blocked person is, and maybe one of the peers currently connected has that information. (PS: the assumption here is that once you block someone, we'll delete their records from the log)

migrating to u64 is the simplest, but it has the downside of doubling the size of things that are already u32...

I agree it's a pretty simple change. On the other hand, I kind of like the idea of SSB always being for "small" use cases, and by "small" I'm exaggerating here, because 4.3GB is enough for roughly 20 years of content from thousands of people. That should be enough for most cases. Use cases that fall outside of that would be: archival and "Corporate Big Data Or Surveillance Capitalism".

did it use bitfields in the end? if so, is there a thing that maps sequential indexes back to byte offsets? sequential indexes are better for bitfields, but of course you need to get the actual byte offset somehow

Yes jitdb uses bitfields. Not sure if I understood how to map your terminology to our terminology, but it goes like this: we have bitvectors where the Nth bit is about the Nth record on the log, and then we have an index called seq.index where the Nth entry tells us the byte offset of the Nth record on the log.
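In other words, something like this (a simplified illustration of the structure, not the actual jitdb code):

// bitvector: the Nth bit says whether the Nth record matches a query
// seqIndex: the Nth entry is the byte offset of the Nth record on the log
// (bit order within a byte is an assumption here)
function* matchingOffsets(bitvector, seqIndex) {
  for (let n = 0; n < seqIndex.length; n++) {
    const bit = (bitvector[n >> 3] >> (n & 7)) & 1
    if (bit === 1) yield seqIndex[n]
  }
}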

jitdb also has some other esoteric indexes, like "prefix indexes", which allow us to store (e.g.) the first 4 bytes of a msg.key for all messages, and this is very often "unique enough", we don't need the full msg.key to guarantee uniqueness. Of course we do a double check for false positives in case it's not unique.
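Conceptually the lookup works like this (a sketch; getMsgAtOffset is a made-up helper standing in for reading a record from the log):

// prefixIndex holds 4 bytes per record (the start of each msg.key);
// candidates that match the 4-byte prefix are verified against the real key.
function findByKey(prefixIndex, seqIndex, msgKey, getMsgAtOffset) {
  const wanted = Buffer.from(msgKey).slice(0, 4)
  const results = []
  for (let n = 0; n < seqIndex.length; n++) {
    if (prefixIndex.slice(n * 4, n * 4 + 4).equals(wanted)) {
      const msg = getMsgAtOffset(seqIndex[n])
      if (msg.key === msgKey) results.push(msg) // rule out false positives
    }
  }
  return results
}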

And we also have "prefix maps" which is basically a huge object that looks like this:

{
  '%Lihv': [seq1, seq8, seq185, seq280],
  '%W1iI': [seq6, seq59, seq127, seq305, seq407, seq558],
  '%rMcC': [seq3, seq67, seq111, seq189, seq198]
}

Which we use (e.g.) to find all messages that point to a certain root message, in threads.

is the prefix map a flumeview-reduce? one of my main ideas with flumeview-vectors was to create exactly this data structure, but on bytes, so you didn't have to load everything.

another idea: if the seq.index was paged, you could have a u64 absolute offset for the start of the page, then an array of u32s that are relative to that. you would only need to read a page at a time, and you'd get up to 4 billion messages, which is much more than a 4 gb database holds (about 700 times more, 700 bytes being the average message size if I recall). If the other indexes went through seq.index there is very little you'd need to change.
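A sketch of how reading such a paged seq.index could look (layout and names are illustrative, not an actual jitdb format):

// Each page: one absolute u64 base offset, then ENTRIES_PER_PAGE u32 offsets
// relative to that base.
const ENTRIES_PER_PAGE = 1024

function readSeqOffset(pages, recordNumber) {
  const page = pages[Math.floor(recordNumber / ENTRIES_PER_PAGE)]
  const i = recordNumber % ENTRIES_PER_PAGE
  const base = page.readBigUInt64LE(0) // absolute start of the page
  const relative = page.readUInt32LE(8 + i * 4) // u32 relative to the base
  return Number(base) + relative // byte offset into log.bipf
}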

UPDATE: wait, why would indexes be at 2N?

I was thinking that you'd perform the compaction process gracefully in the background, without interrupting usability. So you'd keep the old indexes, but gradually build a new set too. That way you can do it slowly without making the phone hot! If you were prepared to make the user look at a progress bar for a while, then you could indeed have max N size indexes.

Maybe you could perform a similar in place update to the indexes, but there are lots of different kinds of indexes so I wouldn't assume so in general

What if instead of dropping results from the uncompacted section, we drop results from the compacted section?

this doesn't make sense. the messages in the compacted section now have different offsets, from filling the gaps created by deleted messages. So the old (uncompacted) indexes do not point to the start of messages in the compacted section.

What I meant is in the old (uncompacted) indexes, drop results that point into the compacted section. You'd still keep results from the uncompacted section that point to after where compaction is up to.

is the prefix map a flumeview-reduce? one of my main ideas with flumeview-vectors was to create exactly this data structure, but on bytes, so you didn't have to load everything.

It's pretty brutal at the moment. It's just a big JSON object that's stringified / parsed, and the whole object is kept in memory.

@moid's contribution to this conversation is:

You may find some of the ideas in bitcask useful, particularly the use of hint files and merges. It's very solid, and the ideas are simple enough to readily implement in other languages.

@arj03 @dominictarr After some weeks have passed, have you had any new thoughts about database compactions? I intend to start working on it this week and now would be the best time to make comments about the big picture before code is changed.

My plan is to initially support only one non-controversial use case: given that you have blocked a peer, you can "delete" (overwrite with zero bytes) their messages from the log, and then we would like to compact the database afterwards. So far I won't do anything related to deleting portions of a friend's feed, as that would require extra caution.

Some benchmark that I want to keep somewhere, so here:

num msgs | log size (bytes)      | time to compact | seq.index size (bytes)
1000     | 458752 => 262144      | 39ms            | 4416 => 4416
10000    | 4063232 => 2031616    | 185ms           | 44016 => 44016
100000   | 40697856 => 20381696  | 568ms           | 440016 => 440016
prod     | prod                  | 13s — 43s       | prod

Are you removing half of the messages? At random? And for production does the "time to compact" include indexes reindexing?

Oh yeah, forgot to add any context to this. It's running in a test suite, and I remove half of the messages, every odd numbered message is deleted. So not at random. "Time to compact" does not include reindexing time. The prod numbers are just estimates extrapolated from the 3 first rows.

Okay, I thought prod was where you ran this on a copy of your manyverse desktop database :)

I ran it for real in my manyverse desktop db, like this:

  • Ran deleteFeed for mixmix, dominic, and Sami (all three of these have a lot of messages)
  • Ran sbot.db.compact()

Results:

sizeDiff 51642368 bytes (about 51 MB)
compact: 30.978s

Reindexing took 12min, but I suspect it's not functionally correct: it may have been rebuilding the leveldb indexes too soon, or concurrently. I have to check whether the code is correct. So there's still hope that reindexing could drop down to 8min or less.

@arj03 I'm working on a module that deletes feeds that are blocked, and I got some stats to share with you.

In Manyverse, I put a script which looks at ssb.friends.hops and deletes any feed that has a negative distance. (Sequentially! One deleteFeed at a time). Then, it compacts the db and rebuilds indexes.
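Roughly, the script does something like this (a sketch, not the actual Manyverse code, assuming the ssb.friends.hops, ssb.db.deleteFeed and ssb.db.compact APIs as they appear elsewhere in this thread):

function deleteBlockedAndCompact(ssb, cb) {
  ssb.friends.hops((err, hops) => {
    if (err) return cb(err)
    // feeds with a negative hops distance are blocked
    const blocked = Object.keys(hops).filter((feedId) => hops[feedId] < 0)

    // delete sequentially, one deleteFeed at a time
    function next(i) {
      if (i >= blocked.length) {
        return ssb.db.compact(cb) // compaction triggers reindexing afterwards
      }
      ssb.db.deleteFeed(blocked[i], (err) => {
        if (err) return cb(err)
        next(i + 1)
      })
    }
    next(0)
  })
}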

Before

➜ du -h -d 2
100M    ./ebt
4,0K    ./blobs/tmp
3,9G    ./blobs/sha256
3,9G    ./blobs
48K     ./blobs_push
45M     ./db2/jit
228M    ./db2/indexes
1,3G    ./db2
5,2G    .

➜ ls -la log.bipf
-rw-rw-r--  1 staltz staltz 1039335424 jun 21 16:09 log.bipf

During

1177 feeds deleted
Total time to delete: 39.567 seconds
compact duration: 2m10s
reindex duration: I didn't measure but it seemed like >=15min

How much time it took for each feed to be deleted, distribution chart where x-axis is each feed, and y-axis is time-to-delete in milliseconds:

[Screenshot from 2022-06-21 18-50-22: per-feed deletion time distribution chart]

After

➜  du -h -d 2
100M    ./ebt
4,0K    ./blobs/tmp
3,9G    ./blobs/sha256
3,9G    ./blobs
48K     ./blobs_push
42M     ./db2/jit
303M    ./db2/indexes
1,3G    ./db2
5,2G    .

➜ ls -la log.bipf
-rw-rw-r--  1 staltz staltz 968622080 jun 21 16:30 log.bipf

Notice that the log size went down 70 MB, which is pretty cool.

What's puzzling me is that it seems ./db2/indexes grew in size, and I got a new entry somewhere in my Activity (mentions or follows) tab in Manyverse. I was certainly running all of this offline, so there could not have been any new messages replicated.

That is strange. Did you save your log.bipf from before compaction? It could be interesting to do a full reindex on that database and see if the same new message appears. It should. And if it does, then there must have been something wrong with the old indexes. Do you have a more detailed list of the index sizes?

Also not sure if you have updated bipf yet, it should provide a nice speed boost for indexing (around 10% faster)

I'll give another serious run of these numbers again after the bugs in ssb-db2 are fixed, so we can rule out at least some weirdness.

Also not sure if you have updated bipf yet, it should provide a nice speed boost for indexing (around 10% faster)

It was using bipf 1.6.5, which is the version with your fast-varint, but before we added markIdempotent etc.

I ran delete (of ~1100 blocked feeds) and compaction again, on my production database, and I noticed a couple of funny things.

In the table below, Before shows sizes of things (in bytes) before deletion and compaction. After is after compaction and reindexing induced by compaction. And then, I deleted indexes and jit and restarted the app, this is what After (2) represents.

I think there is some bug somewhere, because After !== After (2). There were a few other oddities (like some indexes got bigger, and there was a weird progress bar thing behavior as well) that made me suspect that some reindexing logic is broken somewhere. Because these are macro measurements, I have no idea where the bug is but I'll try to debug deeper with a smaller db.

                                          |        Before |         After |     After (2)
Total db2                                 | 1 325 304 727 | 1 315 927 077 | 1 243 666 357
log.bipf                                  | 1 040 318 464 |   969 605 120 |   969 605 120
base                                      |     1 254 616 |     1 387 965 |     5 016 782
keys                                      |    85 970 974 |    78 777 999 |    78 932 201
fullMentions                              |     5 360 982 |     6 487 349 |     5 500 828
hashtags                                  |     5 156 923 |    10 740 664 |     5 859 976
ebt                                       |    22 309 727 |    31 085 515 |    22 784 346
contacts                                  |     8 363 329 |    10 893 358 |     7 085 613
aboutSelf                                 |     2 125 628 |     3 483 039 |     3 814 622
search2                                   |   107 887 805 |   159 513 868 |   101 749 694
meta_.index                               |       180 456 |       180 456 |       166 472
meta_private_true.index                   |       180 452 |       180 452 |       166 468
seq.index                                 |     6 220 244 |     5 859 214 |     5 859 214
sequence.index                            |     6 220 244 |     5 859 214 |     5 859 214
timestamp.index                           |    12 440 472 |    11 718 412 |    11 718 412
value_author.32prefix                     |     6 219 816 |     5 859 214 |     5 859 214
value_author_@QlCT...                     |       180 456 |       180 456 |       166 472
value_content_about__map.32prefixmap      |       446 618 |       436 899 |       431 884
value_content_contact__map.32prefixmap    |     2 751 773 |     2 314 543 |     2 186 264
value_content_fork__map.32prefixmap       |       234 112 |       220 895 |       215 293
value_content_root_.index                 |       180 456 |       180 456 |       166 472
value_content_root__map.32prefixmap       |     3 154 090 |     3 034 062 |     2 997 926
value_content_type_about.index            |       180 448 |       180 448 |       166 464
value_content_type_contact.index          |       180 452 |       180 452 |       166 468
value_content_type_gathering.index        |       179 704 |       179 704 |       165 720
value_content_type_post.index             |       180 456 |       180 456 |       166 472
value_content_type_pub.index              |       180 448 |       180 448 |       166 464
value_content_type_roomx2Falias.index     |       180 152 |       180 152 |       166 168
value_content_type_vote.index             |       180 456 |       180 456 |       166 472
value_content_vote_link__map.32prefixmap  |     6 668 704 |     6 449 767 |     6 323 262

Strange. Those numbers are all over the place. I don't remember: is there a test where you use a fixture to generate a database with, say, 10 users, delete 5 of them, and check that all indexes return the correct number of results?

That might indeed be a good test. In the micro scale there are tests (for 1 author I guess?) and they pass. The problem seems to happen somewhere in medium or large scale, no idea why.

One thing I suspected was that I wasn't "flushing deletes" after calling all those deleteFeed(). So I added ssb.db.getLog().onDeletesFlushed(() => { before running compact, but that didn't solve the problem.

Here are stats for running on ssb-fixtures, seed=foo, messages=10000, authors=200.

Before

log.bipf: 8 192 000
  883 042 ./base
  593 846 ./keys
  189 258 ./fullMentions
   26 310 ./hashtags
  703 007 ./ebt
   22 613 ./contacts
   47 561 ./aboutSelf
1 238 308 ./search2
  1 268 jun 28 12:03  meta_.index
  1 260 jun 28 12:03  meta_private_true.index
 44 016 jun 28 12:03  seq.index
 44 016 jun 28 12:03  sequence.index
 88 016 jun 28 12:03  timestamp.index
 44 016 jun 28 12:03  value_author.32prefix
  1 268 jun 28 12:03  value_author_@TWKY...
  2 507 jun 28 12:03  value_content_about__map.32prefixmap
  6 930 jun 28 12:03  value_content_contact__map.32prefixmap
  8 089 jun 28 12:03  value_content_fork__map.32prefixmap
  1 268 jun 28 12:03  value_content_root_.index
 19 701 jun 28 12:03  value_content_root__map.32prefixmap
  1 268 jun 28 12:03  value_content_type_about.index
  1 264 jun 28 12:03  value_content_type_contact.index
  1 268 jun 28 12:03  value_content_type_post.index
  1 268 jun 28 12:03  value_content_type_vote.index
 40 440 jun 28 12:03  value_content_vote_link__map.32prefixmap

After

log.bipf: 6 946 816
  984 245 ./base
1 809 790 ./keys
  449 298 ./fullMentions
   83 925 ./hashtags
1 587 155 ./ebt
   29 485 ./contacts
   68 052 ./aboutSelf
2 182 394 ./search2
  1 268 jun 28 12:05  meta_.index
  1 260 jun 28 12:05  meta_private_true.index
 37 376 jun 28 12:05  seq.index
 37 376 jun 28 12:05  sequence.index
 74 736 jun 28 12:05  timestamp.index
 37 376 jun 28 12:05  value_author.32prefix
  1 268 jun 28 12:05  value_author_@TWKY...
  2 376 jun 28 12:05  value_content_about__map.32prefixmap
  6 344 jun 28 12:05  value_content_contact__map.32prefixmap
  7 500 jun 28 12:05  value_content_fork__map.32prefixmap
  1 268 jun 28 12:05  value_content_root_.index
 18 384 jun 28 12:05  value_content_root__map.32prefixmap
  1 268 jun 28 12:05  value_content_type_about.index
  1 264 jun 28 12:05  value_content_type_contact.index
  1 268 jun 28 12:05  value_content_type_post.index
  1 268 jun 28 12:05  value_content_type_vote.index
 37 180 jun 28 12:05  value_content_vote_link__map.32prefixmap

All the jitdb indexes should be lower or the same size. I think for level there might be some compaction that hasn't kicked in. So the best would be to run queries on the results to see if things look correct.

Ok, I hacked ssb-db2 code in Manyverse to force level.compactRange and got these numbers:

log.bipf: same

 27 929  ./base
504 590  ./keys
 67 588  ./fullMentions
 20 261  ./hashtags
168 152  ./ebt
 10 510  ./contacts
 18 945  ./aboutSelf
627 026  ./search2

jitdb: same


Now it looks much better. Doesn't seem like there is a problem. I also wrote an ssb-db2 test (which I could make a PR for) and couldn't notice any problem.

Turns out that ssb.db.getLog().onDeletesFlushed(() => { is unnecessary because AAOL does that already when you're compacting.

More numbers after using the latest jitdb with a single log.stream:

  • Initial indexing: jitdb log.stream #978 ended, scanned 10000 records in 2835ms
  • Post-compaction indexing: jitdb log.stream #301 ended, scanned 8491 records in 5695ms

(In both of these cases, the log.stream was updating/creating many indexes at once)

I have one more suspicion to check tomorrow: live jitdb queries are interfering with something during post-compaction indexing. They seem to kick in only after initial indexing is done.

UPDATE: actually, numbers vary drastically. Here's for a second run:

  • Initial jitdb indexing: jitdb log.stream #477 ended, scanned 10000 records in 1970ms
  • Post-compaction jitdb indexing: jitdb log.stream #413 ended, scanned 8491 records in 601ms

But the total indexing time (jitdb PLUS level indexes) is still consistently larger for the post-compaction one.

If you want to see the full log:

Click to see all
  jitdb indexes path /home/staltz/.config/manyverse/ssb/db2/jit +0ms
indexing done at 1656622946483
  jitdb loaded indexes [] +61ms
  jitdb log.stream #477 started, updating indexes seq|value_author|value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519|value_content_type_about|value_content_about__map|value_content_type_contact|value_content_contact__map|value_content_fork__map|value_content_root_|value_content_root__map|value_content_type_gathering|value_content_type_post|value_content_type_pub|value_content_type_roomx2Falias|value_content_type_vote|value_content_vote_link__map|meta_|meta_private_true +4ms
indexing at 2.6% at 1656622947532
  jitdb log.stream #477 ended, scanned 10000 records in 1970ms +2s
  jitdb saving core index: seq +1ms
  jitdb saving core index: timestamp +3ms
  jitdb saving core index: sequence +2ms
  jitdb saving index: value_author +15ms
  jitdb saving prefix index: value_author +1ms
  jitdb saving index: value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519 +0ms
  jitdb saving index: value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519 +0ms
  jitdb saving index: value_content_type_about +1ms
  jitdb saving index: value_content_type_about +0ms
  jitdb saving index: value_content_about__map +0ms
  jitdb saving prefix map index: value_content_about__map +0ms
  jitdb saving index: value_content_type_contact +1ms
  jitdb saving index: value_content_type_contact +0ms
  jitdb saving index: value_content_contact__map +0ms
  jitdb saving prefix map index: value_content_contact__map +0ms
  jitdb saving index: value_content_fork__map +0ms
  jitdb saving prefix map index: value_content_fork__map +0ms
  jitdb saving index: value_content_root_ +1ms
  jitdb saving index: value_content_root_ +0ms
  jitdb saving index: value_content_root__map +0ms
  jitdb saving prefix map index: value_content_root__map +1ms
  jitdb saving index: value_content_type_gathering +0ms
  jitdb saving index: value_content_type_post +0ms
  jitdb saving index: value_content_type_post +1ms
  jitdb saving index: value_content_type_pub +0ms
  jitdb saving index: value_content_type_roomx2Falias +0ms
  jitdb saving index: value_content_type_vote +0ms
  jitdb saving index: value_content_type_vote +0ms
  jitdb saving index: value_content_vote_link__map +0ms
  jitdb saving prefix map index: value_content_vote_link__map +0ms
  jitdb saving index: meta_ +2ms
  jitdb saving index: meta_ +1ms
  jitdb saving index: meta_private_true +0ms
  jitdb saving index: meta_private_true +1ms
  jitdb log.stream #675 started, for a live query +710ms
  jitdb log.stream #988 started, for a live query +9ms
  jitdb log.stream #582 started, for a live query +7ms
  jitdb log.stream #954 started, for a live query +28ms
  jitdb log.stream #360 started, for a live query +123ms
(node:821052) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `electron --trace-deprecation ...` to show where the warning was created)
indexing done at 1656622949532
  jitdb log.stream #990 started, for a live query +199ms
  jitdb log.stream #189 started, for a live query +119ms
  jitdb log.stream #246 started, for a live query +85ms
  jitdb log.stream #423 started, for a live query +186ms
  jitdb log.stream #250 started, for a live query +124ms
  jitdb log.stream #855 started, for a live query +109ms
  jitdb log.stream #135 started, for a live query +235ms
  jitdb log.stream #877 started, for a live query +338ms
  jitdb log.stream #209 started, for a live query +210ms
  jitdb log.stream #777 started, for a live query +156ms
deleted @hBrB73m2YXgAbS3JWt8mTUWW1aQaqTrMiixopFcyEFQ=.ed25519: 7.87ms
deleted @gXMTLcuGIfotbJG64A/ejlAfqtcdOYnlLrpSvRJ1oW0=.ed25519: 10.536ms
deleted @wa5j6AuCiqrTCbnGtWubtPjP97rXBEC58RJylKbrxd4=.ed25519: 18.098ms
deleted @U9j/wzB/KDdyRdPlmzic0FviBZ2o7H3u5v9yrUbGBPo=.ed25519: 1.953ms
deleted @i6jzvSo/kCMIW6DTdOk64twnWEP5QIMmzcqf00ofelk=.ed25519: 1.409ms
deleted @c4h9pIBNEhAoIBrVg2cpVyaBsSAPCwBq3JF6/qU4/iY=.ed25519: 1.518ms
deleted @g2K2im02oS0E3wEI2YC/SwznirH7JvaKBB1dPpXebU0=.ed25519: 1.888ms
deleted @w0gPonBuHL6EO8yYjZnMon86i6Uu5zYnNApLc+udPuE=.ed25519: 1.575ms
deleted @9N8mY4sFS+kkymLdrsrkHcw9U15bJoX5gSxB9FtwHEY=.ed25519: 1.507ms
deleted @pbIsHzfufo6/bLo6XeTeCk390l/KSRiLAi3yb6gjmwQ=.ed25519: 2.056ms
deleted @T4oov++qtPrjZVp2zO8cNG4kmZs1SiMRT4FLkBpv13c=.ed25519: 1.495ms
deleted @Gs2+i1SkbFf7n01TXx/EBm7ZikIUl4+HVSUiE35J124=.ed25519: 2.208ms
deleted @mmJb8JVp5FBXpN1gb7uqlK/SGjOePbDIkRnA17E9mtI=.ed25519: 2.678ms
deleted @AkJYJ7thmd9J5bitq6xU6z+gfcMTTiSzdQm7nK2dpCw=.ed25519: 5.215ms
deleted @ZgQ45llBVqIi7c9kRkGsnvZk4LFGqujaxNsMnHxzYQ4=.ed25519: 2.219ms
deleted @3eIPXptMcbrFPw8seKbywmPbRogERylNuoFVaZ9AlOg=.ed25519: 2.917ms
deleted @DBMTBDgZC88uPo5dlU8H4WlkV6my1QmWW+Lby4zaI1w=.ed25519: 1.347ms
deleted @LaQNM/hkNZUPByFIICKsj8RI/iOvdjYcbAQC1a5BwH0=.ed25519: 0.891ms
deleted @ALH8lrtDNvBcBVBTI7/k2F+dKihd+b05lkVZ+7pgZFE=.ed25519: 6.94ms
deleted @A3q42Z/EJ3k9HeS2gfDhvZQlNCZ5pzeufmg2MRjUgpo=.ed25519: 4.446ms
deleted @w5/1a7uynTiLQ5RuV9zTVbkCWh+E86eMrKGaq8PZO6E=.ed25519: 1.997ms
deleted @WxOYhw4CrR6z6yaVCkKxzuAVPtYhlia8aAKgqKmYq4s=.ed25519: 1.654ms
deleted @KpuOhIbGLD5M2F4lwtH6nAhxDoVO/2/zvZi4nJqvijE=.ed25519: 1.379ms
deleted @cZhK1oAt5onzEewN+gCzBxGwfrYBMlGUdquj8+4clss=.ed25519: 2.048ms
deleted @jLp9QbHLc7B4zOAo7ExoAvgQpzVvzS0i+jYkqQhpz50=.ed25519: 1.276ms
deleted @HxcxDdpj8TuTO9ybBzPxYG8/W1NYuqoiYxL/IbTPFdY=.ed25519: 3.134ms
deleted @ySnJg5KYBjio479vSgoA0rCXdHpB8s3mF9lwuD0YSnc=.ed25519: 2.389ms
deleted @JWXZfmQYxjxLGyP6PFafmHfN2aDb4ohmT8yXqbomvEM=.ed25519: 3.252ms
deleted @6/O/sZcUlXPwEuHJdDCjivsEHTdmPrXKQlR+aCQHHoo=.ed25519: 1.926ms
deleted @o1t/XgNYEYhR5JPpToh/BsB8XjJMMkAExMnnkUWqHJ4=.ed25519: 1.856ms
deleted @6SdZSnu63sHUgsRosjoDQv5k5u3Zsb/H5Fv3jS/m/ak=.ed25519: 1.605ms
deleted @/cL3q0vj/j410SM6e0SbOl+OiDSC2XpnI+1I5Vjs4n4=.ed25519: 1.398ms
deleted @xZneDl2i9H0zoHgkzMdulaB7x8yapto5UBx0tj1tlcM=.ed25519: 1.255ms
deleted @QuhhLQdwAj8DhJqj6QXtjSnGx19PojSYqlEXXyHTPw4=.ed25519: 1.36ms
deleted @YhV7oicUfjOPhmtpvHQH6kf3YObIq8kzdKhppP7Zw8U=.ed25519: 2.292ms
deleted @pV5I2yagSejH1Tgi1KO+xTqUiUflBnp9lFcCRZsSd9Y=.ed25519: 1.86ms
deleted @FHllWqdt2pdCe9lbAvHOV7xa9hd1fvN3/aWsuN70iqo=.ed25519: 1.58ms
deleted @yDB0lcMLrx5wwtYmd9OUnPWZfd+JGJRlNQUDhv214KA=.ed25519: 2.988ms
deleted @CZvqrkmQ/GEDxYAJjL/CukBrRHHoxI55E3Fbh1Z6PvM=.ed25519: 1.917ms
deleted @Z1t4s+OxTNd1FMk3D7iYvVcMQzJRJRMBAhge9crItO4=.ed25519: 1.355ms
indexing done at 1656622959452
  jitdb saving index: value_author +8s
  jitdb saving index: value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519 +0ms
  jitdb saving index: value_content_type_about +0ms
  jitdb saving index: value_content_about__map +0ms
  jitdb saving index: value_content_type_contact +0ms
  jitdb saving index: value_content_contact__map +1ms
  jitdb saving index: value_content_fork__map +0ms
  jitdb saving index: value_content_root_ +0ms
  jitdb saving index: value_content_root__map +1ms
  jitdb saving index: value_content_type_gathering +1ms
  jitdb saving index: value_content_type_post +0ms
  jitdb saving index: value_content_type_pub +0ms
  jitdb saving index: value_content_type_roomx2Falias +0ms
  jitdb saving index: value_content_type_vote +0ms
  jitdb saving index: value_content_vote_link__map +2ms
  jitdb saving index: meta_ +1ms
  jitdb saving index: meta_private_true +0ms
indexing at 0.0% at 1656622959461
compact: 822.79ms
  jitdb log.stream #413 started, updating indexes seq|value_author|value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519|value_content_type_about|value_content_about__map|value_content_type_contact|value_content_contact__map|value_content_fork__map|value_content_root_|value_content_root__map|value_content_type_gathering|value_content_type_post|value_content_type_pub|value_content_type_roomx2Falias|value_content_type_vote|value_content_vote_link__map|meta_|meta_private_true +2ms
  jitdb log.stream #413 ended, scanned 8491 records in 601ms +601ms
  jitdb saving core index: seq +0ms
  jitdb saving core index: timestamp +0ms
  jitdb saving core index: sequence +0ms
  jitdb saving index: value_author +11ms
  jitdb saving prefix index: value_author +0ms
  jitdb saving index: value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519 +1ms
  jitdb saving index: value_author_@TWKY4Bq5ezVqbXV2D7NyODxCXgu8o4rgpx2Fsf1GdHbCw=x2Eed25519 +0ms
  jitdb saving index: value_content_type_about +0ms
  jitdb saving index: value_content_type_about +0ms
  jitdb saving index: value_content_about__map +0ms
  jitdb saving prefix map index: value_content_about__map +0ms
  jitdb saving index: value_content_type_contact +0ms
  jitdb saving index: value_content_type_contact +0ms
  jitdb saving index: value_content_contact__map +0ms
  jitdb saving prefix map index: value_content_contact__map +0ms
  jitdb saving index: value_content_fork__map +0ms
  jitdb saving prefix map index: value_content_fork__map +0ms
  jitdb saving index: value_content_root_ +1ms
  jitdb saving index: value_content_root_ +0ms
  jitdb saving index: value_content_root__map +0ms
  jitdb saving prefix map index: value_content_root__map +0ms
  jitdb saving index: value_content_type_gathering +1ms
  jitdb saving index: value_content_type_post +0ms
  jitdb saving index: value_content_type_post +0ms
  jitdb saving index: value_content_type_pub +0ms
  jitdb saving index: value_content_type_roomx2Falias +0ms
  jitdb saving index: value_content_type_vote +0ms
  jitdb saving index: value_content_type_vote +0ms
  jitdb saving index: value_content_vote_link__map +0ms
  jitdb saving prefix map index: value_content_vote_link__map +0ms
  jitdb saving index: meta_ +2ms
  jitdb saving index: meta_ +0ms
  jitdb saving index: meta_private_true +0ms
  jitdb saving index: meta_private_true +0ms
indexing done at 1656622964783

Incredibly good news from the jitdb refactor (to use only one log.stream). I ran the following on my production Manyverse desktop (@QlCT...). I deleted the jit folder and started up the app, letting it reindex all of them.

Before: 6m00s
After: 39s

That's 9x speed up! 🔥

Wow, that is really great news. Made my day to read this 🌞

Just to make sure this good news is for real, I also deleted both indexes and jit and let it rebuild.

Before: 9m23s (this is consistent with what people have been telling me, that Manyverse takes "only" 10min to build indexes)
After: 2m42s

That's 563s / 162s = 3.5x speedup.