ssbc / ssb-db2

A new database for secure-scuttlebutt

Database streaming glitches (+43 skipping)

staltz opened this issue · comments

Similar to #291, I'll report here my findings.

So far I know that the versions are:

  • Manyverse 0.2201.14-beta.b
    • Desktop (Linux) with a totally new identity, not migrated from flume
    • iOS some days after rebuilding indexes on a recovered feed
  • ssb-db2 2.8.5
  • jitdb 4.0.1
  • async-append-only-log 3.1.1
  • bipf 1.5.5

And the symptom is: open a thread and it will actually display a raw message for something completely unrelated, indicating a problem with the leveldb keys index.

Bad indexes

On cryptix's corrupted database, I found that (output from running study-keys-leveldb.js):

Loaded seq.index with:
  Entries: 1298604
  Offset: 972042648

Loaded keys leveldb with:
  Version: 1
  Entries: 1298560
  Offset: 972042648

minBrokenSeq = 291883
minBrokenOffset = 210340373


seq    | offset    | key in the keys leveldb index                        | key in the msg, if different
0      | 0         | %bQnYTsq0p+A8JppKPgze+ZyZxAOPVgMwleEXZKwOP7o=.sha256
1      | 528       | %7d2LLAa0hhSFwlpHjulTm9IFtin95UkLUzV43yajj4k=.sha256
2      | 994       | %ofF5MfiaCnHBqecGIDpojT8Hd15wbAuGzmhNxrJ6beo=.sha256
3      | 1476      | %OJVc0qShJtS1o0A9pO5LZp4OmwIhyUkUmz6FV2Nthz8=.sha256
...
291880 | 210338468 | %H3rlUZ1gJNmwEQ26nuEfWj0Gwt3jL8f5itcH3wGLuRU=.sha256
291881 | 210338937 | %9QtgDc30RPPVm9pR6CHebtXD98/Bqabj3Mpbl7GmTPM=.sha256
291882 | 210339437 | %ckZr3Hro61xRR9qgaQsmE03ZOmdXQoF4ajQzHhbRrTs=.sha256
291883 | 210340373 | %73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256 | %9obExCOb1r6LGmYw9EEK/EUgBqZ2GgQOvvfAYbR+5x8=.sha256
291884 | 210341037 | %g1ibWtSya6mrByLtba4CaJKUm002gWEM0Ft18arr/Pw=.sha256 | %lzmZ8b2Ud6m1ueQDTBJTD5MYQcRXxNLprlHee3w/t30=.sha256
291885 | 210341506 | %f8upCrvoJGG4wh1IyuXYJWrYtcGKIJYxEBax01123KM=.sha256 | %06oaYt+kMHrCg19yQTXBGhj8xKtkDulOM9PtHkR1r2M=.sha256
291886 | 210341985 | %wD26cUoA0aOENE14EYGzAs0VrbVZkeehBjJSUdxdQ6Y=.sha256 | %t2kSIZvNC0kOJLLkIlv2ghPQ/dmP8YKEyagx5+EKHeg=.sha256
291887 | 210342454 | %nDjgJGy1zBfVkpjL+rh+MTv3cggAaAYMYthkxoBfzIw=.sha256 | %3nQDOgLUmjJjLqF9qf2nUqHdwqTG4AglMaCozfxIa+8=.sha256
291888 | 210342940 | %rhCJ7hvBVZvrB2rljEP0O1fqwhD/6MeMLmewumFyrOU=.sha256 | %sygYHLcTWrsjULHEZNyFObv5JdWIwhqvL1F0HuIrHGE=.sha256
291889 | 210343426 | %Gmvt4XX6UiKg2yWaPO7XhH0LOqiVBqgJKCiAv5lJ8L4=.sha256 | %iNR4VH0rg0hYxtmf9ykvPBmTgv3n/gEOOAbo0dDj2Wg=.sha256
291890 | 210344507 | %z0QulmPdr0HXv4HjVcrQJzRmsm+VROlfce03tUbkIzY=.sha256 | %XTJU+cEgkh253cCwwEr0XtsXOXuhD84cPsqLXI6NtA8=.sha256
291891 | 210345193 | %nWognOdQTi7Aq7iX/RCGoRshs9A8uOducNKksrc2Y/Y=.sha256 | %mBOWdDJWF9TPCKxnwOp558Pff4O+wFCclzQ6Kybvyf4=.sha256
291892 | 210345958 | %M8gjUX7Nb90+dC/YGBHmQLK4USbpDODqgmvjrYkEyfQ=.sha256 | %zflz+0VqxJc+4senR6khvbWWxCQHCTo2/ENEXZtxcRs=.sha256
291893 | 210346440 | %+NkyN4UsHhjTp14JSjn/H86rv8LmWhsAUWlXyTk+eRU=.sha256 | %OfnWuTln4iNO0O9BydKvN8Ia6NuYF/AiGp/qrjBDung=.sha256
291894 | 210347074 | %lUGBDXtFOYbTItDCJnGJ8991w0qGkLsmYjVm+MC1COw=.sha256 | %4ieSNeSaQaFdF6lNdGiBJKyPGvv++lNpykGy1ZnWhEE=.sha256
291895 | 210347556 | %VxwlHMu3RDeBXnZcsNM2pezfTw40tVZB4lG0ztjEcyM=.sha256 | %an0FlY1eMl6oxJ+RN5js72H48sZgt8RonGBShmWMvgQ=.sha256
291896 | 210348178 | %XDZ/QnfT7Go116LPjeSI8eyCE+154neqnItuQZ2v2Gg=.sha256 | %0sB0oYd4B/Mg4TLnXpVAcaibS+ks51s0QXSv68jAil0=.sha256
291897 | 210349063 | %LaNRu7+k0xiYiFnebZUdkSIW0GecRXAbtk5v2n4gLNo=.sha256 | %xJSoyAsXzs6yZv+J8PTdq0apB4RIutzyHBWQW14Ie00=.sha256
291898 | 210349542 | %nvqTUBG3bsY+spzkb1d+As3pZAlCekDg3hUXEy6ZniE=.sha256 | %wMWzSUlzjIbPaLXHwe6lofKf3MPZ4WQl3Y0dvjpCAEI=.sha256
291899 | 210350011 | %+/HEbtQQ4p1iytF6T0b4z+S5J64k1XwY+QFJblAwknc=.sha256 | %Fz4OFtxB4IoT9p/Jr3G07OdoV+gQAsWauFeOZw5S3oU=.sha256
291900 | 210350804 | %ZgwsxurKlxxlQS1ErnMx3SSFsQ97v2H+ystqLZ6oi9w=.sha256 | %TReB1TEODst195Pim8QXnHspeNz48VbFMJ3F8lU+Vmw=.sha256
291901 | 210352397 | %rfmIR7moR6KDhVX4nrtvVdkWpjUi0rBMsHTuPwy8eEM=.sha256 | %G3LUh5iRYR0FWwvVcJ6/BjO4R2xl3BD+IRP5qLXsEC0=.sha256
291902 | 210353814 | %+f2OdKTw2CU2QC03M1FgaAwpv/ft0J2RfHg1lyO+Nu0=.sha256 | %xhxxBLB93Vlv+FElaGQJpceLS/C8lpo2H6AnONjJXsU=.sha256
291903 | 210354283 | %SdKcgPfF9wZGeh+eL9OQ5vDDQ78MnOEndr/9Rxee0OM=.sha256 | %3eL+Nj5gKOIUk43h4p0cy0kkCG4YrQU3Uuj2/zbZnas=.sha256
291904 | 210354762 | %DCST6XAIho0dSQNNcLn5A93T6MFTiZOYLErokXbdDxY=.sha256 | %+zlfeg7YxCHYm8knTAiaqGkKoutvEl48c1uw0BuNEt4=.sha256
291905 | 210355246 | %I5LVdeuh87YFHRuAWoputVmRCKT4UJ3/QJTGTAsuxE8=.sha256 | %FffPkwTM8LM0Dw+c7hYopmlpnM9BDevPZMJxYKXmB94=.sha256
291906 | 210355715 | %gtDvCKOeM4oMxwZ6Ru8y2fvBB3q5QsybSp24gNrv0Do=.sha256 | %2EEfV/N9F4HiaU6ayQTtbXOJPj/NOiiY4WrpZIADBnU=.sha256
291907 | 210356429 | %tB5RWn3oUNCT77OON6UsZMPks7cgb0l4BK9RzsKerDM=.sha256 | %YPVFYAiw73l/W5BjcMbCfl6/6a5/ZHx28XCY8Q1AKMo=.sha256
291908 | 210357248 | %tC9dsF/HDk04ce1G488WbcCv+n0ztXGRqaIks9SBGzY=.sha256 | %b9XDAvVSCncoQUJ8WqT86JZbPRwYrzCPfny46Ng00HA=.sha256
291909 | 210357898 | %tAm8IA8ENzzcsF/nNIjKrAzsm80sQibplboIF3slPg8=.sha256 | %K9NuhMnx0qpcj9LMZEgVcIYwF6OUkFRAaON4dkp8UGs=.sha256
291910 | 210358747 | %ELToi1XCos9TiU2cZZiKcb4k60oW1DZ8JsP7mFcozAo=.sha256 | %44y0rl26zKewFNs5ToJvLSqTLJksDo554pI+dj37dUU=.sha256
291911 | 210359597 | %jk6JXdfxASl2J2Ww9hNmFoM8bc4fooFsl10Pflckpuo=.sha256 | %+WcmGWdzYXT0D8ztoyCJIKWX13vQWYV3VGQiAY9wfz0=.sha256
291912 | 210360066 | %cbKjlGrvnL0xMsuHCnbADCBWJdykuC/Bx8gNZcsYAEc=.sha256 | %E3hRWw4BDNYUIW4xruieG95P870bLf/mNzZPWqEIPd0=.sha256
291913 | 210360552 | %ABHow9xuguSBlrMY2n+BTKxYhmzfOOz2savAY16EbvQ=.sha256 | %HFNI1IGZTfvY4OK1BPgwDujirJnqE/rRBm97KY3r/sw=.sha256
291914 | 210361021 | %X86fhS/zmq1DLs2MQxdMMxs6ANKY5Kw3/lJHu4ACb0E=.sha256 | %fAZumYqEygz2pmmfkN6U+4Y4Fbdp2uiBygnzDMIlQrs=.sha256
291915 | 210361822 | %+ix7lmQDywd0IIc9EXRZqC/WGt4gHQkgskgqsEg3ZPA=.sha256 | %pALR7zZxVTz5VshMPAY8qKKKXlrPWG3zx53LsIpU4mE=.sha256
291916 | 210362306 | %jfSOlvgaKM3MJITi5v6a3fM0N2hM0c9Hth9HiMnG4F8=.sha256 | %iwttItYqOHOTR9Pm6sEot6t3itD84fd/x57oGsr+pYg=.sha256
291917 | 210362819 | %NoDOpqsw4ZOAoYWjw5IfpqiGVqpQeIHuZAHpou2gkPw=.sha256 | %SILpLth6YrOgiht1M4OCQnn/ecP3GSVToJ64RnvdzGA=.sha256
291918 | 210363288 | %nkU0p8s2VovvVTp2ZkmtPQsCM8CD66IBej18SFgJLis=.sha256 | %54a8BgUcVIXC8irnnfoyfs/gjRzAb4aK3CZNx10/0aE=.sha256
291919 | 210363824 | %s4fsyTUbkuEmf6+k9/VYe+GMQ40RxeMfqjzi9W6giao=.sha256 | %gvJqVRDR0oXyh7XxASdNQ+gyZqcfenxAp7sRtKUw0sM=.sha256
291920 | 210365045 | %gwQSR1SwTP89Nco58udjCEPudVJJ42WR05600ux27qI=.sha256 | %P4CS86z7+zLPzo46AaEsAjihgXJrqpZVfvB/ak9gFJo=.sha256
291921 | 210366008 | %iueNHOcBJtM3BLGTDNXKbTk8fKlBZT/eQ4B6Um08fjU=.sha256 | %FoWyy7a+txyIxXhft/cQuQrBRx2NfP2e6bTSLMnec1o=.sha256
291922 | 210366477 | %+21ht4jnkhraez6PfO8ubGjkhZdJvyqDhBI+NL9tHas=.sha256 | %DasqxWho5oZBrTaK7KXMfwVoax3l9XvUIPig1kJ2iyg=.sha256
291923 | 210366961 | %q4QmZqMB9dwQuKUL50X2d8J6Ra0gYu3/hIC24Y80kAw=.sha256 | %87BWvZaNTSMDBzmmcjdRqaghGdKF1UFPxehfd1yDU4A=.sha256
291924 | 210368420 | %wu7b+3dKqmDn3FBQawuH/BxgbLXE/2VGqROb/gyp3X4=.sha256 | %Sk8AX9tfhWem5lZCovRC3lH8p8t6V219wwWo95HNYBs=.sha256
291925 | 210369206 | %PzWxPpkHCvC5hfT88b2JBKKdpEYvV/kLIxOORj+enDI=.sha256 | %eV2SPiGSullFc7ICxw5Z9TIDFtuXLj1u8j+FaaTwKjU=.sha256
291926 | 210402882 | %v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256
291927 | 210403364 | %6iOzBBP5IBvJRRlV0FrhEk+WRDJ/7ps2UA/0z8We4yg=.sha256
291928 | 210403971 | %bpqwIACzjxcPSHpETXWxKTJGXS3xhb1Ll7Li6RvUT6s=.sha256
291929 | 210404455 | %wC0bXLVVTH3K7B2FulKCS+N0732KBB8bkUY4/53dG0M=.sha256
291930 | 210405840 | %3mUEUYzdbPPJVh/G1LtNIqAgZNzxzsctiBGDXDQHylY=.sha256
291931 | 210406309 | %IR0F92xk9Jeo205t27ZTbvgJVMPj/b6ehEndzn/iVsA=.sha256
291932 | 210407525 | %pcMMxB08L3KfAmBtJQuEh6v7VVRm+YTGjesJOL3shmI=.sha256
291933 | 210407994 | %VFgOPJ9HgesqDJce7SLTCVONzzcBql7jUIkpjDzf9UY=.sha256
...
528813 | 408333839 | %mWiDwoQmcEQnQG2WOf7NXaoBIJuNnKf2sLOxDbYBH4Y=.sha256
528814 | 408334308 | %3F391lgHhw88sfeL6wtbq6TgMWasgXDmCrUSdp/zd44=.sha256
528815 | 408334821 | %pCVNsF7eHT+u7a0mPWq/BGkg0sfLU5LwofO0HeHa1ek=.sha256
528816 | 408335307 | %Zql6qOjnQ8XnPU/i+lfaxM7fUYs7kiKU6JDcb/RC4cg=.sha256
528817 | 408335776 | %SCGAi04KD9xRqzJriQQZbEVtkffoeUQony7nqxK3aTM=.sha256
528818 | 408314390 | %3bvrXN9ohKMHbPWIg46omM4BNAnQaG+K7V6FS+cDXfo=.sha256 | %xrprVOCBW/ESJ3nuq718uqV4EyCUI56kKer1OJqqY70=.sha256
528819 | 408314887 | %n8NnVOnWeQc6l3fS2IzKq7TUFSF0l1abuNvRCwRD+bk=.sha256 | %G8jqzh8B/7XmvcPdfJv3AMdsmE8JTOvD1rC42khcRk0=.sha256
528820 | 408315356 | %SZxRIa7kPS2WDBJjNdOJvDO5KKcLoMcz/8Kj8hZRsB8=.sha256 | %4XffH5DRze18R96h9iXCQV4FHs6eSJVetg4ikvK2pu8=.sha256
528821 | 408316073 | %FFLhCVVnIgGVzYAFl0n90nQAyEDl7E97GAIfOLnv6Lo=.sha256 | %YjvGMiET9Oz7HBHd+iIF9mTAeaNj7vU35H1xTfBzpzs=.sha256
528822 | 408316500 | %ZxvqQmbwNEsy5rcxEYJepZmrG/VqZKNuh4TjNp/5G78=.sha256 | %tVcDsUNbgN2GaQ1ATk+g6sdDrGKkqML5dymvsvGNoyI=.sha256
528823 | 408317004 | %5HnELm+iHRW62FIs3E6ExpUxjnxBnRC1OX6dxBZsN5c=.sha256 | %993dhWwBIM1TvU2IPqRnbY1I5VzRlEJ8GAWMNch5uOY=.sha256
528824 | 408317510 | %M2VXkTy4qbngb/PyV0s7wUKtlRlxscaU81BWETeQm9U=.sha256 | %JEckdlPPuEmMKE9TUcmHFo49ZzuluKncRACnDgebHkk=.sha256

Running rebuild-indexes.js gave me this (after about 2 minutes, and after it had built at least some index files). Notice the mention of random-access-storage:

Rebuilding indexes...

<--- Last few GCs --->

[1049540:0x4442d40]   227539 ms: Scavenge 1830.7 (1918.0) -> 1823.0 (1919.7) MB, 13.4 / 0.0 ms  (average mu = 0.255, current mu = 0.246) allocation failure 
[1049540:0x4442d40]   227685 ms: Scavenge 1832.5 (1919.7) -> 1826.7 (1925.5) MB, 28.6 / 0.0 ms  (average mu = 0.255, current mu = 0.246) allocation failure 
[1049540:0x4442d40]   228036 ms: Scavenge 1838.5 (1925.5) -> 1831.6 (1908.2) MB, 36.6 / 0.0 ms  (average mu = 0.255, current mu = 0.246) allocation failure 


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1409219]
Security context: 0x3b8d1a3c08d1 <JSObject>
    1: hidden(aka hidden) [0x16d9337cb989] [internal/errors.js:~282] [pc=0x2acf5a56f677](this=0x1ad5b2dc04b1 <undefined>)
    2: arguments adaptor frame: 3->0
    3: _run [0x2f9dc10e7379] [./node_modules/.pnpm/random-access-storage@1.4.2/node_modules/random-access-storage/index.js:~196] [pc=0x2acf5a4eadb4](this=0x2451ac631249 <Request map ...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa17c40 node::Abort() [node]
 2: 0xa1804c node::OnFatalError(char const*, char const*) [node]
 3: 0xb95a7e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb95df9 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd53075  [node]
 6: 0xd53706 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
 7: 0xd5ffc5 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xd60e75 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xd6251f v8::internal::Heap::HandleGCRequest() [node]
10: 0xd10f85 v8::internal::StackGuard::HandleInterrupts() [node]
11: 0x106c5c6 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [node]
12: 0x1409219  [node]
[1]    1049540 abort      node rebuild-indexes.js

Good indexes

After running it again and successfully rebuilding indexes, I ran study-keys-leveldb.js again and got:

Loaded seq.index with:
  Entries: 1298604
  Offset: 972042648

Loaded keys leveldb with:
  Version: 1
  Entries: 1298604
  Offset: 972042648


seq    | offset    | key in the keys leveldb index                        | key in the msg, if different
0      | 0         | %bQnYTsq0p+A8JppKPgze+ZyZxAOPVgMwleEXZKwOP7o=.sha256
1      | 528       | %7d2LLAa0hhSFwlpHjulTm9IFtin95UkLUzV43yajj4k=.sha256
2      | 994       | %ofF5MfiaCnHBqecGIDpojT8Hd15wbAuGzmhNxrJ6beo=.sha256
3      | 1476      | %OJVc0qShJtS1o0A9pO5LZp4OmwIhyUkUmz6FV2Nthz8=.sha256
...
291880 | 210338468 | %H3rlUZ1gJNmwEQ26nuEfWj0Gwt3jL8f5itcH3wGLuRU=.sha256
291881 | 210338937 | %9QtgDc30RPPVm9pR6CHebtXD98/Bqabj3Mpbl7GmTPM=.sha256
291882 | 210339437 | %ckZr3Hro61xRR9qgaQsmE03ZOmdXQoF4ajQzHhbRrTs=.sha256
291883 | 210340373 | %9obExCOb1r6LGmYw9EEK/EUgBqZ2GgQOvvfAYbR+5x8=.sha256
291884 | 210341037 | %lzmZ8b2Ud6m1ueQDTBJTD5MYQcRXxNLprlHee3w/t30=.sha256
291885 | 210341506 | %06oaYt+kMHrCg19yQTXBGhj8xKtkDulOM9PtHkR1r2M=.sha256
291886 | 210341985 | %t2kSIZvNC0kOJLLkIlv2ghPQ/dmP8YKEyagx5+EKHeg=.sha256
291887 | 210342454 | %3nQDOgLUmjJjLqF9qf2nUqHdwqTG4AglMaCozfxIa+8=.sha256
291888 | 210342940 | %sygYHLcTWrsjULHEZNyFObv5JdWIwhqvL1F0HuIrHGE=.sha256
291889 | 210343426 | %iNR4VH0rg0hYxtmf9ykvPBmTgv3n/gEOOAbo0dDj2Wg=.sha256
291890 | 210344507 | %XTJU+cEgkh253cCwwEr0XtsXOXuhD84cPsqLXI6NtA8=.sha256
291891 | 210345193 | %mBOWdDJWF9TPCKxnwOp558Pff4O+wFCclzQ6Kybvyf4=.sha256
291892 | 210345958 | %zflz+0VqxJc+4senR6khvbWWxCQHCTo2/ENEXZtxcRs=.sha256
291893 | 210346440 | %OfnWuTln4iNO0O9BydKvN8Ia6NuYF/AiGp/qrjBDung=.sha256
291894 | 210347074 | %4ieSNeSaQaFdF6lNdGiBJKyPGvv++lNpykGy1ZnWhEE=.sha256
291895 | 210347556 | %an0FlY1eMl6oxJ+RN5js72H48sZgt8RonGBShmWMvgQ=.sha256
291896 | 210348178 | %0sB0oYd4B/Mg4TLnXpVAcaibS+ks51s0QXSv68jAil0=.sha256
291897 | 210349063 | %xJSoyAsXzs6yZv+J8PTdq0apB4RIutzyHBWQW14Ie00=.sha256
291898 | 210349542 | %wMWzSUlzjIbPaLXHwe6lofKf3MPZ4WQl3Y0dvjpCAEI=.sha256
291899 | 210350011 | %Fz4OFtxB4IoT9p/Jr3G07OdoV+gQAsWauFeOZw5S3oU=.sha256
291900 | 210350804 | %TReB1TEODst195Pim8QXnHspeNz48VbFMJ3F8lU+Vmw=.sha256
291901 | 210352397 | %G3LUh5iRYR0FWwvVcJ6/BjO4R2xl3BD+IRP5qLXsEC0=.sha256
291902 | 210353814 | %xhxxBLB93Vlv+FElaGQJpceLS/C8lpo2H6AnONjJXsU=.sha256
291903 | 210354283 | %3eL+Nj5gKOIUk43h4p0cy0kkCG4YrQU3Uuj2/zbZnas=.sha256
291904 | 210354762 | %+zlfeg7YxCHYm8knTAiaqGkKoutvEl48c1uw0BuNEt4=.sha256
291905 | 210355246 | %FffPkwTM8LM0Dw+c7hYopmlpnM9BDevPZMJxYKXmB94=.sha256
291906 | 210355715 | %2EEfV/N9F4HiaU6ayQTtbXOJPj/NOiiY4WrpZIADBnU=.sha256
291907 | 210356429 | %YPVFYAiw73l/W5BjcMbCfl6/6a5/ZHx28XCY8Q1AKMo=.sha256
291908 | 210357248 | %b9XDAvVSCncoQUJ8WqT86JZbPRwYrzCPfny46Ng00HA=.sha256
291909 | 210357898 | %K9NuhMnx0qpcj9LMZEgVcIYwF6OUkFRAaON4dkp8UGs=.sha256
291910 | 210358747 | %44y0rl26zKewFNs5ToJvLSqTLJksDo554pI+dj37dUU=.sha256
291911 | 210359597 | %+WcmGWdzYXT0D8ztoyCJIKWX13vQWYV3VGQiAY9wfz0=.sha256
291912 | 210360066 | %E3hRWw4BDNYUIW4xruieG95P870bLf/mNzZPWqEIPd0=.sha256
291913 | 210360552 | %HFNI1IGZTfvY4OK1BPgwDujirJnqE/rRBm97KY3r/sw=.sha256
291914 | 210361021 | %fAZumYqEygz2pmmfkN6U+4Y4Fbdp2uiBygnzDMIlQrs=.sha256
291915 | 210361822 | %pALR7zZxVTz5VshMPAY8qKKKXlrPWG3zx53LsIpU4mE=.sha256
291916 | 210362306 | %iwttItYqOHOTR9Pm6sEot6t3itD84fd/x57oGsr+pYg=.sha256
291917 | 210362819 | %SILpLth6YrOgiht1M4OCQnn/ecP3GSVToJ64RnvdzGA=.sha256
291918 | 210363288 | %54a8BgUcVIXC8irnnfoyfs/gjRzAb4aK3CZNx10/0aE=.sha256
291919 | 210363824 | %gvJqVRDR0oXyh7XxASdNQ+gyZqcfenxAp7sRtKUw0sM=.sha256
291920 | 210365045 | %P4CS86z7+zLPzo46AaEsAjihgXJrqpZVfvB/ak9gFJo=.sha256
291921 | 210366008 | %FoWyy7a+txyIxXhft/cQuQrBRx2NfP2e6bTSLMnec1o=.sha256
291922 | 210366477 | %DasqxWho5oZBrTaK7KXMfwVoax3l9XvUIPig1kJ2iyg=.sha256
291923 | 210366961 | %87BWvZaNTSMDBzmmcjdRqaghGdKF1UFPxehfd1yDU4A=.sha256
291924 | 210368420 | %Sk8AX9tfhWem5lZCovRC3lH8p8t6V219wwWo95HNYBs=.sha256
291925 | 210369206 | %eV2SPiGSullFc7ICxw5Z9TIDFtuXLj1u8j+FaaTwKjU=.sha256
291926 | 210370560 | %73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256
291927 | 210371753 | %g1ibWtSya6mrByLtba4CaJKUm002gWEM0Ft18arr/Pw=.sha256
291928 | 210372510 | %f8upCrvoJGG4wh1IyuXYJWrYtcGKIJYxEBax01123KM=.sha256
291929 | 210373074 | %wD26cUoA0aOENE14EYGzAs0VrbVZkeehBjJSUdxdQ6Y=.sha256
291930 | 210373558 | %nDjgJGy1zBfVkpjL+rh+MTv3cggAaAYMYthkxoBfzIw=.sha256
291931 | 210374042 | %rhCJ7hvBVZvrB2rljEP0O1fqwhD/6MeMLmewumFyrOU=.sha256
291932 | 210374511 | %Gmvt4XX6UiKg2yWaPO7XhH0LOqiVBqgJKCiAv5lJ8L4=.sha256
291933 | 210374997 | %z0QulmPdr0HXv4HjVcrQJzRmsm+VROlfce03tUbkIzY=.sha256
...
528813 | 408311900 | %FqoCIY0kq7W/B25zSuf45L0CuMnDQ917TuB8ihf0Q90=.sha256
528814 | 408312449 | %GuYLjx3BXaNy/VR90NMLAr/DaFKLsEVlXZ49avLH+EY=.sha256
528815 | 408312918 | %9lpZOUKO0N/Rn6NH9p4NeeRMTsMRdR6L//i910ioc7w=.sha256
528816 | 408313424 | %9QWHpS8rUUHeSUPGD8jrMBlr178lGJXp9QmA6Tr85OM=.sha256
528817 | 408313921 | %mwTDF+XTiWllwenZEfV4vzL4GtrGzw5evG/sqybMMHo=.sha256
528818 | 408314390 | %xrprVOCBW/ESJ3nuq718uqV4EyCUI56kKer1OJqqY70=.sha256
528819 | 408314887 | %G8jqzh8B/7XmvcPdfJv3AMdsmE8JTOvD1rC42khcRk0=.sha256
528820 | 408315356 | %4XffH5DRze18R96h9iXCQV4FHs6eSJVetg4ikvK2pu8=.sha256
528821 | 408316073 | %YjvGMiET9Oz7HBHd+iIF9mTAeaNj7vU35H1xTfBzpzs=.sha256
528822 | 408316500 | %tVcDsUNbgN2GaQ1ATk+g6sdDrGKkqML5dymvsvGNoyI=.sha256
528823 | 408317004 | %993dhWwBIM1TvU2IPqRnbY1I5VzRlEJ8GAWMNch5uOY=.sha256
528824 | 408317510 | %JEckdlPPuEmMKE9TUcmHFo49ZzuluKncRACnDgebHkk=.sha256

Analysis

I ran node log-check-dups.js and the log does not have duplicate records.

Note that the true number of records is 1298604 while the bad keys leveldb reported 1298560 entries. The difference 1298604 - 1298560 is 44.

In the bad indexes, there are two sections where the keys are wrong:

  • 1st bad section: seq 291883 to seq 291925
  • 2nd bad section: seq 528818 onward until the end of the log

1st bad section

In the 1st bad section, seq 291883 should have the key %9obExCO... but has the key %73n/YRJ... which actually belongs to seq 291926. Notice that 291926 is just 291925+1, i.e. one past the ending seq of the 1st bad section. In other words, in the 1st bad section, the keys are appearing 43 slots earlier than they should (i.e. 291926 - 291883).

After the 1st bad section, we see the return of "good" entries, but look carefully:

In the bad outputs:

291926 | 210402882 | %v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256

In the good output:

291926 | 210370560 | %73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256

They differ by byte offset! The key %v+Ofjp... does not belong to seq 291926 but it does belong correctly to the offset 210402882. See also this entry in the good output:

291969 | 210402882 | %v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256

And note that 291969 - 291926 is 43.
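The same subtraction can be done mechanically for every key that appears in both indexes. The following is only a minimal sketch: the two maps hold the key => seq pairs quoted above, and `seqShifts` is a hypothetical helper written for this illustration, not part of ssb-db2 or study-keys-leveldb.js.

```javascript
// key => seq pairs as reported by the bad and the good keys leveldb in this issue.
const badKeys = new Map([
  ['%73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256', 291883],
  ['%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256', 291926],
])
const goodKeys = new Map([
  ['%73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256', 291926],
  ['%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256', 291969],
])

// Hypothetical helper: for each key present in both indexes,
// report how many slots too early the bad index places it.
function seqShifts(bad, good) {
  const shifts = new Map()
  for (const [key, badSeq] of bad) {
    if (!good.has(key)) continue
    shifts.set(key, good.get(key) - badSeq)
  }
  return shifts
}

const shifts = seqShifts(badKeys, goodKeys)
for (const [key, shift] of shifts) {
  console.log(key.slice(0, 8) + '...', 'is', shift, 'slots too early')
}
```

Both keys come out with the same shift, which is what suggests a single skipped run of records rather than scattered corruption.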

2nd bad section

The bad output says:

528775 | 408314390 | %xrprVOCBW/ESJ3nuq718uqV4EyCUI56kKer1OJqqY70=.sha256
...
528817 | 408335776 | %SCGAi04KD9xRqzJriQQZbEVtkffoeUQony7nqxK3aTM=.sha256
528818 | 408314390 | %3bvrXN9ohKMHbPWIg46omM4BNAnQaG+K7V6FS+cDXfo=.sha256 | %xrprVOCBW/ESJ3nuq718uqV4EyCUI56kKer1OJqqY70=.sha256

Notice that the offset 408314390 has indeed the msg key %xrprVOC... but this happens twice for two different seq: 528775 and 528818. The difference between these two numbers is 43. All other entries from that point onwards behave similarly.

Missing seq

I noticed this in both good and bad outputs:

9425 | 7302146 | %oIiq7gje/+QXbhfRiaXx+bhzGZPd3OM+Nhnd9eCSguM=.sha256
9426 | 7303117 | %dPdkR/J5T8JF0lXQSsMjw7If3PRNCdQBKPT7X0lLcYQ=.sha256
9428 | 7304466 | %Wkk0nOw7EnbXrzYbBkKmHVi+mwdrwVVdXHUTfPVF92o=.sha256
9429 | 7305483 | %xBDBnX7B8DgtSmGBtpgU4xsgfSkUKYmotR0mAWKCS3E=.sha256

Where is seq 9427 ?!

It is the first such occurrence, but there are other cases where it skips, like this one, which appears in the good output only:

52611 | 40873879 | %aTW7VtlC+M9di0vacKtglfqXWOwtaMNvGe4A2HMPzxA=.sha256
52612 | 40874484 | %yEYFpQRDA5v0Ofbh0Qm3jgfwWvnUS4KU69oBq2pKTp4=.sha256
52614 | 40877530 | %2JJI6fAc8c5CQAUWgVDH4fBUAP7rjhVgD3IjjdIshhs=.sha256
52615 | 40878158 | %D5FMpURunYhfJeIYM41VWSpnQETuHKmK09pfdK/agRM=.sha256

(where is 52613?) While the bad output would say:

52611 | 40873879 | %aTW7VtlC+M9di0vacKtglfqXWOwtaMNvGe4A2HMPzxA=.sha256
52612 | 40874484 | %yEYFpQRDA5v0Ofbh0Qm3jgfwWvnUS4KU69oBq2pKTp4=.sha256
52613 | 40875796 | %zziGkJTh/9bDHSe6wztHNzfcaFdtKRH0KYCbhfg1b78=.sha256
52614 | 40877530 | %2JJI6fAc8c5CQAUWgVDH4fBUAP7rjhVgD3IjjdIshhs=.sha256
52615 | 40878158 | %D5FMpURunYhfJeIYM41VWSpnQETuHKmK09pfdK/agRM=.sha256

I took a look at the hexadecimal in the log, and at offset 7303117 (seq 9426) we indeed have the message with key %dPdkR/J5T... and the next message in the log is at offset 7303983 and has the key %zzykbaDY.... After that, the next message in the log is at offset 7304466 (as expected) and has the key %Wkk0nOw... (as expected).

Why was this record skipped in both "good" and "bad" runs of the study-keys-leveldb.js script? Could it be that there is a bug in ssb-db2/jitdb running in study-keys-leveldb.js which causes even the "good" output to have mistakes?

Note: the scripts were using ssb-db2 2.8.3 but I just updated all the deps to the latest versions (ssb-db2 2.8.5 etc) and rebuilt the indexes and I still have the missing "seq 9427" situation.

What's funny is that I put a console log to see if seq 9427 was being processed and whether the keys index was putting in batch, and yes it was correctly processed. But when running study-keys-leveldb, it doesn't show up. Could it be related to persisting to disk?

The missing 9427 was a bug in study-keys-leveldb.js, I fixed it like this: staltz/ssb-db2-issue-291@b5ba60a

(There was a reason why the corresponding msg key was %zzyk... it's the "last key" lexicographically sorted)

Thankfully, this hasn't changed the "Analysis" I wrote here: there is still a gap of "43". Sorry for the detour.

For my own sake, some notes:

The bad keys leveldb has (key => value):

%73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256 => 291883
%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256 => 291926

The good keys leveldb has:

%73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256 => 291926
%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256 => 291969

The bad seq.index has:

291883 => 210340373
291926 => 210402882

The good seq.index has:

291883 => 210340373
291926 => 210370560

And the log has:

210340373 => %9obExCOb1r6LGmYw9EEK/EUgBqZ2GgQOvvfAYbR+5x8=.sha256

Thus in the bad case:

%73n/YRJ... ==BAD==> seq 291883 ==good=> offset 210340373 ==good=> %9obExCO...
           \__keys__/          \__seq.i_/                \__log___/

%v+Ofjp/... ==BAD==> seq 291926 ==BAD==> offset 210402882 ==good=> %v+Ofjp/...

While the good case has:

%9obExCO... ==good=> seq 291883 ==good=> offset 210340373 ==good=> %9obExCO...
           \__keys__/          \__seq.i_/                \__log___/

%v+Ofjp/... ==good=> seq 291969 ==good=> offset 210402882 ==good=> %v+Ofjp/... 
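The chain in these diagrams (key => seq => offset => key stored in the log) can be walked in code. This is a sketch over in-memory maps holding the bad-case values quoted above; `roundTripMismatches` is a hypothetical helper, and the real indexes would have to be read through leveldb, seq.index and the log rather than plain Maps.

```javascript
// Bad-case values quoted in this issue, held in plain Maps for illustration.
const keysIndex = new Map([
  ['%73n/YRJxKIZ8gcxPdoO2lmGTveUAx0twEBcE9T+C3zM=.sha256', 291883],
  ['%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256', 291926],
])
const seqIndex = new Map([
  [291883, 210340373],
  [291926, 210402882],
])
const log = new Map([
  [210340373, '%9obExCOb1r6LGmYw9EEK/EUgBqZ2GgQOvvfAYbR+5x8=.sha256'],
  [210402882, '%v+Ofjp/19erSJHp2774/3NfCQm1d7ZwlKchAUHhI/y8=.sha256'],
])

// Hypothetical helper: follow key => seq => offset => key-in-log and
// collect every key that does not come back to itself.
function roundTripMismatches(keys, seqs, records) {
  const mismatches = []
  for (const [key, seq] of keys) {
    const offset = seqs.get(seq)
    const keyInLog = records.get(offset)
    if (keyInLog !== key) mismatches.push({ key, seq, offset, keyInLog })
  }
  return mismatches
}

const mismatches = roundTripMismatches(keysIndex, seqIndex, log)
console.log(mismatches.map((m) => m.key.slice(0, 8) + '...'))
```

Only %73n/YRJ... is flagged. For %v+Ofjp/... the two wrong links cancel out, so a round-trip check alone would not catch it; spotting that case seems to require comparing against seq numbers obtained by scanning the log from the start.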

@arj03 Is the culprit this???

   this.level.get(META, { valueEncoding: 'json' }, (err, status) => {
      debug(`got index status:`, status)

      if (status && status.version === version) {
        processedSeq = status.processed
        processedOffset = status.offset
        this.offset.set(status.offset)
        if (this.onLoaded) {
          this.onLoaded(() => {
            this._stateLoaded.resolve()
          })
        } else {
          this._stateLoaded.resolve()
        }
      } else {
        this.level.clear(() => {
          processedOffset = -1
          //
          // WE SHOULD HAVE `processedSeq = 0` HERE SHOULDN'T WE????????
          //
          this.offset.set(-1)
          this._stateLoaded.resolve()
        })
      }
    })

At https://github.com/ssb-ngi-pointer/ssb-db2/blob/7f23695f8bd59fc8f3ee3f476e1b0f5da69ec6f9/indexes/plugin.js#L110

That does look sloppy. But the index should be loaded before doing anything and the default value of processedSeq should be 0.

Yeah, it looks weird because we have the defaults. But why do we have processedOffset = -1 too? (test suite passes if I comment it out)

It can be safely deleted I think. I thought there was a test for this version upgrade, but now I can't find it?

Wow that number 43 keeps popping up. Great debugging. I can confirm seeing the same for seq.index. It is very interesting to see the same kind of bug in both level and jitdb indexes. 🤔

Some more observations that seem to confirm the problem must be in either AAOL or push-stream:

The missing messages are part of block 3209, but the message previous to the missing is also in block 3209, this means that the 43 is not a whole block:

// offset 210340373 block 3209 level first error
// offset 210370560 block 3210 seq the good offset
// offset 210402882 block 3210 seq first error, also level ok

// offset 210339437 block 3209 last level ok
// offset 210369206 block 3209 last level err
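The block numbers in these notes can be recomputed from the byte offsets. The sketch below assumes a block size of 64 KiB, which is not stated in this thread; it is chosen because it reproduces the block numbers 3209 and 3210 listed above. `blockIndex` and `offsetInBlock` are hypothetical helpers.

```javascript
// Assumption: the log is divided into fixed 64 KiB blocks.
const BLOCK_SIZE = 64 * 1024

// Which block a byte offset falls into.
function blockIndex(offset) {
  return Math.floor(offset / BLOCK_SIZE)
}

// Position of the offset relative to the start of its block.
function offsetInBlock(offset) {
  return offset % BLOCK_SIZE
}

// Offsets taken from the notes above.
const offsets = [210339437, 210340373, 210369206, 210370560, 210402882]
for (const offset of offsets) {
  console.log(offset, 'block', blockIndex(offset), 'at', offsetInBlock(offset))
}
```

Under that assumption, offset 210370560 lands exactly on the first byte of block 3210, so the record that the bad index mistook for seq 291883 would be the first record of a new block.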

The EBT plugin is also missing the same messages as the keys level index. For example, the key %lzmZ8b2Ud6m1ueQDTBJTD5MYQcRXxNLprlHee3w/t30=.sha256 is ["@R/VYSOMrYsL03NYYvnh5lKbogsbRncWn9eT2Dm9tlno=.ed25519",2]

It could be that either the 43 values are skipped, or they are corrupted.

I wonder if that 43 has something to do with too-hot. That one would mean things come in lumps of a certain size.

If you're online now, a video call works for me. But no pressure if now is not a good time.

One thing I found just by reading code (I don't know if this would fix the corruption):

https://github.com/ssb-ngi-pointer/async-append-only-log/blob/abbd1887d3ef414d1c1f67f49c70244c50298ebb/stream.js#L134

Should it have been

-  this.blocks.getBlock(this.cursor, this._resumeCallback)
+  this.blocks.getBlock(this.cursor, this._resumeCallback.bind(this))

?

UPDATE: Never mind, it's bound already here: https://github.com/ssb-ngi-pointer/async-append-only-log/blob/abbd1887d3ef414d1c1f67f49c70244c50298ebb/stream.js#L28

Yep :) It's a bit confusing the way it is written.

Thought 1:

It could be that we should use sink.paused = true here instead of s.sink.paused = true: https://github.com/ssb-ngi-pointer/ssb-db2/blob/89838bcd46cbc8b81b9d487e6abe642a755790a9/log.js#L87

Thought 2:

It might be that the stream skips 43 (or whatever amount) only when streaming with live: true and it might be that in your stress tests they were live: false.

Hmm, interesting. Could be good to test thought 1 and see how it behaves.

Thought 2: It should be here. But yes it could be related to live. I'll push up a branch with my extra stress test in ~30 mins. I'm just wondering if one of the machines was a migrate, then all of that data should have been migrated before we get to the live part, right? And the error appears quite early.

"I'm just wondering if one of the machines was a migrate, then all of that data should have been migrated before we get to the live part, right?"

At least for cryptix, whose dataset we're studying, it was a new feed replicating from friends, so no migration.

I pushed up the branch.

I tried setting s.paused = true instead of s.sink.paused = true. This is clearly wrong, but it was interesting to see that the stress test then produced a lot of errors. I saw both missing messages and the dreaded "offset is out of range" error in bipf.

"I tried setting s.paused = true instead of s.sink.paused = true."

Oh, by the way, previously I said it wrong. I meant to replace s.sink.paused with o.paused. Maybe we should rename those one letter variables to make it more obvious.

Here's what my stress test looks like: https://github.com/staltz/ssb-db2-issue-291/blob/main/log-stream-stress.js

I ran it on repeat (in --verify mode) for 20mins (it probably executed more than 20 times) and didn't bump into any problem. I also tried replacing s.sink.paused with o.paused and no problems either.

Yes that would be good. Check ssbc/async-append-only-log#39

Thought 3:

It could be that this else branch is calling (incorrectly) originalWrite while the sink is actually paused: https://github.com/ssb-ngi-pointer/ssb-db2/blob/89838bcd46cbc8b81b9d487e6abe642a755790a9/log.js#L95-L96

Thought 4:

Is Stream.prototype.resume (in AAOL) idempotent? I.e. if there happens to be some code that calls resume() while there is an actively ongoing "resume" delivering values to the sink, will we see bugs or not?

Note: this is about multiple concurrent resume() calls on the same log.stream() instance, not multiple resume() calls on different log.stream() instances which is the case of ssb-db2's log.stream for leveldb happening concurrently with JITDB's log.stream
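To make the idempotency question concrete, here is a toy model of the hazard. It is not AAOL's stream.js and does not reproduce the exact skip of 43; `ToyStream`, `defer` and `drain` are hypothetical names, and the manual task queue stands in for asynchronous block reads so the run is deterministic. It only shows that a second resume() issued while the first is still delivering makes the sink see a different sequence of records, which breaks any consumer that numbers records by arrival order.

```javascript
// Manual task queue standing in for asynchronous block reads.
const pending = []
const defer = (task) => pending.push(task)
const drain = () => {
  while (pending.length > 0) pending.shift()()
}

class ToyStream {
  constructor(records, sink, guarded) {
    this.records = records
    this.sink = sink
    this.guarded = guarded // when true, resume() is idempotent
    this.cursor = 0
    this.running = false
  }

  resume() {
    // The guard: ignore resume() while a delivery loop is already active.
    if (this.guarded && this.running) return
    this.running = true
    this._step()
  }

  _step() {
    if (this.cursor >= this.records.length) {
      this.running = false
      return
    }
    const index = this.cursor // read now, used later by the deferred task
    defer(() => {
      this.sink.push(this.records[index])
      this.cursor = index + 1
      this._step()
    })
  }
}

// Call resume() twice before any deferred delivery has happened.
function run(guarded) {
  const delivered = []
  const stream = new ToyStream(['a', 'b', 'c'], delivered, guarded)
  stream.resume()
  stream.resume()
  drain()
  return delivered
}

const unguarded = run(false)
const guarded = run(true)
console.log('unguarded:', unguarded.join(','))
console.log('guarded:  ', guarded.join(','))
```

In this toy the unguarded stream delivers every record twice, while the guarded one delivers each record once and in order. A real stream could misbehave differently (skipping instead of duplicating), depending on when the cursor is advanced.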

🎉 I CAN REPRODUCE IT

Indeed Thought 4 was correct. I pushed up a reproduction script at log-stream-stress.js.

Run it once as node log-stream-stress.js and then after it completes, it'll create the file keys.txt and then you can run node log-stream-stress.js --verify a couple times. It frequently (but not deterministically, because of the setTimeouts) crashes when I do that.

I have a candidate solution for AAOL stream.js but it's causing some other problems. I'll work on it.

Wow! Fantastic :-)

Fixed this issue in async-append-only-log ssbc/async-append-only-log#40