orbitdb / orbitdb

Peer-to-Peer Databases for the Decentralized Web

OrbitDB can't be opened - AggregateError: All promises were rejected

silkroadnomad opened this issue · comments

From time to time in my Chrome browser (and also in an Opera instance), after deleting a record from a documents-type db, the records of the db can't be read anymore. Reloading the browser doesn't help; no records can be read at all.

An exception is thrown after db.all():
"AggregateError: All promises were rejected"

@orbitdb/core 2.1.0
helia 4.1.0
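
For context, the failing call looks roughly like this (a minimal sketch of the API usage; the db name and document fields are placeholders, not deContact's actual data):

```js
import { createHelia } from 'helia'
import { createOrbitDB } from '@orbitdb/core'

// Placeholder db name and document fields; the real app uses its own schema.
const ipfs = await createHelia()
const orbitdb = await createOrbitDB({ ipfs })
const db = await orbitdb.open('contacts', { type: 'documents' })

await db.put({ _id: 'contact-1', name: 'Alice' })
await db.del('contact-1')

// This is the call that throws "AggregateError: All promises were rejected"
// in the failing browsers.
console.log(await db.all())
```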

Current workaround (sketched as code below):

  1. drop the db when no peers are connected
  2. reload the browser, so the browser reconnects to other peers
  3. the other browser peers with the same db replicate it back (restore), but the broken record still seems to be in the db
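
A minimal sketch of that workaround in code, assuming the `db` and `orbitdb` instances from the snippet above:

```js
// Remember the address so the db can be reopened after the reload.
const address = db.address

// 1. drop the local copy while no peers are connected
await db.drop()
await db.close()

// 2./3. after reloading the browser and reconnecting, reopen the same
// address; the other peers replicate the data back
const restored = await orbitdb.open(address)
console.log(await restored.all())
```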

At a glance, the aggregate error suggests a problem reading records from the underlying IPFS blockstore. My hunch is that, for whatever reason, the in-memory blockstore is being unloaded and reloaded, leaving OrbitDB unable to read the blocks. Deleting a record may trigger a re-read of the blocks (perhaps for re-indexing).
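
If that hunch is right, one way to test it would be to give Helia a persistent browser blockstore instead of the default in-memory one, for example with the blockstore-idb package (an assumption on my side; a diagnostic sketch, not a confirmed fix):

```js
import { createHelia } from 'helia'
import { createOrbitDB } from '@orbitdb/core'
import { IDBBlockstore } from 'blockstore-idb'

// Keep blocks in IndexedDB so they survive page reloads; if the
// AggregateError disappears with this, the in-memory blockstore theory
// gains weight. 'helia-blocks' is an arbitrary store name.
const blockstore = new IDBBlockstore('helia-blocks')
await blockstore.open()

const ipfs = await createHelia({ blockstore })
const orbitdb = await createOrbitDB({ ipfs })
```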

Can I reproduce the issue with deContact? If not, I'll hopefully have some time to try and set up a test to isolate the problem.

I'll try to give you some more info on how to reproduce it:

  1. it does not seem reproducible when adding values to the documents db locally and then deleting them
  2. it seems connected to the 'del' operation and syncing between other OrbitDB instances.
  3. it is sort of reproducible if I have a "clean" db (Chrome)
    • which was replicated to another device before
    • then dropped
    • new records added
    • replication restores the db from the second device
    • then the manually added record is deleted (-> crash)
  4. Current test scenario (which should reproduce it; unconfirmed yet) between Brave and Opera, sketched as code after this list:
  • clean db with 2 records on device A
  • replicate it with another device B
  • drop the db on device A
  • add a new record on device A
  • wait for replication with device B
  • delete one replicated record (no problem)
  • delete one new record (db crashes and needs to be dropped again)
    (to repeat: at the time of writing this procedure is my assumption and is not 100% confirmed by another test with other browsers)
  5. After testing No. 4 between Brave and Opera, the scenario could not be reproduced. But the problem is reproducible in my Chrome (maybe it's just mine), which doesn't make too much sense at the moment.
  6. Re-test with Chrome with a fresh identity (no previous replication): the scenario described under No. 4 was not reproducible. -> For now, reproducing it is no longer possible.
  7. Trying some "wild testing" again -> result: at the moment it is impossible to crash the replication between 4 different browsers (Chrome, Firefox, Opera, Brave).
  8. Trying to use the same identity from Chrome now on the Android phone in the PWA (so both devices replicate each other).
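
The core of the scenarios above (a local put, replication from a second instance, then deleting the locally added record) could be turned into an automated test roughly like this (a sketch only: `orbitdbA` and `orbitdbB` stand for two already-connected OrbitDB instances, peer wiring is omitted, and the document ids are placeholders):

```js
// device A: fresh db with one record
const dbA = await orbitdbA.open('shared-db', { type: 'documents' })
await dbA.put({ _id: 'a-1', from: 'device A' })

// device B: open the same address so it replicates from A, then add its own record
const dbB = await orbitdbB.open(dbA.address)
await dbB.put({ _id: 'b-1', from: 'device B' })

// device A: once both records have replicated, delete the locally added one;
// this is the step that has crashed the db in the manual tests.
dbA.events.on('update', async () => {
  const entries = await dbA.all()
  if (entries.length === 2) {
    await dbA.del('a-1')
    console.log(await dbA.all()) // throws AggregateError in the failing case
  }
})
```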

I can't reproduce it anymore, but I guess it is hiding somewhere around the corner. Until then I'll close the issue to reduce the noise it makes. ;)

Just made another test with two Brave browsers (one Mac, one Linux) with a colleague and could reproduce the problem one more time in the following scenario:

  1. I added a record on my Brave (fresh identity with a fresh and clean db)
  2. Colleague added a record (fresh identity with a fresh and clean db)
  3. Colleague replicated my db and wrote a record into my db
  4. I replicated his db and wrote a record into his db
  5. Replication OK - I deleted his record in my db -> db crash

I will now try to reproduce this with another person! At least this proves that my Chrome is not somehow strangely misconfigured, since I used a completely empty Brave instance for that.

After testing with both @silkroadnomad and the libp2p devs, we think the problem is related to the stream returned by libp2p.dial() breaking prematurely. According to the libp2p devs, the connection should be re-established, but this does not seem to be the case. The suggested workaround is to provide some kind of keep-alive to ensure two peers reconnect if the connection ends prematurely.
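
A minimal sketch of such a keep-alive on the application side (an assumption, not a confirmed fix; `peerId` and `peerMultiaddr` are placeholders for however the app tracks its peers):

```js
// ipfs is the Helia instance used by OrbitDB; its libp2p node exposes
// connection management.
const libp2p = ipfs.libp2p

// Every 10 seconds (arbitrary interval), re-dial the peer if the
// connection has dropped.
setInterval(async () => {
  if (libp2p.getConnections(peerId).length === 0) {
    try {
      await libp2p.dial(peerMultiaddr)
    } catch (err) {
      console.warn('re-dial failed, will retry', err)
    }
  }
}, 10_000)
```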