OrbitDB can't be opened - AggregateError: All promises were rejected
silkroadnomad opened this issue
From time to time in my Chrome browser (and also in an Opera instance), after deleting a record from a documents-type db, the records of the db can't be read anymore. Reloading the browser doesn't help; no records can be read at all.
An exception is thrown after calling db.all():
"AggregateError: All promises were rejected"
@orbitdb/core 2.1.0
helia 4.1.0
Current Workaround:
- drop db when no peers are connected
- reload browser, so browser reconnects to other peers
- other browser peers with the same db replicate it back (restore), but the broken record still seems to be in the db
At a glance, the aggregate error suggests a problem reading records from the underlying IPFS blockstore. My hunch is that, for whatever reason, the in-memory blockstore is being unloaded and reloaded, leaving OrbitDB unable to read the blocks. Deleting a record may trigger a re-read of the blocks (perhaps for re-indexing).
Can I replicate the issue from deContact? If not, I'll hopefully have some time to try and set up a test to isolate the problem.
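"All promises were rejected" is the message JavaScript engines attach to the AggregateError that Promise.any produces when every candidate promise fails; the individual failures are kept on the error's `errors` array. To see which underlying reads actually failed, something like the following could be wrapped around db.all(). This is only a sketch: the helper name `describeAggregateError` is hypothetical and not part of OrbitDB.

```typescript
// Hypothetical helper (not part of OrbitDB): turn an AggregateError-like value
// into readable lines so the underlying rejections become visible.
function messageOf(e: unknown): string {
  return e instanceof Error ? `${e.name}: ${e.message}` : String(e);
}

function describeAggregateError(err: unknown): string[] {
  const lines = [messageOf(err)];
  if (typeof err === "object" && err !== null && "errors" in err) {
    const inner = (err as { errors: unknown }).errors;
    if (Array.isArray(inner)) {
      for (const e of inner) {
        // Nested aggregate errors are expanded recursively, indented by two spaces.
        for (const line of describeAggregateError(e)) {
          lines.push("  " + line);
        }
      }
    }
  }
  return lines;
}

// Usage sketch (assumes `db` is an already opened database):
//   try {
//     await db.all();
//   } catch (err) {
//     console.error(describeAggregateError(err).join("\n"));
//   }
```

The inner errors should show whether the failures are block lookups, timeouts, or something else, which would help confirm or rule out the blockstore hunch.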
I'll try to give you some more info on how to reproduce it:
1. It does not seem reproducible when values are added locally to the documents db and then deleted.
2. It seems connected with the 'del' operation and syncs between other OrbitDB instances.
3. It is sort of reproducible if I have a "clean" db (Chrome)
   - which was replicated before to another device
   - then dropped
   - new records added
   - replication re-replicates from the db of the second device
   - then delete the manually added record (-> crash)
4. Current test scenario (which should reproduce it, unconfirmed yet) between Brave and Opera:
   - clean db with 2 records on device A
   - replicate it with another device B
   - drop the db on device A
   - add a new record on device A
   - wait for replication with device B
   - delete one replicated record (no problem)
   - delete one new record (db crashes and needs to be dropped again)
   (To repeat: at the time of writing this procedure is my assumption and not 100% confirmed by another test with other browsers.)
5. After testing No. 4 between Brave and Opera, this scenario could not be reproduced. But the problem is reproducible in my Chrome (maybe it's just mine), which doesn't make much sense at the moment.
6. Re-test with Chrome with a fresh identity (no previous replication): the scenario described under No. 4 was not reproducible. -> Reproducibility for now not possible anymore
7. Trying some "wild testing" again -> Result: at the moment it is impossible to crash the replication between 4 different browsers (Chrome, Firefox, Opera, Brave)
8. Trying to use the same identity of Chrome now on the Android phone in the PWA (so both devices replicate each other)
I can't reproduce it anymore. But I guess it is hiding somewhere around the corner. But until then I'll close the issue to reduce the whole noise it makes. ;)
Just made another test with two Brave browsers (one Mac, one Linux) with a colleague and could reproduce the problem one more time in the following scenario:
- I added a record on my Brave (fresh identity with fresh and clean db)
- Colleague added a record (fresh identity with fresh and clean db)
- Colleague replicated my db and wrote a record into my db
- I replicated his db and wrote a record into his db
- Replications ok - I deleted his record in my db -> db crash
I will now try to reproduce this with another person! At least this proves that my Chrome is not somehow strangely misconfigured, since I used a completely empty Brave for that.
After testing with both @silkroadnomad and the libp2p devs, we think the problem is related to the stream returned by libp2p.dial() breaking prematurely. According to the libp2p devs, a connection should be re-established, but this does not seem to be happening. The suggested workaround is to provide some kind of keep-alive to ensure two peers are reconnected if the connection ends prematurely.
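One way to picture such a keep-alive: periodically compare the peers the app expects to be connected to with the ones that currently are, and redial the missing ones. The following is a rough sketch under assumptions, not a verified fix for this issue. `PeerDialer` is a hypothetical interface standing in for whatever the app uses to list connections and dial (for example a thin wrapper around a libp2p node), and the 10 second interval is arbitrary.

```typescript
// Hypothetical abstraction over the networking layer. In an app this could wrap
// a libp2p node, but none of these names are libp2p's own API.
interface PeerDialer {
  connectedPeers(): string[];
  dial(peerId: string): Promise<void>;
}

// Pure helper: which of the expected peers are currently not connected?
function peersToRedial(expected: string[], connected: string[]): string[] {
  const live = new Set(connected);
  return expected.filter((p) => !live.has(p));
}

// One keep-alive pass: try to redial every missing peer and report the ones
// that still could not be reached.
async function keepAliveOnce(dialer: PeerDialer, expected: string[]): Promise<string[]> {
  const failed: string[] = [];
  for (const peer of peersToRedial(expected, dialer.connectedPeers())) {
    try {
      await dialer.dial(peer);
    } catch {
      failed.push(peer); // leave it for the next pass
    }
  }
  return failed;
}

// Run a pass every `intervalMs` (arbitrary default); returns a stop function.
function startKeepAlive(dialer: PeerDialer, expected: string[], intervalMs = 10_000): () => void {
  const timer = setInterval(() => {
    void keepAliveOnce(dialer, expected);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping the comparison in a pure function makes it easy to check in isolation, while the actual dialing stays behind the interface so it can be swapped for the real networking calls.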