orbitdb / orbitdb

Peer-to-Peer Databases for the Decentralized Web

How to sync with several databases

koh-osug opened this issue · comments

The open function only accepts a single address. How do I sync with several databases? Do I have to open several DBs and monitor each individually? Are all the changes from the other DBs then synced to the local OrbitDB (I call it "FooBar" in what follows)? What if the other DBs are in turn syncing with this OrbitDB "FooBar"? Is an infinite replication loop the result, or is all data uniquely identified by a hash to prevent this? Will the individual database per node still scale if there are over a hundred peers?

Do I have to open several DBs and monitor each individually?

Yes.

const orbitdb = await OrbitDB(...)
const db1 = await orbitdb.open('db1')
const db2 = await orbitdb.open('db2')
// etc
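
Expanding on that a little, here is a minimal sketch of opening and monitoring several databases individually, assuming the v2 @orbitdb/core API and an existing Helia/IPFS instance called `ipfs` (names and setup details are illustrative, not prescriptive):

import { createOrbitDB } from '@orbitdb/core'

const orbitdb = await createOrbitDB({ ipfs })

// open each database and attach a listener to each one separately
const databases = await Promise.all(['db1', 'db2'].map(address => orbitdb.open(address)))

for (const db of databases) {
  // 'update' fires when a new entry is added locally or arrives from another peer
  db.events.on('update', entry => {
    console.log(`${db.address} received entry`, entry.hash)
  })
}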

Is an infinite replication loop the result, or is all data uniquely identified by a hash to prevent this?

No. A database only replicates records added to other DBs; it does not re-replicate its own records that have already been replicated to other DBs. The db is an append-only log of entries, and each entry is content-addressed by its hash, so a peer does not re-fetch entries it already has. See https://github.com/orbitdb/orbitdb/blob/main/docs/OPLOG.md if you are interested in the mechanics of the oplog.
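
To make the hash part concrete, a small sketch assuming the default 'events' database type in v2: every write returns the hash of the new oplog entry, and replication works in terms of those hashes, so a peer never re-imports an entry it already holds.

const db = await orbitdb.open('FooBar')   // defaults to the 'events' (append-only log) type
const hash = await db.add('hello world')  // returns the hash of the newly appended oplog entry
console.log(hash)                         // a replica that already has this entry will not fetch it again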

Will the individual database per node still scale if there are over a hundred peers?

We haven't benchmarked a large number of DBs across a large number of peers, but we have plans to. However, this is probably more a question of how the underlying IPFS layer scales (it should scale).

It's natural to misunderstand how p2p replication works. Remember, I might have 1000 peers, but if all peers are syncing each other's data, I only need to be connected to one other peer to eventually retrieve all records and get to a consistent state. The mantra of a p2p network is "eventual consistency", not "real-time replication", although good uptime and fast nodes can make it look like "almost real-time replication".

Thanks a lot.

One follow-up here:

It's natural to misunderstand how p2p replication works. Remember, I might have 1000 peers, but if all peers are syncing each other's data, I only need to be connected to one other peer to eventually retrieve all records and get to a consistent state. The mantra of a p2p network is "eventual consistency", not "real-time replication", although good uptime and fast nodes can make it look like "almost real-time replication".

How do I achieve this? Is this automatically the case, or is it necessary to manually create such a sparse network? If I do not pass all known peer addresses and some of the peers then fail, couldn't I end up with an isolated network partition?

How your nodes find one another depends on your libp2p configuration. How libp2p peers discover each other is up to your requirements; for example, an intranet-only configuration would probably only require mdns for peers to find one another. Peers behind NATs that need to connect across the internet would probably need something more complex (or even a mixture of peer discovery mechanisms).
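
As a rough illustration, here is a sketch of an intranet-only setup using mdns peer discovery, assuming the v2 @orbitdb/core API with Helia; exact libp2p option names vary between versions, so treat it as a starting point rather than a drop-in configuration.

import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { noise } from '@chainsafe/libp2p-noise'
import { yamux } from '@chainsafe/libp2p-yamux'
import { identify } from '@libp2p/identify'
import { mdns } from '@libp2p/mdns'
import { gossipsub } from '@chainsafe/libp2p-gossipsub'
import { createHelia } from 'helia'
import { createOrbitDB } from '@orbitdb/core'

// mdns broadcasts on the local network, so peers on the same LAN discover each other automatically
const libp2p = await createLibp2p({
  addresses: { listen: ['/ip4/0.0.0.0/tcp/0'] },
  transports: [tcp()],
  connectionEncryption: [noise()],
  streamMuxers: [yamux()],
  peerDiscovery: [mdns()],
  services: {
    identify: identify(),
    pubsub: gossipsub() // OrbitDB uses pubsub to announce database heads to connected peers
  }
})

const ipfs = await createHelia({ libp2p })
const orbitdb = await createOrbitDB({ ipfs })

For peers behind NATs you would typically add or swap transports (e.g. WebSockets or circuit relay) and replace mdns with a bootstrap list or another discovery mechanism.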

Unfortunately, the process is quite convoluted, so a custom approach that meets the needs of the software is required. However, we are working on simplifying this for third-party developers with preconfigured solutions and maybe even configuration tools. At this stage it is a question of time, money and resources.

Thanks, another great answer as always.