janmonschke / diffsync

Enables real-time collaborative editing of arbitrary JSON objects

Running the diffsync server in a browser

seidtgeist opened this issue

Bullet points because I can't brain right now:

  • I really, really, really want collaborative real time documents in p2p
  • I am aware of the awesome merkle-dag/CRDT-based approaches, but diffsync already works and is proven. However, it requires a server peer
  • Diffsync without a CENTRALIZED server! This is huge because nobody would have to run a diffsync instance on Heroku or a VPS
  • Instead we would use already existing, necessary and free client hardware to establish networks
  • I haven't tried this. It might already work. yolo!
  • Client peers connect to the server peer via WebRTC (as usual aided by a signaling server?)
  • The server peer connects to itself as a client
  • If the server fails, a new server is negotiated
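
To make the "a new server is negotiated" bullet a bit more concrete, here is a minimal sketch of one possible rule: every peer picks the lowest peer id among the peers it believes are alive. Nothing here comes from diffsync; `PeerId`, `electServer` and `onPeerLost` are made-up names, and the sketch assumes every peer already sees the same membership list, which is the hard part in a real P2P network.

```typescript
// Hypothetical sketch: none of these names come from diffsync.
type PeerId = string;

// Every peer applies the same rule to the same membership list,
// so they all arrive at the same server without extra messages.
function electServer(alivePeers: PeerId[]): PeerId | undefined {
  if (alivePeers.length === 0) return undefined;
  // Copy before sorting so the caller's list is not mutated.
  return alivePeers.slice().sort()[0];
}

// Called when a peer is detected as gone (for example, its connection closed).
function onPeerLost(
  alivePeers: PeerId[],
  lost: PeerId
): { peers: PeerId[]; server: PeerId | undefined } {
  const peers = alivePeers.filter((id) => id !== lost);
  return { peers: peers, server: electServer(peers) };
}

const peers: PeerId[] = ["carol", "alice", "bob"];
const first = electServer(peers);        // "alice" sorts first
const next = onPeerLost(peers, "alice"); // "bob" takes over
```

This only decides who the next server peer is. It says nothing about recovering edits that the old server had not yet passed on.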

cc @diasdavid

I really don't think it's a problem to have a server peer, and there might be ways to get around that (can operations converge @janmonschke?). The beauty of this is that any client could be an impromptu server peer, and I can't think of anything that would be wrong with that. In that way diffsync sessions are ephemeral. Let's imagine we pair diffsync collaboration with a content-addressed store:

  1. Request JSON value for hash
  2. Open diffsync server peer with value of hash
  3. Have diffsync clients connect to server peer and collaborate on hash (see gist for how simple this is)
  4. Periodically, or on demand, save new values to the content-addressed store
  5. When the server dies, negotiate a new server peer
  6. Session ends when nobody wants to edit anymore
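
The store side of the steps above could look roughly like this. It is a toy sketch under loud assumptions: `CasStore` and `toyHash` are invented for illustration, the hash is a simple non-cryptographic stand-in for something like SHA-256, and the collaborative editing of step 3 is simulated by a direct edit instead of going through diffsync.

```typescript
// Toy stand-in for a real content hash (e.g. SHA-256). NOT collision resistant.
function toyHash(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return h.toString(16);
}

// In-memory content-addressed store: the key is derived from the value.
class CasStore {
  private blobs: Record<string, string> = {};

  put(value: unknown): string {
    const text = JSON.stringify(value);
    const hash = toyHash(text);
    this.blobs[hash] = text;
    return hash;
  }

  get(hash: string): unknown {
    const text = this.blobs[hash];
    if (text === undefined) throw new Error("unknown hash: " + hash);
    return JSON.parse(text);
  }
}

type TodoDoc = { title: string; todos: string[] };

const store = new CasStore();
const startHash = store.put({ title: "draft", todos: [] });

// Steps 1-2: the server peer loads the value behind the hash.
const doc = store.get(startHash) as TodoDoc;

// Step 3 would happen through diffsync; here an edit is simulated directly.
doc.todos.push("write intro");

// Step 4: saving yields a new hash, and the old version stays addressable.
const savedHash = store.put(doc);
```

Because every save produces a new key, a freshly negotiated server peer (step 5) could restart from the last saved hash, at the cost of losing whatever was edited after that save.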

In theory, all that sounds doable, and even Neil Fraser thought about a P2P system when he introduced the algorithm in a Google Tech Talk -> https://youtu.be/S2Hp_1jqpY8?t=2591. His version of the implementation, however, was based on pulling data periodically rather than having a dedicated signal for updated versions like diffsync has. So he concluded that the sync delay between those nodes could increase significantly. I don't think this latency problem applies here.

My knowledge of P2P systems and WebRTC is pretty limited, so I'm not sure I would be able to implement it on my own, especially when it comes to systems that negotiate a single source of truth in P2P environments. I don't know how the death of a node can be handled gracefully so that no data is lost. Also, consider that the node chosen as the server node might actually be running on a mobile phone. Handling all diffsync sessions could drain the battery a lot, and the signal might get lost at any time.

Don't get me wrong, I'm not against the idea, I'm just unsure how P2P systems handle the described cases ;)

Another point to consider is whether P2P should be part of the core package. I personally would say it should be its own module, implemented on top of an enhanced transport interface -> https://github.com/janmonschke/diffsync#socketio-independence.
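
As a rough idea of what such a separate module might plug into, here is a sketch of a socket.io-like transport reduced to `on` and `emit`. This shape is an assumption for illustration and is not taken from diffsync's source; `Transport`, `LoopbackEnd`, `createLink` and the `"edit"` event name are all made up. A real P2P module would write to a WebRTC data channel where this sketch delivers in memory.

```typescript
// Assumed minimal shape of a socket.io-like transport; not diffsync's actual API.
type Handler = (...args: unknown[]) => void;

interface Transport {
  on(event: string, handler: Handler): void;
  emit(event: string, ...args: unknown[]): void;
}

// One end of an in-memory link between two peers.
class LoopbackEnd implements Transport {
  private handlers: Record<string, Handler[]> = {};
  peer: LoopbackEnd | null = null;

  on(event: string, handler: Handler): void {
    (this.handlers[event] = this.handlers[event] || []).push(handler);
  }

  // Emitting on this end triggers the handlers registered on the other end.
  emit(event: string, ...args: unknown[]): void {
    if (this.peer) this.peer.deliver(event, args);
  }

  private deliver(event: string, args: unknown[]): void {
    for (const handler of this.handlers[event] || []) handler(...args);
  }
}

function createLink(): [LoopbackEnd, LoopbackEnd] {
  const a = new LoopbackEnd();
  const b = new LoopbackEnd();
  a.peer = b;
  b.peer = a;
  return [a, b];
}

const [clientEnd, serverEnd] = createLink();
const received: unknown[] = [];
serverEnd.on("edit", (payload) => {
  received.push(payload);
});
clientEnd.emit("edit", { room: "doc-1" });
```

Keeping the interface this small is the point: the sync logic would not need to know whether the other end is a socket.io connection or a browser peer.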

The difficult thing is not the transport layer. It is managing a distributed system on a network topology. This is why Neil said that it is a different problem from the synchronization algorithm itself. He did not intend to say that diffsync is a peer-to-peer synchronization algorithm, because that requires more than diffsync itself.

The problem is managing the interconnections between clients, finding a way to propagate this global topology information to each client, and tolerating changes to it.

A simple algorithm is to do client hopping. I am planning on doing something fun in this direction when I have more time. That may be never.

Are you guys familiar with IRC? The graph theory part of IRC is relevant.
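
For anyone who hasn't looked at it: IRC servers are linked as a spanning tree, and a server relays a message to every link except the one it arrived on. A small sketch of that relay rule, with made-up names (`Links`, `broadcast`) and a made-up four-node tree, might look like this:

```typescript
// Undirected tree of peers as an adjacency list; node names are illustrative.
type Links = Record<string, string[]>;

// Each node forwards to every neighbour except the one it received from.
// On a tree (no cycles) this reaches every node exactly once.
function broadcast(links: Links, origin: string): string[] {
  const delivered: string[] = [];
  const queue: Array<{ node: string; from: string | null }> = [
    { node: origin, from: null },
  ];
  while (queue.length > 0) {
    const { node, from } = queue.shift()!;
    delivered.push(node);
    for (const neighbour of links[node] || []) {
      if (neighbour !== from) queue.push({ node: neighbour, from: node });
    }
  }
  return delivered;
}

const tree: Links = {
  a: ["b", "c"],
  b: ["a", "d"],
  c: ["a"],
  d: ["b"],
};
const order = broadcast(tree, "d"); // d, b, a, c
```

The rule only works while the links really form a tree. With a cycle in the graph this loop would never terminate, and when a node in the middle drops out the tree splits, which is exactly the topology management problem described above.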

Cheers.