Issue with replication
pfrazee opened this issue
Repro repo: https://github.com/pfrazee/hyper-replication-bug
Something is failing in the replication code causing the program to end prematurely and without any information.
What's happening in the repro repo is that I'm creating a set of "Nodes" which are corestores, a set of cores, and autobases. I'm then randomly connecting and disconnecting them using core.replicate(), while also issuing put() or del() operations. There are no reads occurring yet, so the rebased-hypercore index isn't being touched yet (no apply calls).
The failure seems to occur during an oplog append, and I traced it as follows:
- head() is calling _getInputNode() on a remote core,
- which in turn is calling core.get().
- That cache misses,
- so it calls out to the replicator which is creating a request.
At that point, you get into the complexities of the request code, and I figured it'd be better to pass this off to y'all.
Another interesting piece of info: if I connect all of my nodes' corestores for replication, the error doesn't occur.
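To make the trace above easier to reason about, here is a minimal sketch of that call path. It does not use hypercore at all: `MockRemoteCore`, `withTimeout`, and this `head()` are hypothetical stand-ins, and the 50ms limit is an arbitrary choice. It assumes that a cache miss with no connected peers leaves the returned promise pending forever, and that in Node a pending promise on its own does not keep the process alive, which would match the program ending with no output.

```javascript
// Hypothetical stand-in for a remote core: empty local cache, no peers.
class MockRemoteCore {
  constructor () {
    this.cache = new Map() // index -> block
    this.pending = []      // requests waiting for a peer to answer
  }

  get (index) {
    if (this.cache.has(index)) return Promise.resolve(this.cache.get(index))
    // Cache miss: hand off to the "replicator", which queues a request.
    return new Promise((resolve) => {
      // With no peers connected, nothing ever calls resolve().
      this.pending.push({ index, resolve })
    })
  }
}

// Rejects if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout (promise, ms, label) {
  let timer
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Mirrors the traced path: head() ends up calling get() on the remote core.
function head (core, index) {
  return withTimeout(core.get(index), 50, `get(${index})`)
}

head(new MockRemoteCore(), 0).catch((err) => console.error(err.message))
// logs: get(0) timed out after 50ms
```

Wrapping the suspect call this way turns the silent exit into a visible error, which helps confirm where the append is stalling.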
Can you reopen this on autobase? I'll comment here for now. This happens because append calls heads, which wants to replicate the data.
- This should error with "block not available" (but that hasn't landed yet, which is why it hangs)
- Autobase probably should NOT fetch it all for an append
- Your test doesn't replicate writers with each other. Not sure if that's intentional, but that's why it bails.
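For illustration, the fail-fast behaviour described in the first point could look roughly like the sketch below. This is not the hypercore implementation: `MockCore`, its `peerCount` field, and the `wait` option as used here are assumptions made for the example. The only thing taken from the thread is the "block not available" error.

```javascript
// Hypothetical core with a local block store and a count of connected peers.
class MockCore {
  constructor (blocks = {}, peerCount = 0) {
    this.blocks = new Map(Object.entries(blocks).map(([i, b]) => [Number(i), b]))
    this.peerCount = peerCount
  }

  // `wait` is an assumed option name for this sketch.
  async get (index, { wait = true } = {}) {
    if (this.blocks.has(index)) return this.blocks.get(index)
    // Fail fast instead of returning a promise that can never settle.
    if (!wait || this.peerCount === 0) {
      throw new Error(`block not available: ${index}`)
    }
    // A real implementation would request the block from a peer here.
    throw new Error('remote fetch is out of scope for this sketch')
  }
}

async function main () {
  const core = new MockCore({ 0: 'put a=1' })
  console.log(await core.get(0)) // prints: put a=1
  try {
    await core.get(1)
  } catch (err) {
    console.log(err.message) // prints: block not available: 1
  }
}

main()
```

With an error like this, an append that needs a missing remote block would reject with a message rather than hang.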
Moved it over.
> Your test doesn't replicate writers with each other. Not sure if that's intentional, but that's why it bails.

It does, but it's intentionally connecting various subsets of the corestores to simulate various network conditions.
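When connecting only subsets, it can help to know which writers a node has no path to at a given moment. The sketch below is a plain graph check, independent of corestore: node ids are numbers, `links` is a hypothetical list of currently connected pairs, and both function names are made up for the example. Having a path is necessary for a block to arrive, not sufficient, since some peer along the way still has to hold the block.

```javascript
// Returns the set of node ids reachable from `start` over undirected links.
function reachable (start, links) {
  const seen = new Set([start])
  const queue = [start]
  while (queue.length > 0) {
    const current = queue.shift()
    for (const [a, b] of links) {
      // Follow the link in whichever direction leaves `current`.
      const next = a === current ? b : b === current ? a : null
      if (next !== null && !seen.has(next)) {
        seen.add(next)
        queue.push(next)
      }
    }
  }
  return seen
}

// Lists the writers that `reader` has no path to under the current links.
function unreachableWriters (reader, writers, links) {
  const seen = reachable(reader, links)
  return writers.filter((w) => !seen.has(w))
}

// Node 0 is linked to 1, but writer 2 is isolated.
console.log(unreachableWriters(0, [1, 2], [[0, 1]])) // prints: [ 2 ]
```

Logging this before each append would show whether a stall lines up with a writer being cut off from the appending node.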
Closing to keep it tidy here.