Redundancy / go-sync

gosync is a library for Golang styled around zsync / rsync, written with the intent that it enables efficient differential file transfer in a number of ways. NB: I am unable to contribute to this at the moment

Assign severities to TODO items

jsilhan opened this issue

This is a follow-up to the discussion in issue #14. Can you elaborate on what go-sync is missing, in terms of stability, to be production ready, and what the priorities are, please? How would you compare it with the current version of zsync? We would like to know whether we should invest resources into this technology and how much would be needed. From the TODO in the README, I would personally classify the items as follows:

Clean up naming consistency and clarity: Block / Chunk etc
Think about turning the filechecksum into an interface

API cleanliness - not related to the actual functionality, right?

Flesh out full directory build / sync

A nice-to-have RFE that an upper layer could take care of instead (depends on the scope of this project).

Implement 'patch' payloads from a known start point to a desired end state
Provide bandwidth limiting / monitoring as part of http blocksource

Performance on client side. I would guess medium priority.

Validate full file checksum after patching
Sequential patcher to resume after error?

These would improve the stability. Are they weak points and high priorities?

Avoid marshalling / un-marshalling blocks during checksum generation

Performance on server side. I would guess low priority.

gzip source blocks (this involves writing out a version of the file that's compressed in block-increments)

Performance on client side?

I'll go through these this weekend and try and clean up and add information.

In terms of the major blocker to production usage - as long as you're happy with the functionality available now, my biggest concern is that although the code is tested, it's not hardened against real-life issues like connection failures or data corruption, and the test cases are not extensive for a tool intended to be used across a potentially higher-latency WAN connection. It also has absolutely nothing at the moment for setting file access flags / ownership etc.

While I'm reasonably happy with the library, the CLI interface isn't in any way mature (and it might not communicate failure cases clearly or with the correct error codes).

So, I wouldn't want to currently make any guarantees on the CLI "interface", because it probably needs a lot more attention and changes, which leads me to really want to separate that into its own repository, and use vendoring to properly control the dependencies.

Ultimately, I think it would be easiest to put into production if you were willing to run rsync (potentially) redundantly against the result of gosync for a while in order to establish confidence that it was a no-op consistently.