hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Not an issue, but a FYI

fangchin opened this issue · comments

Hi all,

Please review this HPCwire "Off the Wire" article: DOE Technical Report: "When to Use rsync?"
March 25, 2021 https://bit.ly/2OZqKV7

Regards

Hi Harry,

Thanks for this note. I'll be reading the paper and responding in detail.
If you'd like the conversation to continue on github, do nothing. If you'd
like to continue it in private, my email is widely available.

Thanks for responding. Happy to continue the discussion right here on GitHub.

First of all, please let me note that, as we pointed out in our report (Test environment, p. 4), we had very little time for the investigation and highly constrained access to the two testbeds employed, since other projects were waiting for them. Nevertheless, the methodology is precisely described in Test methodology, p. 4; the testers are freely available to the public at https://github.com/fangchin/test_rsync; and we are confident in the rigor, comprehensiveness, automation, and fairness of the investigation.

As it turns out, I'm working on the multihost version right now and I hope to push it to github in a week or two.

It's our view that any multi-host application must show the linear scalability efficiency defined in the report (A glance at two PDDMs, p. 14). Also, by "multi-host", did you mean "scale-out" (i.e. a multi-node cluster)? If so, then HA, automatic load sharing, etc. among multiple instances running on different cluster nodes should be intrinsic. We do hope that our work spurs similar discussions and investigations for other data movers.

I'm surprised you didn't include fpsync, a similar rsync wrapper by Ganael
LaPlanche which supports multihosts already (and who wrote the fpart file
chunker that parsyncfp uses to allow transport to start before the full
file recursion is done.)
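
For readers unfamiliar with the idea mentioned above (starting transport before the full file recursion is done), here is a minimal Python sketch of chunking during a directory walk. It is an illustration only, not fpart's or parsyncfp's actual implementation; the function name `chunk_files` and the `max_files` / `max_bytes` limits are hypothetical.

```python
import os
from typing import Iterator, List


def chunk_files(root: str, max_files: int = 1000,
                max_bytes: int = 1 << 30) -> Iterator[List[str]]:
    """Yield lists of file paths while the directory walk is still running.

    Hypothetical helper: a chunk is closed as soon as adding the next file
    would exceed either limit, so a consumer can start working on the first
    chunk long before the whole tree has been scanned.
    """
    chunk: List[str] = []
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                fsize = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if chunk and (len(chunk) >= max_files or size + fsize > max_bytes):
                yield chunk
                chunk, size = [], 0
            chunk.append(path)
            size += fsize
    if chunk:
        yield chunk  # final, possibly partial, chunk
```

Because `chunk_files` is a generator, each yielded list could be written to a file and handed to a separate `rsync --files-from=...` process while the walk continues; how those processes are scheduled is left out of this sketch.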

I am afraid that different rsync "wrappers" cannot change the intrinsic limitations of rsync in tackling LOSF (lots of small files), really large files (e.g. hundreds of GBs, multiple TBs), and large RTT values.

In addition, a monograph usually focuses on a single subject. So, as the title of the report indicates, it focuses on rsync and a single selected rsync-based tool, parsyncfp (we didn't even have time to evaluate rsync-ssl!). As you alluded, it would be great to include more tools. Besides fpsync, bbcp would be a good one to evaluate, for example. Nevertheless, try as we might, we only have 24 hours a day and we have other business to take care of :)

Best Regards,

Chin Fang, Zettar Inc.