upa / mscp

mscp: transfer files over multiple SSH (SFTP) connections

noclobber option

gerbenvoshol opened this issue

Hi,

Normally we use rsync, and one of its advantages is the resume option (already mentioned in another issue). Another is the option to not overwrite files if they already exist. Would it be possible for you to add a --noclobber option to mscp?

Thanks!

I think implementing such an option specific to particular use cases would be a starting point for a mess. I don't intend to implement rsync -a into mscp. scp and rsync are different programs because they focus on different things: transferring files and synchronizing directory trees. If I intended to implement a fast backup tool, I would choose a different design.

To enhance mscp, I need a strategy to cover broader cases before implementing such an option. Could you describe your use case? Is --noclobber enough? (I guess it's not)

I understand your position and will try to give a standard use case that we frequently encounter and for which we are looking at this tool. We are currently implementing whole genome sequencing for diagnostics. This is approximately 120 GB per patient, and we aim to initially run batches of 63 samples. After analysis, all this data needs to be shared securely with the participating medical centers. For this we currently use ssh/rsync transfers. However, the speed through firewalls and the overall transfer speeds are too slow to transfer all this data. Therefore we are exploring alternatives, and your tool looks very promising. One thing that I am worried about is transfer stability possibly leading to corrupt or partial transfers (after which a transfer would need to be restarted or resumed). I hope this helps explain our use case, which as I understand might not necessarily be yours.

Thank you for describing your use case. Transferring large amounts of genome/health/observed data is also common in my univ :).

> One thing that I am worried about is transfer stability possibly leading to corrupt or partial transfers (after which a transfer would need to be restarted or resumed).

I understand that the first priority is the resume feature. For example, when all SSH connections fail, mscp writes the byte counts and positions of the transferred data to a checkpoint file, and mscp can later restart the transfer from that checkpoint. A --noclobber option will not resolve your concern: it would refuse to overwrite remote files even when they are corrupted because the previous transfer failed.
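To make the checkpoint idea concrete, here is a minimal local sketch in Python. It is not mscp's implementation or its checkpoint format: the JSON layout, the function name `copy_with_checkpoint`, the tiny `CHUNK_SIZE`, and the `fail_after` hook that simulates a dropped connection are all assumptions made for illustration.

```python
import json
import os

CHUNK_SIZE = 4  # deliberately tiny so the example is easy to follow (assumption)


def copy_with_checkpoint(src, dst, ckpt, fail_after=None):
    """Copy src to dst chunk by chunk, recording finished offsets in ckpt.

    fail_after is a hypothetical hook: it simulates a connection loss once
    that many chunks have been copied during this run.
    Returns the number of chunks copied during this run.
    """
    done = set()
    if os.path.exists(ckpt):
        with open(ckpt) as f:
            done = set(json.load(f)["done"])

    size = os.path.getsize(src)
    # Open without truncating so chunks written by an earlier run survive.
    mode = "r+b" if os.path.exists(dst) else "w+b"
    copied = 0
    with open(src, "rb") as fin, open(dst, mode) as fout:
        for offset in range(0, size, CHUNK_SIZE):
            if offset in done:
                continue  # already transferred in a previous run
            if fail_after is not None and copied >= fail_after:
                raise ConnectionError("simulated connection loss")
            fin.seek(offset)
            fout.seek(offset)
            fout.write(fin.read(CHUNK_SIZE))
            fout.flush()
            done.add(offset)
            copied += 1
            # Persist progress after every chunk so a crash loses little work.
            with open(ckpt, "w") as f:
                json.dump({"done": sorted(done)}, f)

    # Transfer complete: the checkpoint is no longer needed.
    if os.path.exists(ckpt):
        os.remove(ckpt)
    return copied
```

Calling the function a second time with the same checkpoint path skips the offsets already recorded and copies only the remaining chunks. A no-clobber check would instead see that the destination exists and leave the truncated file in place.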

In addition, could you please tell me which rsync options you use? I don't intend to implement all the rsync features, but a subset of them would be acceptable. I'd like to know which options are crucial.

I will look up our current transfer options and get back to you.

About the no-clobber: what rsync does, for example, is transfer to a temporary file named .filename and "simply" move it to the final filename after a successful transfer.
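The write-then-rename pattern described above can be sketched in a few lines of Python. This is an illustration of the general technique, not rsync's or mscp's code: the function name `receive_atomically` and the `.part` suffix on the hidden temporary file are assumptions.

```python
import os


def receive_atomically(chunks, final_path):
    """Write chunks to a hidden temp file, then rename it to final_path.

    Readers of final_path see either the old complete file or the new
    complete file, never a partially written one.
    """
    directory, name = os.path.split(final_path)
    # Hypothetical naming scheme: a dot-prefixed file in the same directory,
    # so the final rename stays on one filesystem.
    tmp_path = os.path.join(directory, "." + name + ".part")
    try:
        with open(tmp_path, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, final_path)  # atomic on POSIX within one filesystem
    except BaseException:
        # The transfer failed: discard the partial temp file and re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the data source fails midway, the existing file at `final_path` is left untouched, so a corrupt partial file never appears under the final name.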

> I will look up our current transfer options and get back to you.

Thank you!

> About the no-clobber: what rsync does, for example, is transfer to a temporary file named .filename and "simply" move it to the final filename after a successful transfer.

I see.

Anyway, I have implemented the resume option on the dev branch (not merged into main yet). Could you please see the doc and check the -W and -R options and the Example section? Comments are welcome.

Thanks for sharing! Sounds like a nice option that solves resuming after an interrupted transfer.

Found a few typos:
resumeing - resuming
CHECKOPOINT - CHECKPOINT
chunkes - chunks
Infromation - information
pathces - patches

As to rsync, right now we use the following options:
rsync -vzrta --perms --exclude 'Checkpoints.txt' --chmod='Du=rwx,Dg=rwsx,Fu=rw,Fg=rw,o-rwx' --progress

We set the permissions to make sure the receiving party can read and remove the files after the transfer. The exclude pattern skips a file that is used as a transfer-completed signal (in the example above, the Checkpoints.txt file).
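For readers unfamiliar with the --chmod syntax, the rule string above amounts to mode 2770 on directories (user and group rwx plus setgid, nothing for others) and 660 on regular files. A rough Python equivalent, applied to a tree after it has been received, might look like the sketch below. The function name `apply_shared_modes` is hypothetical, and this is not something mscp does.

```python
import os
import stat

# Du=rwx,Dg=rwsx,o-rwx -> user rwx, group rwx + setgid, nothing for others
DIR_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID
# Fu=rw,Fg=rw,o-rwx -> user rw, group rw, nothing for others
FILE_MODE = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP


def apply_shared_modes(root):
    """Apply the directory and file modes above to everything under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        os.chmod(dirpath, DIR_MODE)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # os.chmod would follow the link; leave links alone
            os.chmod(path, FILE_MODE)
```

The setgid bit on directories makes newly created entries inherit the directory's group, which is what lets the receiving party manage the files afterwards.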

Thanks! v0.1.5, including the feature, is released.

Also, thanks for sharing the rsync options. I will consider whether to implement them. If you find any vital options (or features) that will be applicable to broader use cases, please open a new issue.