hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Check for fparts_already_running warns about unrelated fpart processes and does not respect --nowait

novosirj opened this issue · comments

I run parsyncfp from a script. This week, for the first time, we hit a scenario where we wanted to run two instances at once, which requires --altcache to keep the cache directories separate. There are some minor bugs in that functionality. For one:

ls: cannot access /root/.parsyncfp-backups/: No such file or directory

[root@quorum ~]# /usr/local/bin/parsyncfp --altcache '/root/.parsyncfp-backups' -NP 4 --chunksize=100G --rsyncopts '--stats -e "ssh -T -c arcfour -o Compression=no -x"' --maxbw=1250000 --startdir '/gpfs/home/.snapshots' home nas1:/zdata/gss

  WARN: about to remove all the old cached chunkfiles from [/root/.parsyncfp-backups/fpcache].
  Enter ^C to stop this.
        If you specified '--nowait', cache will be cleared in 3s regardless.
  Otherwise, hit [Enter] and I'll clear them.
Press [ENTER] to continue.

However, as the ls error above shows, there were no old cachefiles there; the directory didn't even exist yet. The cause is the way --altcache works: the directory is created at the moment $parsync_dir is set, on lines 83-84:

if (! defined $ALTCACHE) {$parsync_dir = $HOME . "/.parsyncfp";} else {$parsync_dir = $ALTCACHE; mkdir $parsync_dir;}
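The ordering problem can be reproduced outside of parsyncfp. This is a minimal sketch (hypothetical paths, and it assumes fpart chunkfiles are named `f*` under `fpcache/`, as in the `-o .../fpcache/f` convention): because the directory is created before the "old chunkfiles" check runs, a fresh --altcache directory still triggers the removal warning, whereas a guard on actual chunkfile presence would not.

```shell
#!/bin/sh
# Sketch only: reproduces the warning logic, not parsyncfp itself.
# Hypothetical alternate cache dir, freshly created as --altcache does.
parsync_dir=$(mktemp -d)/.parsyncfp-backups
mkdir -p "$parsync_dir/fpcache"   # what parsyncfp does when --altcache is set

# A guard like this would only prompt when chunkfiles actually exist
# (assumption: chunkfiles match f* inside fpcache/):
if ls "$parsync_dir/fpcache"/f* >/dev/null 2>&1; then
    echo "WARN: about to remove old cached chunkfiles"
else
    echo "cache is empty; nothing to clear"
fi
```

With the guard, a brand-new cache directory prints "cache is empty; nothing to clear" instead of prompting to delete files that aren't there.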

That prompt can be skipped with --nowait; the behavior is still slightly wrong, but it doesn't torpedo the script. That said, I prefer running without --nowait, because the prompt catches my mistakes (leftover cachefiles and other unusual circumstances).

The part that really does cause a problem, though, is the section where fparts_already_running is checked. Even when an alternate cache directory has been specified, the check makes no attempt to exclude fpart processes belonging to other instances from the process list, on line 265:

my $fparts_already_running = `ps aux | grep 'fpar[t]'`; chomp $fparts_already_running;

I guess my inclination would be to add a second grep along the lines of the following:

my $fparts_already_running = `ps aux | grep 'fpar[t]' | grep -- "-o $parsync_dir/"`; chomp $fparts_already_running;

Though I'm reluctant to just suggest this as a patch without having read more of the code. HTH, anyway!
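The effect of the second grep can be demonstrated against simulated `ps aux` output (hypothetical PIDs and command lines; the chunkfile paths follow fpart's `-o <dir>/fpcache/f` convention). The existing `grep 'fpar[t]'` matches any fpart on the box, so either instance trips the other's check; filtering on the `-o $parsync_dir/` argument narrows the match to fparts writing into this instance's own cache directory.

```shell
#!/bin/sh
# Simulated `ps aux` output: two concurrent fpart runs, one per cache dir
# (hypothetical PIDs and arguments, for illustration only).
ps_out='root 101 fpart -o /root/.parsyncfp/fpcache/f /gpfs/home
root 102 fpart -o /root/.parsyncfp-backups/fpcache/f /gpfs/home'

parsync_dir=/root/.parsyncfp-backups

# Unfiltered check: matches BOTH fpart processes, so the "already running"
# warning fires even though the other instance owns a different cache dir.
# (The 'fpar[t]' bracket trick only keeps grep itself out of the match.)
printf '%s\n' "$ps_out" | grep 'fpar[t]'

# Filtered check: '--' ends option parsing so the "-o ..." pattern isn't
# taken as a grep flag; only our own instance's fpart matches.
printf '%s\n' "$ps_out" | grep 'fpar[t]' | grep -- "-o $parsync_dir/"
```

The trailing slash in the pattern matters: it keeps `/root/.parsyncfp-backups` from also matching a hypothetical `/root/.parsyncfp-backups2` cache directory.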

It remains to be seen whether running two of these at once is actually faster than running them sequentially, given how the two fparts fight over resources, but it would be good to have it behave properly either way.

Will test this tonight on our next round of backups. Thanks!

This seems to have worked well this time, and when the final rsync ended, so did parsyncfp. Did you make any changes to the PID code, or is that just good luck?

Thanks.