parsyncfp 1.67 and earlier sometimes exits before all rsync processes have completed
novosirj opened this issue
Here's some output from our most recent run on one of our three campuses. I mentioned in Issue 34 that there seems to be a bug where PFP sometimes exits while rsync processes are still running, which is incorrect. You can see in the output below that PFP exits, and then there are these lines:
* TRANSFER projects parallel rsync wait BEGIN: 2020-05-12_23:41:59
* TRANSFER projects parallel rsync wait END: 2020-05-13_03:20:07
What is happening during that time is a pgrep -x rsync loop, added by my colleague Bill Abbott (probably to deal with this very problem), that waits until no rsync processes are left running. As you can see, it took about 3 hours 38 minutes for all of the rsync processes to complete. Here's the full output:
* TRANSFER projects parallel BEGIN: 2020-05-12_23:01:04
^[[0m^[[1;35mWARN: About to remove all the old cached chunkfiles from [/root/.parsyncfp-backups-projectsc/fpcache].
Enter ^C to stop this.
If you specified '--nowait', cache will be cleared in 3s regardless.
Otherwise, hit [Enter] and I'll clear them.
^[[0m^[[1;34mINFO: The fpart chunk files [/root/.parsyncfp-backups-projectsc/fpcache/f*] are cleared .. continuing.
^[[0m^[[1;34mINFO: The TRIMPATH you specified exists, is a dir, and is readable.
^[[0m^[[1;34mINFO: Alternative file list is readable; converting list to chunks.
^[[0m^[[1;34mINFO: Forking fpart with PID = [14688]. Check [/root/.parsyncfp-backups-projectsc/fpcache/fpart.log.23.01.04_2020-05-12] for errors if it hangs.
^[[0m^[[1;34mINFO: Starting the 1st [12] rsyncs ..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.0]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.1]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.2]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.3]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.4]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.5]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.6]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.7]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.8]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.9]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.10]..
^[[0m^[[1;34mINFO: Starting rsync for chunkfile [/root/.parsyncfp-backups-projectsc/fpcache/f.11]..
^[[0m | Elapsed | 1m | [ ens6] MB/s | Running || Susp'd | Chunks [2020-05-12]
Time | time(m) | Load | TCP / RDMA out | PIDs || PIDs | [UpTo] of [ToDo]
23.11.52 40.67 6.41 320.95 / 0.00 12 <> 0 [27] of [32]
^[[1;34mINFO: Done. Please check the target to make sure
expected files are where they're supposed to be.
^[[0m^[[1;34mINFO:
The parsyncfp cache dir takes up [277M /root/.parsyncfp-backups-projectsc]
Don't forget to delete it, but wait until you are sure that your job
completed correctly, so you don't need the log files anymore.
Reminder: check the parsyncfp log:
[/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31]
and the fpart log:
[/root/.parsyncfp-backups-projectsc/fpcache/fpart.log.23.01.04_2020-05-12]
for errors. Use '--verbose=1' for less output.
You transferred [1.28184e+14 bytes = 116.58267 TB] via all [12] rsyncs.
Thanks for using parsyncfp. Tell me how to make it better.
<hjmangalam@gmail.com>
^[[0m
* TRANSFER projects parallel END: 2020-05-12_23:41:54
* TRANSFER projects parallel rsync wait BEGIN: 2020-05-12_23:41:59
* TRANSFER projects parallel rsync wait END: 2020-05-13_03:20:07
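A minimal sketch of the kind of wait step that sits between the two "rsync wait" timestamps above. This is not Bill's actual code; the function name wait_for_no_procs and the polling interval are assumptions made for illustration.

```shell
# Hypothetical helper: block until no process with exactly this name remains.
# Usage: wait_for_no_procs NAME [INTERVAL_SECONDS]
wait_for_no_procs() {
    local name="$1" interval="${2:-60}"
    # pgrep -x matches the process name exactly and exits 0 while any match exists.
    while pgrep -x "$name" > /dev/null; do
        sleep "$interval"
    done
}

# Example use in a wrapper script (commented out so sourcing this has no side effects):
# echo "* rsync wait BEGIN: $(date +%Y-%m-%d_%H:%M:%S)"
# wait_for_no_procs rsync 60
# echo "* rsync wait END: $(date +%Y-%m-%d_%H:%M:%S)"
```

Note that pgrep -x rsync matches every rsync on the host, not only the ones PFP started, so a loop like this is only a safe stand-in when nothing else on the machine runs rsync.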
While this rsync loop was running, I checked for running rsync processes:
[root@quorum03 ~]# ps -ef | grep rsyc
root 7548 4342 0 00:09 pts/1 00:00:00 grep --color=auto rsyc
[root@quorum03 ~]# ps -ef | grep rsync
root 4151 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_27 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.27 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4153 4151 45 May12 ? 00:12:47 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_27 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.27 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4158 4153 33 May12 ? 00:09:17 ssh -x -c aes128-gcm@openssh.com -o Compression=no nas1 rsync --server -slogDtpRe.LsfxC
root 4164 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_28 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.28 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4166 4164 43 May12 ? 00:12:15 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_28 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.28 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4167 4166 31 May12 ? 00:08:44 ssh -x -c aes128-gcm@openssh.com -o Compression=no nas1 rsync --server -slogDtpRe.LsfxC
root 4177 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_29 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.29 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4179 4177 44 May12 ? 00:12:23 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_29 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.29 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4180 4179 30 May12 ? 00:08:38 ssh -x -c aes128-gcm@openssh.com -o Compression=no nas1 rsync --server -slogDtpRe.LsfxC
root 4189 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4191 4189 45 May12 ? 00:12:37 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4192 4191 29 May12 ? 00:08:15 ssh -x -c aes128-gcm@openssh.com -o Compression=no nas1 rsync --server -slogDtpRe.LsfxC
root 4201 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4203 4201 41 May12 ? 00:11:35 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4204 4203 25 May12 ? 00:07:13 ssh -x -c aes128-gcm@openssh.com -o Compression=no nas1 rsync --server -slogDtpRe.LsfxC
root 7584 4342 0 00:09 pts/1 00:00:00 grep --color=auto rsync
root 11262 11259 0 May12 ? 00:00:00 /bin/bash -x /usr/local/bin/asb-parsyncfp-0.9.sh
root 14280 1 0 May05 ? 00:00:29 perl /usr/local/bin/parsyncfp-1.67 -i ens6 --checkperiod 1800 --nowait --altcache /root/.parsyncfp-backups-$SNAPSHOT --dispose c -NP 12 --rsyncopts -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --maxload 96 --chunksize=10T --fromlist=/gpfs/cache/home/root/policy/projectsc.list.allfiles.clean --trimpath=/projects/.snapshots/projectsc --trustme nas1:/zdata/gss/projectsc
root 25307 6185 0 May12 pts/0 00:00:00 tail -f /var/log/asb-parsyncfp-20200512.log /var/log/asb-parsyncfp-20200512.err
[root@quorum03 ~]# ps -ef | grep rsync | grep f\...
root 4151 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_27 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.27 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4153 4151 45 May12 ? 00:13:15 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_27 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.27 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4164 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_28 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.28 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4166 4164 43 May12 ? 00:12:42 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_28 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.28 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4177 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_29 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.29 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4179 4177 44 May12 ? 00:12:51 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_29 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.29 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4189 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4191 4189 45 May12 ? 00:13:05 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4201 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4203 4201 41 May12 ? 00:12:02 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 11262 11259 0 May12 ? 00:00:00 /bin/bash -x /usr/local/bin/asb-parsyncfp-0.9.sh
root 14280 1 0 May05 ? 00:00:29 perl /usr/local/bin/parsyncfp-1.67 -i ens6 --checkperiod 1800 --nowait --altcache /root/.parsyncfp-backups-$SNAPSHOT --dispose c -NP 12 --rsyncopts -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --maxload 96 --chunksize=10T --fromlist=/gpfs/cache/home/root/policy/projectsc.list.allfiles.clean --trimpath=/projects/.snapshots/projectsc --trustme nas1:/zdata/gss/projectsc
root 25307 6185 0 May12 pts/0 00:00:00 tail -f /var/log/asb-parsyncfp-20200512.log /var/log/asb-parsyncfp-20200512.err
[root@quorum03 ~]# ps -ef | grep rsync | grep f\.3.
root 4189 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4191 4189 45 May12 ? 00:13:09 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_30 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.30 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
root 4201 1 0 May12 ? 00:00:00 sh -c cd /projects/.snapshots/projectsc && rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no" --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc & echo "${!}" >> /root/.parsyncfp-backups-projectsc/fpcache/rsync-PIDs-23.01.04_2020-05-12
root 4203 4201 41 May12 ? 00:12:05 rsync --bwlimit=1000000 -a -s --log-file=/root/.parsyncfp-backups-projectsc/rsync-logfile-23.01.04_2020-05-12_31 -a -e ssh -x -c aes128-gcm@openssh.com -o Compression=no --files-from=/root/.parsyncfp-backups-projectsc/fpcache/f.31 /projects/.snapshots/projectsc nas1:/zdata/gss/projectsc
As you can see, they are numerous.
We still have an error message in the fpart.log -- it's not clear to me whether it's related, or how I would go about figuring out what is upsetting it:
[root@quorum03 ~]# cat /root/.parsyncfp-backups-projectsc/fpcache/fpart.log.23.01.04_2020-05-12
Examining filesystem...
Filled part #0: size = 5522565562195, 39213 file(s)
Filled part #1: size = 5541427793660, 97253 file(s)
Filled part #2: size = 5545874130628, 60812 file(s)
Filled part #3: size = 5509769400795, 40104 file(s)
Filled part #4: size = 5499888680149, 56216 file(s)
Filled part #5: size = 5506487195394, 39990 file(s)
Filled part #6: size = 5506709107196, 30690 file(s)
Filled part #7: size = 5506735624280, 24304 file(s)
Filled part #8: size = 5499734697022, 49527 file(s)
Filled part #9: size = 5508541016486, 39374 file(s)
Filled part #10: size = 5497866496719, 55087 file(s)
Filled part #11: size = 6835297759526, 11085 file(s)
Filled part #12: size = 5497819066318, 28734 file(s)
Filled part #13: size = 5511722393336, 58762 file(s)
Filled part #14: size = 5505785371697, 19059 file(s)
Filled part #15: size = 5504441490693, 48614 file(s)
Filled part #16: size = 5512316116286, 39688 file(s)
Filled part #17: size = 5500649696382, 41383 file(s)
Filled part #18: size = 5512471694650, 52756 file(s)
Filled part #19: size = 5553281642311, 112159 file(s)
Filled part #20: size = 5501550103865, 54165 file(s)
Filled part #21: size = 5508397812239, 57320 file(s)
Filled part #22: size = 5577036109281, 23500 file(s)
Filled part #23: size = 5499498469383, 62866 file(s)
Filled part #24: size = 5538987318102, 21086 file(s)
Filled part #25: size = 5498235348656, 21008 file(s)
Filled part #26: size = 5498119523024, 29184 file(s)
Filled part #27: size = 5497635536928, 21893 file(s)
Filled part #28: size = 5501677782525, 44983 file(s)
Filled part #29: size = 5500261233014, 63159 file(s)
Filled part #30: size = 5502396364451, 52938 file(s)
error parsing input values:
Filled part #31: size = 4576145248049, 96973 file(s)
1493885 file(s) found.
But we have all of the cache files from the run, so we can look them over and run tests in the interim (the transfer finished very fast this week so I have till Tuesday night to tinker).
Yes, the child rsyncs are absolutely children of the PFP process that's still running. We never run more than one PFP at a time on this system, and even on the system that can run more than one we have only been running one at a time. You can also see that the command lines for the rsync processes above correspond to the parsyncfp command line.
I'll be starting up a new round tomorrow night. It looks like the critical line here is 928, so if there's some instrumentation it would be helpful to add here, let me know.
Re: fpart, yes, we use the PFP option to read file sizes from a list. Here is our full command line for PFP:
/usr/local/bin/parsyncfp-1.67 -i ens6 --checkperiod 1800 --nowait --altcache /root/.parsyncfp-backups-$SNAPSHOT --dispose c -NP 12 --rsyncopts '-a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no"' --maxload 96 --chunksize=5T --fromlist=$HOMEDIR/$SNAPSHOT.list.allfiles.clean --trimpath=/$MOUNTPOINT/.snapshots/$SNAPSHOT --trustme $BACKUPHOST:/zdata/gss/$SNAPSHOT
(I notice --dispose c doesn't seem to work either, but maybe I'm not specifying that correctly?)
I do know that we have some filenames with junk in them -- mostly carriage returns at the end. I tried what you suggested on the fpart log and he's right:
[root@quorum03 .parsyncfp-backups-projectsc]# cat -bet fpcache/fpart.log.23.01.04_2020-05-12 | grep error
33 error parsing input values: ^I$
Do you know which part should be the problem part? The output makes it unclear whether it's the one before or after, but neither the f.30 file nor the f.31 file seems to contain a ^I:
...
32 Filled part #30: size = 5502396364451, 52938 file(s)$
33 error parsing input values: ^I$
34 Filled part #31: size = 4576145248049, 96973 file(s)$
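One way to narrow this down, sketched under stated assumptions. If fpart prints the input line it failed to parse after the colon (the log above suggests this, but nothing in this thread confirms it), the culprit would be a line in the --fromlist input that consists only of a tab, rather than anything inside f.30 or f.31. The hypothetical helper below, find_bad_list_lines, lists every line of a list file that is not of the form size, blank(s), path; that layout is itself an assumption about the list format.

```shell
# Hypothetical helper: print "lineno:line" for every line of a size-prefixed
# file list that is NOT "<digits><blank(s)><start of a path>".
# Assumes one "size path" record per line; adjust the pattern if the list differs.
find_bad_list_lines() {
    local list="$1"
    # -n prefixes line numbers, -v inverts the match, -E enables extended regex.
    # LC_ALL=C keeps grep from choking on filenames that are not valid UTF-8.
    LC_ALL=C grep -n -v -E '^[0-9]+[[:blank:]]+[^[:blank:]]' "$list"
}
```

Piping the result through cat -et, as in the command above, makes tabs (^I) and line ends ($) visible.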
Thanks for your assistance!
I'll add prints of those PIDs, etc. before I run this again.
When this has gone well, I typically run my script via at, and have a & on the invocation of the script. On weeks where I'm manually waiting for last week's run to finish, or I'm trying to keep a closer eye on this, I'll run it like myscript.sh & and then run disown %% so that if my shell is dropped, the command will not be. I suspect that may be similar to what you're seeing. I also notice that the rsync processes typically don't die if I kill PFP, so that seems to agree. We don't run PFP itself inside that script with an & or anything.
I know at one time there was a bug where PFP wasn't careful to confirm that one of the rsync PIDs wasn't reused by something else. Since this runs for several days on our system -- I believe this week Tuesday at 23:00 to sometime late Saturday -- there is more chance for that to happen. But I believe you already made changes in that area.
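For reference, a minimal sketch of the kind of check that guards against PID reuse: treat a recorded PID as live only if it exists and still carries the expected command name. This is not how parsyncfp does it; pid_is_cmd and live_pids_from_file are hypothetical names, and which command name the recorded PIDs actually carry (rsync itself or the wrapping sh) is not established in this thread, so the name is a parameter.

```shell
# Hypothetical helper: succeed only if PID is alive AND its command name is NAME.
pid_is_cmd() {
    local pid="$1" name="$2" comm
    # "ps -o comm=" prints only the command name, with no header; ps fails
    # (and prints nothing) when the PID no longer exists.
    comm=$(ps -p "$pid" -o comm= 2>/dev/null) || return 1
    [ "$comm" = "$name" ]
}

# Hypothetical helper: print the PIDs listed in PIDFILE (one per line)
# that are still running as NAME.
live_pids_from_file() {
    local pidfile="$1" name="$2" pid
    while read -r pid; do
        [ -n "$pid" ] && pid_is_cmd "$pid" "$name" && echo "$pid"
    done < "$pidfile"
    return 0
}
```

A check on the command name alone still cannot tell one rsync from another; comparing the process start time recorded at launch would be stricter, at the cost of more bookkeeping.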
We'll see what it prints out this go 'round.
Re: the fpart error, the filenames don't appear to crash rsync. Another interesting thing that happens in this area (probably better for another ticket -- I only raised the fpart problem at all in case it related to this early exit problem somehow) is that the rsync processes that run within PFP complain about problem filenames, whereas the final rsync that we run with --delete does not. I think the reason is that somewhere upstream -- likely the fpart step -- special characters are being dropped, and no file exists under the name without them. The final rsync is silent on these and does appear to transfer them.
This didn't happen on this run, though I ran it via at without the &. I may go back to the other way next week to see if that exposes whatever is going on.
Here's the output I did capture, at any rate:
Time to debug weird exit:
rPIDs: 14315 22586
sPIDs:
CUR_FPI: 33
nbr_cur_fpc_fles: 33
FPART_RUNNING: 0
Time to debug weird exit:
rPIDs: 14315 22586
sPIDs:
CUR_FPI: 33
nbr_cur_fpc_fles: 33
FPART_RUNNING: 0
02.54.30 262.62 4.96 60.26 / 0.00 2 <> 0 [33] of [33]
Time to debug weird exit:
rPIDs:
sPIDs:
CUR_FPI: 33
nbr_cur_fpc_fles: 33
FPART_RUNNING: 0
^[[1;34mINFO: Done. Please check the target to make sure
expected files are where they're supposed to be.
^[[0m^[[1;34mINFO:
The parsyncfp cache dir takes up [439M /root/.parsyncfp-backups-projectsc]
Don't forget to delete it, but wait until you are sure that your job
completed correctly, so you don't need the log files anymore.