hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf



Better auto-selection of $NETIF

novosirj opened this issue

I was planning on adding this to our own wrapper script, but it seems like it would find a better home within parsyncfp proper. I'm only sure how to implement part of it, though; you probably know better than I do how $TARGET is really handled.

In any case, our command line for parsyncfp looks more or less like this:

/usr/local/bin/parsyncfp -v 0 --nowait --altcache '/root/.parsyncfp-backups' -NP 12 --rsyncopts '-a -e "ssh -x -c aes128-gcm@openssh.com -o Compression=no"' --maxload 96 --chunksize=10T --fromlist=$HOMEDIR/$SNAPSHOT.list.allfiles.clean --trimpath=/$FILESYSTEM/.snapshots/$SNAPSHOT --trustme nas1:/zdata/gss/$SNAPSHOT

For the command line above, rather than just choosing the interface that has the default route, you could pick the interface that the route to the target will actually use:

ip -o route get $(getent hosts nas1 | awk '{print $1}') | perl -nle 'if ( /dev\s+(\S+)/ ) {print $1}'

...though obviously done somewhat differently, as you're already running this from within Perl. "ip route get" apparently requires an IP address; I'm not sure if there's a fancier way to get one.
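One way to get the address from within Perl, without shelling out to getent, is the core Socket module, whose getaddrinfo/getnameinfo accept either a host name or a literal IPv4/IPv6 address. This is a minimal sketch assuming Perl 5.14 or later (where Socket exports these functions); the helper name resolve_host_ip is hypothetical, and taking the first result is a simplification:

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo SOCK_STREAM NI_NUMERICHOST NI_NUMERICSERV);

# Hypothetical helper: resolve a host name *or* a literal IPv4/IPv6 address to
# a numeric address string, much like `getent hosts HOST | awk '{print $1}'`.
# Returns undef if the name cannot be resolved.
sub resolve_host_ip {
    my ($host) = @_;
    # The SOCK_STREAM hint avoids one duplicate entry per socket type.
    my ($err, @res) = getaddrinfo($host, "", { socktype => SOCK_STREAM });
    return undef if $err || !@res;
    # Use the first result; its order follows the system's address-selection rules.
    my ($nerr, $ip) = getnameinfo($res[0]{addr}, NI_NUMERICHOST | NI_NUMERICSERV);
    return $nerr ? undef : $ip;
}
```

Something like `resolve_host_ip('nas1')` would then give the address to pass to `ip -o route get`.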

This also works on Mac OS X, and is a little easier there because route accepts host names:

route get nas1 | grep interface | awk '{print $2}'
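The same lookup done from Perl on macOS could read the "interface:" line of `route get` directly, with no grep/awk pipeline. A minimal sketch, assuming the BSD-style output format shown by `route get` on macOS; both sub names are hypothetical, and a caller would presumably only use this when `$^O eq 'darwin'`:

```perl
use strict;
use warnings;

# Extract the interface name from BSD/macOS `route get HOST` output,
# which contains a line such as "  interface: en0".
sub iface_from_bsd_route_get {
    my ($output) = @_;
    return $output =~ /^\s*interface:\s+(\S+)/m ? $1 : undef;
}

# Run `route get HOST` without a shell (list-form pipe open) and parse it.
# Returns undef if route can't be run or prints no interface line.
sub bsd_netif_for_host {
    my ($host) = @_;
    open(my $fh, '-|', 'route', 'get', $host) or return undef;
    my $out = do { local $/; <$fh> };
    close($fh);
    return defined $out ? iface_from_bsd_route_get($out) : undef;
}
```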

Of course, this gets slightly more complicated because you have to decide what to do when $TARGET doesn't contain a host name, but I'd guess that in all remote cases the host name or IP address appears before a colon. getent hosts is safe for either a host name or an IP address, though there might be a smarter way to get that information within Perl.

I could try my hand at writing something and submitting a pull request; I just might be a little slow.

Also worth mentioning: route and ifconfig are both deprecated on Linux in favor of ip. I'm not sure whether you used them out of habit or for compatibility with more operating systems, but it's something to think about.

No, that's not actually what I meant. I'm referring specifically to your implementation of automatically selecting $NETIF for monitoring purposes. In my local example, your code automatically selects the interface that has the default route. On our system (and, I imagine, on other systems, including the case where your code detects multiple default routes and exits to force the user to choose), the route to the target may not be the default route. In our case the default is eno1, but the route used for the copy goes over ens6. Since the system already knows which interface it will use to reach the target, I'm suggesting simply asking the routing engine (with about the same amount of code) rather than looking up the default route and defaulting to that.

Still not exactly right. I know that I can manually select a different interface with --interface. My point is that the system knows how it will route traffic to the host whose name ultimately ends up in the $TARGET variable, and you can ask it by running ip route get <IP>. On our system, that route is not the default route and does not use the default interface, so a run of parsyncfp without --interface monitors an idle interface. I can set --interface ens6 manually, but since the routing table already knows how it will route traffic to nas1, why not automatically select the interface over which the transfer will actually occur, rather than looking up the default route and choosing its interface? This would provide a sensible default in all cases, rather than, in many cases, monitoring the idle interface that holds the default route.

I'm not sure about that; I think our case is similar. In both cases, I'd assume, the --interface switch is needed because parsyncfp chooses the wrong interface to monitor.

Our scenario: we have a machine that is part of our GPFS cluster. Its default route is to the production network on eno2, which leads to the rest of the campus and to the internet. So if we used parsyncfp to transfer data to most sites (Google, a central server on our campus, etc.), the data would go over the default-route interface, and parsyncfp would show bandwidth updates for the correct interface, since your code takes the interface name of the default route and uses it as, effectively, the value of --interface when none is specified:

$NETIF = `/sbin/route -n | grep "^0.0.0.0" | awk '{print \$8}'`; chomp $NETIF;
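Since route is deprecated on Linux, the existing default-route lookup could also be written against iproute2. This is a minimal sketch, assuming `ip` is on the PATH; the sub names are hypothetical, and it simply takes the first default route, so it does not reproduce any check for multiple default routes:

```perl
use strict;
use warnings;

# Pull the "dev IFACE" field out of one line of `ip -o route ...` output.
sub dev_from_ip_route_line {
    my ($line) = @_;
    return $line =~ /\bdev\s+(\S+)/ ? $1 : undef;
}

# Rough iproute2 equivalent of
#   /sbin/route -n | grep "^0.0.0.0" | awk '{print $8}'
# Returns the interface of the first default route listed, or undef.
sub default_route_netif {
    open(my $fh, '-|', 'ip', '-o', 'route', 'show', 'default') or return undef;
    my $netif;
    while (my $line = <$fh>) {
        $netif = dev_from_ip_route_line($line);
        last if defined $netif;
    }
    close($fh);
    return $netif;
}
```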

If we copy to host nas1, however, the system uses ens6 for the transfer, because nas1 is on the storage network, which is on the same subnet as ens6.

If instead that line read like this (or, again, the smarter non-nested-Perl version that I'd need to remind myself how to write):

$NETIF = `ip -o route get \$(getent hosts $TARGETHOST | awk '{print \$1}') | perl -nle 'if ( /dev\\s+(\\S+)/ ) {print \$1}'`; chomp $NETIF;

...you would get the interface rsync is actually going to use, and would therefore be monitoring the actual traffic. Here are a few local examples:

[root@quorum01 bin]# ip -o route get $(getent hosts wasabi.com | awk '{print $1}') | perl -nle 'if ( /dev\s+(\S+)/ ) {print $1}'
eno2
[root@quorum01 bin]# ip -o route get $(getent hosts www.rutgers.edu | awk '{print $1}') | perl -nle 'if ( /dev\s+(\S+)/ ) {print $1}'
eno2
[root@quorum01 bin]# ip -o route get $(getent hosts nas1 | awk '{print $1}') | perl -nle 'if ( /dev\s+(\S+)/ ) {print $1}'
ens6
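The nested backticks can be avoided entirely by running `ip` through a list-form pipe open and matching "dev" in Perl. This is a minimal sketch, assuming iproute2's `ip` is on the PATH and that the caller has already turned the target host into a numeric address (since ip route get wants an IP); the sub names are hypothetical:

```perl
use strict;
use warnings;

# Parse the outgoing interface from `ip -o route get ADDR` output, e.g.
# "10.10.1.20 dev ens6 src 10.10.1.5 uid 0 \    cache".
sub netif_from_route_get {
    my ($output) = @_;
    return $output =~ /\bdev\s+(\S+)/ ? $1 : undef;
}

# Ask the kernel which interface it would use to reach $ip. List-form open
# means no shell and no nested perl, so $ip is never shell-interpreted.
sub netif_for_ip {
    my ($ip) = @_;
    open(my $fh, '-|', 'ip', '-o', 'route', 'get', $ip) or return undef;
    my $out = do { local $/; <$fh> };
    close($fh) or return undef;    # non-zero exit, e.g. network unreachable
    return defined $out ? netif_from_route_get($out) : undef;
}
```

If this returns undef (no `ip`, or no route), falling back to the existing default-route lookup would keep the current behaviour.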

$TARGETHOST could come from $TARGET, which is defined as:

$TARGET = $ARGV[$#ARGV]; # remote rsync target

...which for our example invocation above would set $TARGET = "nas1:/zdata/gss/$SNAPSHOT", and therefore one could use something like:

$TARGETHOST = (split(/:/, $TARGET))[0];

...to get the host portion of the rsync target, for example "www.rutgers.edu" when $TARGET = "www.rutgers.edu:/tmp/wherever". One would also need to trap the case where $TARGET is a local path and therefore contains no ":".
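A slightly more defensive version of that split could handle the local-path trap along with the user@host, host::module and rsync:// forms that rsync accepts. This is a sketch only: the helper name target_host is hypothetical, and bracketed IPv6 literals are not handled:

```perl
use strict;
use warnings;

# Hypothetical helper: return the host part of an rsync destination, or
# undef if the destination looks like a local path.
sub target_host {
    my ($target) = @_;
    # Daemon URL form: rsync://[USER@]HOST[:PORT]/PATH
    return $1 if $target =~ m{^rsync://(?:[^@/]*@)?([^:/]+)};
    # [USER@]HOST:PATH or [USER@]HOST::MODULE. The host is everything before
    # the first colon, provided no slash comes first (so "/data/a:b" and
    # "dir/a:b" are treated as local paths).
    return $1 if $target =~ m{^(?:[^@/:]*@)?([^@/:]+):};
    return undef;    # no colon before the first slash: local path
}
```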

Is it worth it? I personally think so and am probably willing to write a PR.