hep-gc / shoal

A squid cache publishing and advertising tool designed to work in fast changing environments

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Shoal configures CVMFS with a malformed proxy list

rptaylor opened this issue · comments

The shoal command to configure the proxy list should be executed like this:

cvmfs_talk proxy set "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128"

When I test this on a current CernVM instance I confirm that 'sudo cvmfs_talk proxy info' shows that the correct configuration has been applied.

On the same VM, running 'shoal-client' produces these log messages which appear okay:

Oct 24 21:26:22 host-10-39-17-214 shoal-client: Setting "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128;http://192.168.0.32:3128;http://206.167.181.94:3128;http://kraken01.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT" as proxy
Oct 24 21:29:51 host-10-39-17-214 shoal-client: CVMFS proxies set to "http://kraken01.westgrid.ca:3128;http://atlascaq3.triumf.ca:3128;http://atlascaq4.triumf.ca:3128;http://192.168.0.32:3128;http://206.167.181.94:3128;http://kraken01.westgrid.ca:3128;http://cernvm-webfs.atlas-canada.ca:3128;DIRECT"

However, querying CVMFS shows that shoal-client has misconfigured the proxy URLs. Note the first URL is invalid due to "http://" appearing twice, and a rogue quotation mark appearing.

atlas.cern.ch:
Load-balance groups:
[0] http://"http://kraken01.westgrid.ca:3128 (:unresolved:, -6m)
[1] http://142.90.110.68:3128 (atlascaq3.triumf.ca, +2h)
[2] http://142.90.90.61:3128 (atlascaq4.triumf.ca, +2h)
[3] http://192.168.0.32:3128 (192.168.0.32, +23h)
[4] http://206.167.181.94:3128 (206.167.181.94, +23h)
[5] http://206.12.48.249:3128 (kraken01.westgrid.ca, +21m)
[6] http://142.90.110.68:3128 (atlascaq3.triumf.ca, +2h)
[7] http://DIRECT" (:unresolved:, -6m)
Active proxy: [1] http://142.90.110.68:3128

This is with the recent shoal-client-0.6.4-1.el6.noarch

Here are the client errors that result. Note that the quotation mark at the end of DIRECT is also problematic. This does not happen when executing cvmfs_talk proxy set on the command line with quotation marks, so maybe the problem is related to python popen?

Oct 24 16:54:47 host-10-39-17-214 cvmfs2: (atlas.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:50 host-10-39-17-214 cvmfs2: (atlas-condb.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:50 host-10-39-17-214 cvmfs2: (atlas-condb.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:53 host-10-39-17-214 cvmfs2: (atlas-nightlies.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:53 host-10-39-17-214 cvmfs2: (atlas-nightlies.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:54:57 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:54:57 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:55:00 host-10-39-17-214 cvmfs2: (sft.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:55:00 host-10-39-17-214 cvmfs2: (sft.cern.ch) failed to resolve IP addresses for DIRECT" (4 - unknown host name)
Oct 24 16:55:02 host-10-39-17-214 cvmfs2: (atlas.cern.ch) switching proxy from http://"http://kraken01.westgrid.ca:3128 to http://142.90.110.68:3128
Oct 24 16:56:25 host-10-39-17-214 cvmfs2: (grid.cern.ch) failed to resolve IP addresses for "http (4 - unknown host name)
Oct 24 16:56:25 host-10-39-17-214 cvmfs2: (grid.cern.ch) switching proxy from http://"http://kraken01.westgrid.ca:3128 to http://142.90.110.68:3128

This affects the regular CVMFS client repositories, but not the CernVM OS client in the microkernel image (sudo cvmfs_talk -p /mnt/.rw/cache/cernvm-prod.cern.ch/cvmfs_io.cernvm-prod.cern.ch proxy info).

This code hasn't been changed since you implemented it but I suspect the problem is when the formatted string is created:
cvmfs_http_proxy = "\"" + env_proxy + default_http_proxy + "\""
then this string is used in the cvmfs talk call:
p = Popen(["cvmfs_talk", "proxy", "set", cvmfs_http_proxy], stdout=PIPE, stderr=PIPE)

It looks like quotes were added to the string to make it match the format of the others but isn't necessary.
My guess is that when it looks at the first entry the string starts with " instead of http:// so it prepends it.

I'll cut a new version with the quotes removed and send it your way.

I've never written any shoal code but it may well be that there wasn't enough testing and the bug has been present for awhile.

It is really weird that the string printed in the log is correct.

Yes you have :P

But regardless I think the string that is being printed in the log is in fact the same one used. As you can see above your log statements have the string wrapped in quotes which is something a normal log wouldn't do. Since the quotes are actually part of the proxy string I'm fairly certain that is the origin of our problem.

Ha, fair enough, that was a long time ago. But I believe I tested those changes; I recall this related issue: https://sft.its.cern.ch/jira/browse/CVM-987 I think we would have noticed this bug if it was present then. Also, the problem started happening on the 18th when the client was updated with the new Frontier functionality. Anyway it doesn't matter where the bug came from, just need to find it.
Unfortunately it seems that removing the quotes didn't solve the issue. It seems like the string http:// is being prepended to the actual cvmfs_http_proxy variable.

Fixed as of commit 6f964a5.