munin-monitoring / contrib

Contributed stuff for munin (plugins, tools, etc...)

Home Page: http://munin-monitoring.org

multi_tcp_ping may connect to the wrong host

nitram2342 opened this issue

I debugged an issue where our Munin failed to detect a service problem because a DNS host name resolved to a different IP address. The "problem" is the DNS resolution strategy in Net::Ping, and in name resolution in general. Everything works as designed, yet multi_tcp_ping ends up connecting to another host. If the service on that host is TCP-pingable, the plugin fails to detect the issue.

Net::Ping uses the function _resolve() to resolve DNS entries, which is based on Socket::getaddrinfo (at least in my case). When a host name cannot be resolved due to a DNS issue, the resolver treats the name as relative to the default (search) domain and sends a corresponding query. If a wildcard DNS record is in place, that query succeeds, but the returned address may differ from the "expected" one.
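
To check which address a name actually resolves to on the munin-node host, the lookup can be reproduced outside the plugin. The following is a minimal Perl sketch, assuming Socket's getaddrinfo() resolves names here the same way it does inside Net::Ping; resolve_v4() is a hypothetical helper, not part of Net::Ping or the plugin:

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo AF_INET SOCK_STREAM inet_ntoa unpack_sockaddr_in);

# Hypothetical helper: resolve $name to IPv4 address strings via getaddrinfo(),
# dying with the resolver's error message if the lookup fails.
sub resolve_v4 {
    my ($name) = @_;
    my ($err, @res) = getaddrinfo($name, undef,
        { family => AF_INET, socktype => SOCK_STREAM });
    die "getaddrinfo($name) failed: $err\n" if $err;

    my @addrs;
    for my $ai (@res) {
        # With family => AF_INET, each addr is a packed sockaddr_in.
        my (undef, $packed_ip) = unpack_sockaddr_in($ai->{addr});
        push @addrs, inet_ntoa($packed_ip);
    }
    return @addrs;
}

# Print the result (or the error) for every name given on the command line.
for my $name (@ARGV) {
    my @addrs = eval { resolve_v4($name) };
    print $@ ? "$name: $@" : "$name: @addrs\n";
}
```

Passing both forms of a name, e.g. `maskedhere.li` and `maskedhere.li.`, should make the difference visible: with a wildcard record in the search domain, the relative name may still resolve to the wildcard address, while the absolute name with the trailing dot fails instead.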

A solution seems to be to append a "." to host names in the plugin config, which makes them fully qualified. The plugin "multi_tcp_ping" then runs into a detectable error:

$ munin-run multi_tcp_ping
Thread 1 terminated abnormally: getaddrinfo(maskedhere.li.,,AF_INET) failed - Temporary failure in name resolution at /etc/munin/plugins/multi_tcp_ping line 111 thread 1

However, this stops the script without assigning the unreachable value, so Munin shows "-nan". It seems that this value could also trigger an alarm. Nevertheless, I wrapped the ping() call in an eval block, which catches the croak from Net::Ping's _resolve() function:

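sub ping_host {
[...]
    $ret = undef;
    # Net::Ping croaks if _resolve() cannot resolve the host name;
    # catch it so the thread keeps running and $ret stays undef.
    eval {
        ($ret, $time, $ip) = $p->ping($host->[0]);
    };
    if ($@) {
        warn "func blew up: $@";
    }
    # A failed or aborted ping reports the unreachable value instead of -nan.
    $time = $defaults{unreachable} if !$ret;
    print "${addr}.value $time\n";
}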
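
The effect of the eval can be checked in isolation. This is a self-contained Perl sketch, not the plugin itself: fake_ping() is a hypothetical stand-in for Net::Ping's ping() that croaks the way _resolve() does, and $UNREACHABLE = -1 is an assumed value (the plugin uses its own $defaults{unreachable}):

```perl
use strict;
use warnings;
use Carp qw(croak);

# Assumed value for illustration; the plugin reads $defaults{unreachable}.
my $UNREACHABLE = -1;

# Hypothetical stand-in for Net::Ping's ping(): croaks like _resolve() on a
# resolution failure, otherwise returns ($ret, $time, $ip).
sub fake_ping {
    my ($host) = @_;
    croak "getaddrinfo($host,,AF_INET) failed - Temporary failure in name resolution"
        if $host =~ /\.invalid\.?\z/;
    return (1, 0.012, "192.0.2.10");
}

# Mirrors the patched logic: a croak inside eval leaves $ret undef,
# so the unreachable value is reported instead of the thread dying.
sub ping_value {
    my ($host, $pinger) = @_;
    my ($ret, $time, $ip);
    eval { ($ret, $time, $ip) = $pinger->($host) };
    $time = $UNREACHABLE if !$ret;
    return $time;
}

for my $host ("up.example.", "down.invalid.") {
    print "$host: ", ping_value($host, \&fake_ping), "\n";
}
```

The host that fails to resolve yields the unreachable value, while the other one reports its ping time as before.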

To summarize my fix:
1. Append a "." to host names in the munin-node plugin config.
2. Catch the croak from Net::Ping.
3. Adjust the field names for warnings/criticals in the munin config, because the trailing "." of the host name is rewritten as "_".
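
To see which field name the warning/critical settings would then have to use, here is a small Perl sketch. field_name() is a hypothetical helper that assumes the rewriting reported above, i.e. every character outside [A-Za-z0-9_] becomes "_"; how multi_tcp_ping actually derives its field names is not shown here, and Munin::Plugin's clean_fieldname() may apply additional rules:

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside [A-Za-z0-9_] with "_".
sub field_name {
    my ($host) = @_;
    (my $name = $host) =~ s/[^A-Za-z0-9_]/_/g;
    return $name;
}

for my $host ("maskedhere.li", "maskedhere.li.") {
    printf "%-15s -> %s\n", $host, field_name($host);
}
```

Under this assumption, adding the trailing dot only appends a "_" to the field name, so existing warning/critical entries for "maskedhere_li" would need to be renamed to "maskedhere_li_".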

I could adjust the plugin code in my fork, but before I do that, I would like to know what you think.

PS: This kind of problem seems to affect other plugins that resolve host names as well, for example http_loadtime and ssl-certificate-expiry.

Thank you for your detailed explanation!

I think Munin is doing the right thing here: it uses the configured host name and does not try to impose its own interpretation of the user's expectations (e.g. private or public DNS resolution). In your case, you could simply append the trailing dot in your plugin configuration if you want to bypass the local resolver's search-domain settings.

I guess your use case could be covered without further code changes if the plugin just wrapped the ($ret, $time, $ip) = $p->ping($host->[0]); call in an eval, right? A resolution failure would then result in the unreachable value being used.

Thus the following diff would be sufficient from my point of view:

--- a/plugins/ping/multi_tcp_ping
+++ b/plugins/ping/multi_tcp_ping
@@ -108,7 +108,9 @@ sub ping_host {
     $p->service_check(1);
     $p->{port_num} = $host->[1] || $defaults{port};
 
-    ($ret, $time, $ip) = $p->ping($host->[0]);
+    eval {
+        ($ret, $time, $ip) = $p->ping($host->[0]);
+    };
 
     $time = $defaults{unreachable} if !$ret;
     print "${addr}.value $time\n";

I will add the change if you confirm that it is sufficient.

Regarding the adjustment of names for warnings/criticals: could you elaborate, please?

Sorry for the delay. Yes, this should be sufficient. Regarding the 'warn': it was debugging code and can be removed.

Thank you for your report and your confirmation. I just pushed this change.