bmhatfield / riemann-sumd

Agent for scheduling event generating processes and sending the results to Riemann

Parse Nagios's performance data

goblin opened this issue

Currently, Nagios's performance data (http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT) is simply treated as a string and appended to Riemann's "description" field.

It would be nice if this data were parsed and sent as separate metrics.

Have you got any thoughts on how best to do it? I think we'd have to send multiple events to Riemann, one for each piece of performance data. This might complicate things a little.

I'd love to parse this data! Do you happen to have a link to a plugin that actually outputs this data? In all my years of running various Nagios installs, I've never noticed one output this data. Maybe I just wasn't looking close enough :-)

If I can see what the output looks like, I can improve the NagiosTask to properly parse it.

Sure, for instance the http check:

% ./check_http -H google.com
HTTP OK: HTTP/1.1 301 Moved Permanently - 559 bytes in 0.031 second response time |time=0.031230s;;;0.000000 size=559B;;;0

Or the SSH one:

% ./check_ssh localhost
SSH OK - OpenSSH_6.0p1 Debian-4 (protocol 2.0) | time=0.014560s;;;0.000000;10.000000

(they're from nagios-plugins-basic debian sid package version 1.4.16-1)

(edited, it's basic, not standard)
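For reference, the part after the `|` in the two outputs above can be split into items of the form `label=value[UOM];warn;crit;min;max`. Here is a minimal sketch of such a parser, not the code riemann-sumd uses; the names `PERF_ITEM` and `parse_perfdata` are made up for illustration, and the pattern only covers simple numeric values:

```python
import re

# Hypothetical pattern for one perfdata item: label=value[UOM];warn;crit;min;max
# Labels containing spaces are single-quoted by plugins; handled loosely here.
PERF_ITEM = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)="
    r"(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[A-Za-z%]*)"
    r"(?P<rest>(?:;[^;\s]*)*)"
)

def parse_perfdata(output):
    """Split plugin output at the first '|' and parse the perfdata items."""
    text, _sep, perf = output.partition("|")
    metrics = []
    for m in PERF_ITEM.finditer(perf):
        # The fields after the value are warn, crit, min, max (any may be empty).
        fields = m.group("rest").split(";")[1:]
        fields += [""] * (4 - len(fields))
        warn, crit, minimum, maximum = fields[:4]
        metrics.append({
            "label": m.group("label").strip("'"),
            "value": float(m.group("value")),
            "uom": m.group("uom"),
            "warn": warn, "crit": crit, "min": minimum, "max": maximum,
        })
    return text.strip(), metrics
```

Calling `parse_perfdata` on the check_http line gives two items, `time` and `size`, with the human-readable text returned separately.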

Thanks!

Okay, so I am doing some legwork to update python-bernhard to support Riemann 2+'s attributes field, which I think is a good way to send over an n-sized set of metric data for a given event.

I'm not sure that it's a good idea to send many events to get multiple metrics for the same service, because when I think about performing actions on their state, I don't want 5 alerts/pages for one service failure.

I would still like to pick one of the labels as the canonical 'metric' when there is more than one piece of performance data, but I'm not sure I can come up with a good rule for choosing. I might just go with index 0 of the parsed return string.

Thoughts?
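To make the proposal concrete, here is a rough sketch of the "one event, index 0 as the metric, everything else as attributes" idea. It builds a plain dict rather than calling python-bernhard, and `build_event` is a made-up name, so treat it as an illustration of the shape of the event only:

```python
def build_event(service, state, description, perfdata):
    """Build a single event dict from a check result.

    perfdata is a list of (label, value) pairs in the order the plugin printed them.
    """
    event = {"service": service, "state": state, "description": description}
    if perfdata:
        # Index 0 is treated as the canonical metric.
        event["metric"] = perfdata[0][1]
        # Every item, including the first, is also kept as a string attribute.
        event["attributes"] = {label: str(value) for label, value in perfdata}
    return event
```

With this shape, a failing service still produces exactly one event, so there is one alert to act on regardless of how many performance values the plugin reports.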

Here's the upstream PR to update bernhard to support 'attributes': https://github.com/banjiewen/bernhard/pull/6

Also, it appears that colliding field names will be ignored: for example, 'time' is both an event field and a performance data label, which means we're probably going to need to prefix it :-/
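A small sketch of the prefixing idea, assuming attributes are a flat string-to-string mapping. The function names are hypothetical, and the default prefix `task_` is only a placeholder picked to match the `task_time` attribute name that shows up later in this thread:

```python
# Field names that a Riemann event defines itself.
RESERVED_FIELDS = {"host", "service", "state", "time",
                   "description", "tags", "metric", "ttl"}

def colliding_labels(labels):
    """Return the perfdata labels that clash with an event field name."""
    return [label for label in labels if label in RESERVED_FIELDS]

def prefix_attributes(perfdata, prefix="task_"):
    """Prefix every perfdata label so that none can shadow an event field."""
    return {prefix + label: str(value) for label, value in perfdata}
```

Prefixing every label, rather than only the colliding ones, keeps the attribute names predictable for whoever writes the Riemann-side rules.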

Added a first pass, but it needs some tuneup and cleanup: df8c956

Whoa, that was quick!

These new attributes look like a great use case for this indeed, I wasn't aware they existed :-)

I had a quick test and it looks pretty good. One minor issue is that my response time of 0.035 seconds or so gets rounded, and the attribute ends up as :task_time "0.0".

But wow, many, many thanks for implementing this so quickly :-)

I've fixed the rounding problem with an extra dot in the regex: #3
(pretty minor of course;-)

Okay, I tidied up the parsing code a little bit more. There's more to be done here, I think, around making it really bulletproof (i.e., handling performance data that is returned but is nonsensical or invalid), but this should be good enough for common use cases.
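One possible direction for the "bulletproof" part, sketched under the assumption that invalid items should be skipped silently rather than failing the whole check. This is not what the commit below does; `valid_items` is a made-up helper, and it does not handle quoted labels that contain spaces:

```python
import string

def valid_items(perf):
    """Yield (label, value) for each whitespace-separated perfdata token that parses."""
    for token in perf.split():
        label, sep, rest = token.partition("=")
        if not label or not sep:
            continue  # not of the form label=value
        # Drop thresholds, then strip a trailing unit such as "s", "B" or "%".
        raw = rest.split(";")[0].rstrip(string.ascii_letters + "%")
        try:
            value = float(raw)
        except ValueError:
            continue  # value is missing or not numeric
        yield label, value
```

For example, `list(valid_items("time=0.031230s;;;0.000000 size=abc garbage"))` keeps only the `time` item.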

90c68b6