pengutronix / monitoring-check-systemd-service


Monitoring several/all systemd services

vdanjean opened this issue

Hi,

I was looking for an Icinga 2 plugin able to monitor all systemd services by default. My initial goal is to install such a check on all my machines and be notified when some services are in a failed state.
I cloned your repo and started hacking on it a few days ago. This is the first time I have written Python code, so my code could probably be improved a lot. Nevertheless, I now have code that fits my needs. However, several points could still be improved:

  • documentation (user-facing and in-code): nothing has been updated yet

  • multiple unit selection. There is a --filter REGEXP option. We could accept several --filter options, add a -X REGEXP option to exclude units, ...

  • options to select the ok/warning/critical state depending on the systemd states. Currently, the mapping differs depending on whether the check is run for a single unit (as before) or without one (all units are checked, and the inactive/dead state is accepted, for example), but this mapping is hardcoded in the check

  • for transient states (reloading, ...), add options to choose between ok/warn/critical depending on the time spent in that state

  • add options to select ok/warn/critical depending on the number of units in state running/inactive/masked/...

  • I'm not satisfied at all with the way I report performance data. It should be completely rewritten

  • ...
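The unit-selection and state-mapping ideas listed above could look roughly like the Python sketch below. It is not code from the plugin or from the fork: only `--filter REGEXP` and `-X REGEXP` come from the proposal, while the `--exclude` long form, `select_units`, `evaluate`, `DEFAULT_STATE_MAP` and the `(name, active_state)` input format are hypothetical. It assumes the unit list has already been fetched from systemd and uses the usual monitoring-plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import argparse
import re

# Usual monitoring-plugin exit codes (Nagios/Icinga convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Ranking used to pick the "worst" result; here UNKNOWN sits between
# WARNING and CRITICAL (a design choice, not a fixed rule).
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}

# Hypothetical default mapping from a unit's ActiveState to a result.
# The proposal is to make this configurable instead of hardcoded.
DEFAULT_STATE_MAP = {
    "active": OK,
    "inactive": OK,
    "activating": WARNING,
    "deactivating": WARNING,
    "reloading": WARNING,
    "failed": CRITICAL,
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch: multi-unit selection")
    # action="append" allows the option to be repeated on the command line.
    parser.add_argument("--filter", action="append", default=[], metavar="REGEXP",
                        help="include units matching REGEXP (repeatable)")
    # Hypothetical -X/--exclude option from the proposal.
    parser.add_argument("-X", "--exclude", action="append", default=[], metavar="REGEXP",
                        help="exclude units matching REGEXP (repeatable)")
    return parser.parse_args(argv)


def select_units(units, filters, excludes):
    """Keep units matching any include pattern (all units if none given)
    and matching no exclude pattern."""
    inc = [re.compile(p) for p in filters]
    exc = [re.compile(p) for p in excludes]
    selected = []
    for name, state in units:
        if inc and not any(r.search(name) for r in inc):
            continue
        if any(r.search(name) for r in exc):
            continue
        selected.append((name, state))
    return selected


def evaluate(units, state_map=DEFAULT_STATE_MAP):
    """Return the worst result among the selected units (OK if none).
    States missing from the map are reported as UNKNOWN."""
    codes = [state_map.get(state, UNKNOWN) for _name, state in units]
    return max(codes, key=SEVERITY.__getitem__, default=OK)


if __name__ == "__main__":
    # Example input; a real check would query systemd for this list.
    units = [("nginx.service", "active"),
             ("cron.service", "failed"),
             ("foo-test.service", "failed")]
    args = parse_args(["--filter", r"\.service$", "-X", r"^foo-"])
    selected = select_units(units, args.filter, args.exclude)
    print(selected)            # nginx.service and cron.service remain
    print(evaluate(selected))  # 2 (CRITICAL), because cron.service failed
```

Keeping the state map as plain data is what would make the "options to select the ok/warning/critical state" point easy to implement: command-line options would only need to override entries in the dictionary.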

Are you interested in this kind of development, or would you prefer to keep your plugin simple and dedicated to your initial use case? If you are interested, I would need to know what you want me to do before you merge my code (probably at least updating the user documentation). If not, I will rename my fork and continue its development under another name (probably check_systemd-services, i.e. with an S at the end).

Regards,
Vincent

I am interested in your changes. Please try to write really small patches so that I can understand your work. I actually saw your changes this morning while researching how to extend the plugin to services in containers and to collect performance values for services and timers.

The linear history of an issue makes discussing multiple topics very hard, so please open one issue for each improvement you want. Thanks in advance.

Thanks for your interest. I will try to redo my work, but it won't be quick, as I currently have very little free time.
I will create a PR for each improvement.

Regards,
Vincent

A lot of the code has been merged. It works for me :-)