munin-monitoring / contrib

Contributed stuff for munin (plugins, tools, etc...)

Home Page: http://munin-monitoring.org

docker_cpu crashes if any container is in restarting status

ogmueller opened this issue

The plugin cannot handle containers that are in the restarting status. Running it, e.g. with munin-run docker_cpu, crashes with the following error:

File "/etc/munin/plugins/docker_cpu", line 552, in <module>
   main()
 File "/etc/munin/plugins/docker_cpu", line 540, in main
   globals()[wildcard](client, mode)
 File "/etc/munin/plugins/docker_cpu", line 450, in cpu
   print_containers_cpu(client)
 File "/etc/munin/plugins/docker_cpu", line 297, in print_containers_cpu
   system_delta = (float(stats["cpu_stats"]["system_cpu_usage"])
KeyError: 'system_cpu_usage'
Warning: the execution of 'munin-run' via 'systemd-run' returned an error. This may either be caused by a problem with the plugin to be executed or a failure of the 'systemd-run' wrapper. Details of the latter can be found via 'journalctl'.
ERROR: munin pluging docker_cpu failed to execute

I would suggest adding a property such as running_containers, which returns only containers whose status is 'running', next to the existing all_containers:

    @cached_property
    @sorted_by_creation_date
    def running_containers(self):
        return [c for c in self.client.containers.list()
                if (not self.exclude or not self.exclude.search(c.name))
                and c.status == 'running']

    @cached_property
    @sorted_by_creation_date
    def all_containers(self):
        return [c for c in self.client.containers.list(all=True)
                if not self.exclude
                or not self.exclude.search(c.name)]
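To illustrate what the proposed filter does, here is a minimal sketch that runs without Docker. FakeContainer and filter_running are hypothetical names invented for this example; they only mimic the name and status attributes used in the plugin code above.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeContainer:
    """Hypothetical stand-in for a Docker SDK container object."""
    name: str
    status: str


def filter_running(containers, exclude=None):
    """Keep containers that are running and not matched by the exclude regex."""
    return [c for c in containers
            if (not exclude or not exclude.search(c.name))
            and c.status == 'running']


containers = [
    FakeContainer('web', 'running'),
    FakeContainer('worker', 'restarting'),   # dropped: not running
    FakeContainer('tmp_build', 'running'),   # dropped: matches exclude
]
running_names = [c.name
                 for c in filter_running(containers, re.compile(r'^tmp_'))]
print(running_names)
```

Only the 'web' container passes both the exclude check and the status check, so the restarting container never reaches the stats code.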

The function parallel_container_stats should then use client.running_containers instead of client.all_containers:

def parallel_container_stats(client):
    proc_list = []
    stats = {}
    for container in client.running_containers:

Three functions call parallel_container_stats:

  • print_containers_cpu
  • print_containers_memory
  • print_containers_network

I assume all of them require a running container and would crash on a restarting one.
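As a complementary safeguard, the lookup that raises the KeyError could also tolerate incomplete stats. The following is only a sketch: system_cpu_delta is a hypothetical helper, and subtracting the precpu_stats value is an assumption, since the traceback shows only the first half of the original expression.

```python
def system_cpu_delta(stats):
    """Return the system CPU delta, or None if the sample is incomplete."""
    try:
        current = float(stats["cpu_stats"]["system_cpu_usage"])
        # Assumption: the previous sample lives under "precpu_stats".
        previous = float(stats["precpu_stats"]["system_cpu_usage"])
    except KeyError:
        # Incomplete stats, as in the traceback above, lack the key.
        return None
    return current - previous


complete = {
    "cpu_stats": {"system_cpu_usage": 2000},
    "precpu_stats": {"system_cpu_usage": 1500},
}
incomplete = {"cpu_stats": {}, "precpu_stats": {}}

print(system_cpu_delta(complete))
print(system_cpu_delta(incomplete))
```

A caller could then skip any container for which the helper returns None instead of letting the whole plugin run fail.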

This would be the full patch against the current version:

149a150,156
>     def running_containers(self):
>         return [c for c in self.client.containers.list()
>                 if (not self.exclude or not self.exclude.search(c.name))
>                 and c.status == 'running']
>
>     @cached_property
>     @sorted_by_creation_date
281c288
<     for container in client.containers:
---
>     for container in client.running_containers:
444c451
<         for container in client.all_containers:
---
>         for container in client.running_containers:
461c468
<         for container in client.all_containers:
---
>         for container in client.running_containers:
487c494
<         for container in client.all_containers:
---
>         for container in client.running_containers:

Thank you for your detailed suggestion!