stefanprodan / swarmprom

Docker Swarm instrumentation with Prometheus, Grafana, cAdvisor, Node Exporter and Alert Manager

Grafana Dashboard for Swarm Nodes contains incorrect query.

darkl0rd opened this issue · comments

The "Containers network traffic by Node" panel returns an error when Prometheus was restarted within the selected time range and series end up with a different value for the "instance" label.

The queries used on the dashboard are:

sum(rate(container_network_receive_bytes_total{container_label_com_docker_swarm_node_id=~"$node_id"}[$interval]) * on(container_label_com_docker_swarm_node_id) group_left(node_name) node_meta) by (node_name)

- sum(rate(container_network_transmit_bytes_total{container_label_com_docker_swarm_node_id=~"$node_id"}[$interval]) * on(container_label_com_docker_swarm_node_id) group_left(node_name) node_meta) by (node_name)

I have been trying to rewrite the query to get it to work, without any success.

Reproduction case:

  • Run Prometheus
  • Collect data for x minutes.
  • Restart Prometheus.

When the restart falls within the time range you select in Grafana, the graph fails with a
"multiple series" error.

sum by(node_name) (irate(container_network_receive_bytes_total[$interval]) * on(container_label_com_docker_swarm_node_id) group_left(node_name) sum without(instance) (node_meta{node_id=~"$node_id"}))
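To illustrate why aggregating node_meta without "instance" helps, here is a minimal Python sketch of a many-to-one (group_left) match. It is not how Prometheus implements vector matching; the function names, the sample label values (n1, worker-1, the instance addresses) and the assumption that node_meta has the value 1 are all made up for illustration.

```python
from collections import defaultdict

# Label the join is performed on, as in the dashboard query.
ON = ("container_label_com_docker_swarm_node_id",)


def group_left_join(left, right, on):
    """Toy many-to-one match: each left sample may match at most one right sample."""
    index = defaultdict(list)
    for labels, value in right:
        index[tuple(labels[k] for k in on)].append((labels, value))
    out = []
    for labels, value in left:
        matches = index.get(tuple(labels[k] for k in on), [])
        if len(matches) > 1:
            # Two right-hand series share the match key: the join is ambiguous.
            raise ValueError("multiple series on the 'one' side for the same match key")
        for rlabels, rvalue in matches:
            out.append(({**labels, "node_name": rlabels["node_name"]}, value * rvalue))
    return out


def aggregate_without(samples, drop, agg=sum):
    """Toy 'agg without(<drop>)': group by all remaining labels, then aggregate."""
    groups = defaultdict(list)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[key].append(value)
    return [(dict(key), agg(values)) for key, values in groups.items()]


# Hypothetical samples: one traffic rate, and two node_meta series for the same
# node that differ only in "instance".
traffic = [({"container_label_com_docker_swarm_node_id": "n1"}, 100.0)]
node_meta = [
    ({"container_label_com_docker_swarm_node_id": "n1", "node_name": "worker-1",
      "instance": "10.0.0.5:9100"}, 1.0),
    ({"container_label_com_docker_swarm_node_id": "n1", "node_name": "worker-1",
      "instance": "10.0.0.9:9100"}, 1.0),
]

try:
    group_left_join(traffic, node_meta, ON)
except ValueError as err:
    print("raw node_meta:", err)

# Dropping "instance" collapses the duplicates into one series per node.
deduped = aggregate_without(node_meta, {"instance"}, agg=max)
print(group_left_join(traffic, deduped, ON))
```

One thing to watch in this sketch: if both node_meta series are present at the same timestamp, aggregating with sum adds their values, so the joined result is doubled, whereas max keeps it at 1. Whether that overlap actually occurs in a given setup is not something the sketch can tell you.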