slackhq / nebula

A scalable overlay networking tool with a focus on performance, simplicity and security

Lots of Prometheus metrics have the wrong type

firecow opened this issue

What version of nebula are you using?

1.7.2

What operating system are you using?

Linux

Describe the Bug

While scraping nebula metrics, I found that a lot of metric entries have incorrect metric types.

curl localhost:9102/metrics
...
# HELP nebula_firewall_incoming_dropped_local_ip firewall.incoming.dropped.local_ip
# TYPE nebula_firewall_incoming_dropped_local_ip gauge
nebula_firewall_incoming_dropped_local_ip 0
# HELP nebula_firewall_incoming_dropped_no_rule firewall.incoming.dropped.no_rule
# TYPE nebula_firewall_incoming_dropped_no_rule gauge
nebula_firewall_incoming_dropped_no_rule 1693
# HELP nebula_firewall_incoming_dropped_remote_ip firewall.incoming.dropped.remote_ip
# TYPE nebula_firewall_incoming_dropped_remote_ip gauge
nebula_firewall_incoming_dropped_remote_ip 0
# HELP nebula_firewall_outgoing_dropped_local_ip firewall.outgoing.dropped.local_ip
# TYPE nebula_firewall_outgoing_dropped_local_ip gauge
nebula_firewall_outgoing_dropped_local_ip 37005
# HELP nebula_firewall_outgoing_dropped_no_rule firewall.outgoing.dropped.no_rule
# TYPE nebula_firewall_outgoing_dropped_no_rule gauge
nebula_firewall_outgoing_dropped_no_rule 0
# HELP nebula_firewall_outgoing_dropped_remote_ip firewall.outgoing.dropped.remote_ip
# TYPE nebula_firewall_outgoing_dropped_remote_ip gauge
nebula_firewall_outgoing_dropped_remote_ip 0
...
# HELP nebula_handshake_manager_initiated handshake_manager.initiated
# TYPE nebula_handshake_manager_initiated gauge
nebula_handshake_manager_initiated 129
# HELP nebula_handshake_manager_timed_out handshake_manager.timed_out
# TYPE nebula_handshake_manager_timed_out gauge
nebula_handshake_manager_timed_out 85
...
# HELP nebula_messages_rx_recv_error messages.rx.recv_error
# TYPE nebula_messages_rx_recv_error gauge
nebula_messages_rx_recv_error 51
# HELP nebula_messages_tx_punchy messages.tx.punchy
# TYPE nebula_messages_tx_punchy gauge
nebula_messages_tx_punchy 1.359801e+06
# HELP nebula_messages_tx_recv_error messages.tx.recv_error
# TYPE nebula_messages_tx_recv_error gauge
nebula_messages_tx_recv_error 39
# HELP nebula_network_packets_duplicate network.packets.duplicate
# TYPE nebula_network_packets_duplicate gauge
nebula_network_packets_duplicate 0
# HELP nebula_network_packets_lost network.packets.lost
# TYPE nebula_network_packets_lost gauge
nebula_network_packets_lost 765710
# HELP nebula_network_packets_out_of_window network.packets.out_of_window
# TYPE nebula_network_packets_out_of_window gauge
nebula_network_packets_out_of_window 0
...

At least these metrics need to be changed from gauge to counter.
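For anyone who wants to check their own scrape output, here is a minimal sketch that lists the declared type of each metric. It only reads the `# TYPE` lines of the text exposition format shown above. The function name `metricTypes` is made up for this example and is not part of nebula or any Prometheus library.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// metricTypes (hypothetical helper) scans Prometheus text exposition output
// and returns a map from metric name to the type declared on its "# TYPE" line.
func metricTypes(exposition string) map[string]string {
	types := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(exposition))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// A type line has the form: # TYPE <metric_name> <type>
		if len(fields) == 4 && fields[0] == "#" && fields[1] == "TYPE" {
			types[fields[2]] = fields[3]
		}
	}
	return types
}

func main() {
	// Sample taken from the scrape output quoted in this issue.
	sample := `# HELP nebula_network_packets_lost network.packets.lost
# TYPE nebula_network_packets_lost gauge
nebula_network_packets_lost 765710
`
	for name, typ := range metricTypes(sample) {
		fmt.Printf("%s is exposed as %s\n", name, typ)
	}
}
```

Feeding it the full output of `curl localhost:9102/metrics` instead of the inline sample should make it easy to spot monotonically increasing values that are declared as gauge.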

The help comments aren't that helpful either 😄 But I can live with that.

Some parts of the code know that certain metrics are counters (https://github.com/slackhq/nebula/blob/master/bits.go#L13), but for some reason that isn't being propagated to the Prometheus output properly.

Logs from affected hosts

No response

Config files from affected hosts

No response

The main problem here is that we use an abstraction for stats so that we can support both graphite and prometheus style metrics.

From there, the underlying problem is that prometheus.Counter does not expose a Set function the way prometheus.Gauge does, so we can't just cram the correct value into a native Prometheus counter.
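One possible way around the missing Set is to track the last total seen and add only the difference. The sketch below illustrates that idea with standard-library Go only. `addOnlyCounter` is a hypothetical stand-in that mirrors the add-only shape of prometheus.Counter, and `deltaBridge` is likewise invented for this example. Neither is nebula's code or the Prometheus client API.

```go
package main

import "fmt"

// addOnlyCounter is a hypothetical stand-in that mirrors the shape of
// prometheus.Counter: it can only go up, and it has no Set method.
type addOnlyCounter struct {
	value float64
}

// Add increases the counter. Like prometheus.Counter.Add, it panics on a
// negative delta.
func (c *addOnlyCounter) Add(delta float64) {
	if delta < 0 {
		panic("counter cannot decrease")
	}
	c.value += delta
}

// deltaBridge (hypothetical) forwards absolute totals read from another
// stats registry into an add-only counter by remembering the last total.
type deltaBridge struct {
	last    float64
	counter *addOnlyCounter
}

// Observe takes the current absolute total and adds only the increase.
// If the total went down, the source is assumed to have been reset, and
// the new total is counted from zero.
func (b *deltaBridge) Observe(total float64) {
	delta := total - b.last
	if delta < 0 {
		delta = total
	}
	b.counter.Add(delta)
	b.last = total
}

func main() {
	c := &addOnlyCounter{}
	b := &deltaBridge{counter: c}
	// Absolute totals as they might be read from the stats abstraction.
	for _, total := range []float64{10, 25, 25, 40} {
		b.Observe(total)
	}
	fmt.Println(c.value)
}
```

With the real client library, another route might be a custom Collector that emits values at scrape time through prometheus.MustNewConstMetric with prometheus.CounterValue. Whether either approach fits nebula's stats abstraction is not something this thread confirms.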

Can you describe how this is impacting your use case?

Metricbeat's Prometheus rate_counters=true feature isn't functioning properly, since these metrics are exposed as gauges rather than counters.

https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-prometheus-collector.html

That's the only problem I'm having so far 👍