containers / prometheus-podman-exporter

Prometheus exporter for podman environments exposing containers, pods, images, volumes and networks information.

podman_container_mem_usage_bytes metrics disappeared

felixkrohn opened this issue

Describe the bug
I used to query podman_container_mem_usage_bytes in Prometheus, but I noticed it is no longer exposed. A manual check of the /metrics endpoint confirms this.

To Reproduce
Unfortunately I can't really say what has changed. At first I assumed a permission problem on the podman socket, so I created a separate socket used only by the prometheus-podman-exporter container:

/ $ id
uid=65534(nobody) gid=65534(nobody)
/ $ ls -la /run/podman/podman.sock 
srw-------    1 nobody   nobody           0 Dec  5 17:35 /run/podman/podman.sock
/ $ 
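
For reference, the dedicated socket was started roughly like this (a minimal sketch; the exact invocation and path on my host may differ slightly):

# run a long-lived API service on the dedicated socket; --time=0 disables the idle timeout
podman system service --time=0 unix:///run/podman/podman.sock &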

Logs (the last 7 lines repeat):

ts=2022-12-05T17:35:47.710Z caller=exporter.go:63 level=info msg="Starting podman-prometheus-exporter" version="(version=1.3.0, branch=main, revision=dev.1)"
ts=2022-12-05T17:35:47.711Z caller=handler.go:93 level=info msg="enabled collectors"
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=container
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=image
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=network
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=pod
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=system
ts=2022-12-05T17:35:47.711Z caller=handler.go:104 level=info collector=volume
ts=2022-12-05T17:35:47.711Z caller=exporter.go:74 level=info msg="Listening on" address=127.0.0.1:9882
ts=2022-12-05T17:35:47.712Z caller=tls_config.go:232 level=info msg="Listening on" address=127.0.0.1:9882
ts=2022-12-05T17:35:47.712Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=127.0.0.1:9882
ts=2022-12-05T17:35:58.307Z caller=handler.go:34 level=debug msg="collect query:" filters="unsupported value type"
ts=2022-12-05T17:35:58.312Z caller=collector.go:135 level=debug msg="collector succeeded" name=network duration_seconds=0.002719641
ts=2022-12-05T17:35:58.323Z caller=collector.go:135 level=debug msg="collector succeeded" name=pod duration_seconds=0.013724029
ts=2022-12-05T17:35:58.349Z caller=collector.go:135 level=debug msg="collector succeeded" name=volume duration_seconds=0.040115157
ts=2022-12-05T17:35:58.363Z caller=collector.go:135 level=debug msg="collector succeeded" name=container duration_seconds=0.053379088
ts=2022-12-05T17:35:58.511Z caller=collector.go:135 level=debug msg="collector succeeded" name=system duration_seconds=0.202251355
ts=2022-12-05T17:35:58.558Z caller=collector.go:135 level=debug msg="collector succeeded" name=image duration_seconds=0.24884938

Testing the socket itself seems OK:

$ echo -e "GET /containers/json HTTP/1.0\r\n" | podman unshare nc -U ${SOCKET} 
HTTP/1.0 200 OK
Api-Version: 1.41
Content-Type: application/json
Libpod-Api-Version: 4.3.1
Server: Libpod/4.3.1 (linux)
X-Reference-Id: 0xc0003c8000
Date: Mon, 05 Dec 2022 18:13:37 GMT

[{"Id":"1ef8fdf2102ba787e29125f3def643e6a4c5da4b22266daccffd2b24423b2549","Names" [...]

Expected behavior
podman_container_mem_usage_bytes should be available in the metrics exposed by prometheus-podman-exporter.

Environment

  • CentOS 9 Stream
  • Podman Version 4.3.1
  • Exporter run in a podman container: /bin/podman_exporter --collector.enable-all --collector.store_labels --debug --web.listen-address 127.0.0.1:9882
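
A rough sketch of how the exporter container is launched (image name, socket mount and CONTAINER_HOST wiring are reproduced from memory, so treat them as approximate):

# mount the dedicated socket into the container and point the exporter at it
podman run -d --name prometheus-podman-exporter \
  -v /run/podman/podman.sock:/run/podman/podman.sock \
  -e CONTAINER_HOST=unix:///run/podman/podman.sock \
  quay.io/navidys/prometheus-podman-exporter \
  --collector.enable-all --collector.store_labels --debug \
  --web.listen-address 127.0.0.1:9882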

Additional context
Any help in debugging this is welcome.

Hi @felixkrohn
I have tried on FC37 and cannot reproduce the issue (building from the main branch).
I will try on CentOS 9 Stream and let you know.

[navid@devnode prometheus-podman-exporter]$ ./bin/prometheus-podman-exporter --version
prometheus-podman-exporter (version=1.3.0, branch=main, revision=dev.1)

Example output:

# HELP podman_container_mem_usage_bytes Container memory usage.
# TYPE podman_container_mem_usage_bytes gauge
podman_container_mem_usage_bytes{id="1cddd7e911ef"} 3.01056e+06
podman_container_mem_usage_bytes{id="42907dc71261"} 49152
podman_container_mem_usage_bytes{id="5241480811bd"} 0
podman_container_mem_usage_bytes{id="bd7627e4d928"} 45056
podman_container_mem_usage_bytes{id="c81bdeea85df"} 2.740224e+06

Can you please also attach the output of:

curl http://localhost:9882/metrics | grep containers

In fact I seem to get none of the resource usage metrics under podman_container_*:

$ podman exec -ti prometheus-podman-exporter sh
/ $ ps a
PID   USER     TIME  COMMAND
    1 nobody    2:11 /bin/podman_exporter --collector.enable-all --collector.store_labels --web.telemetry-path /podman/metrics --debug --web.listen-address 127.0.0.1:9882
   18 nobody    0:00 sh
   87 nobody    0:00 ps a
/ $ /bin/podman_exporter --version
prometheus-podman-exporter (version=1.3.0, branch=main, revision=dev.1)
/ $ wget -q http://127.0.0.1:9882/podman/metrics -O- | grep -v "^#" |grep "^podman" | awk -F'{' '{print $1}' | sort | uniq
podman_container_created_seconds
podman_container_exit_code
podman_container_exited_seconds
podman_container_info
podman_container_started_seconds
podman_container_state
podman_image_created_seconds
podman_image_info
podman_image_size
podman_network_info
podman_scrape_collector_duration_seconds
podman_scrape_collector_success
podman_system_api_version
podman_system_buildah_version
podman_system_conmon_version
podman_system_runtime_version
podman_volume_created_seconds
podman_volume_info

A friendly reminder that this issue had no activity for 30 days.

The container metrics on that server spontaneously reappeared about one week ago without any manual intervention. I have no clue what led to this; perhaps it happened during a podman auto-update run? Sorry, I really don't know what broke or what fixed it, so I'll close this issue.