`/metrics` and `/api_metrics` endpoints do not show the generic API metrics for the director's endpoints

Question

Malsourie opened this issue a year ago

Describe the bug
According to the documentation, the /metrics and /api_metrics endpoints expose the generic API metrics for the director's endpoints, including the number of requests and the response time. But when we call those endpoints:

/metrics only exposes the metrics for /metrics and /api_metrics endpoints
/api_metrics only returns OK

To Reproduce

Deploy a bosh director
curl some endpoints of bosh, e.g. vms, deployments, etc.
curl http://<bosh_ip>:9092/metrics
curl http://<bosh_ip>:9092/api_metrics

Expected behavior
The endpoint should return metrics of bosh endpoints.
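To make the symptom easier to check, here is a minimal sketch that lists which request paths appear in a saved /metrics response. It assumes the output is in the Prometheus text exposition format and that request metrics carry a `path` label; the metric name and labels in `SAMPLE` are illustrative and not taken from an actual director.

```python
import re

# One sample line looks like: metric_name{label="value",...} 1.0
SAMPLE_LINE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def paths_in_metrics(text, label="path"):
    """Return the set of values seen for `label` across all sample lines."""
    seen = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if not match or not match.group("labels"):
            continue
        for key, value in LABEL_PAIR.findall(match.group("labels")):
            if key == label:
                seen.add(value)
    return seen


# Hypothetical response body; the metric name and labels are assumptions.
SAMPLE = '''\
# TYPE http_server_requests_total counter
http_server_requests_total{code="200",method="get",path="/metrics"} 3.0
http_server_requests_total{code="200",method="get",path="/api_metrics"} 1.0
'''

print(sorted(paths_in_metrics(SAMPLE)))
```

If the only paths reported are /metrics and /api_metrics even after calling other director endpoints, that matches the behavior described above.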

Matthias Vach · Answer 1 · Thu Sep 07 2023 00:51:06 GMT+0800 (China Standard Time)

Hi,
As a showcase, I implemented a tiny Puma and prometheus-client integration that serves the expected web server access metrics: https://github.com/mvach/PumaMetricsExample.

Sadly, I don't yet see how it differs from the current director implementation.

Beyhan Veli · Answer 2 · Thu Sep 07 2023 18:09:22 GMT+0800 (China Standard Time)

I'm not sure how the generic API metrics are supposed to work at all because:

As defined in the director job, the metrics-server is started as a separate process. You can also check this on a director VM with netstat -tulpn | grep 9092, followed by ps -aux | grep <pid-from-previous-command>.
bosh-director-metrics-server starts and registers the Prometheus collector for itself
the metrics_collector collects only the bosh metrics and no generic API metrics.

You only see metrics for the /metrics endpoint because this is the only endpoint you call on the metrics server. Maybe I'm missing something here, but this is my current understanding.

Matthias Vach · Answer 3 · Thu Sep 07 2023 19:23:57 GMT+0800 (China Standard Time)

:-) @beyhan,
I just noticed that right now and wanted to update the issue.

Joseph Palermo · Answer 4 · Fri Sep 22 2023 01:00:08 GMT+0800 (China Standard Time)

I was able to get API metrics from /api_metrics. I did have to enable the metrics, and I think I got the OK response when the metrics were NOT enabled, so that might have been part of the problem.
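A quick way to tell the two cases apart is to look at the response body. The helper below is a rough sketch: it assumes that a bare `OK` body is the "metrics not enabled" case described above, and that enabled metrics come back in the Prometheus text format with `# HELP` or `# TYPE` lines. The function name and return values are made up for illustration.

```python
def classify_metrics_body(body):
    """Classify a response body from /api_metrics (heuristic, not authoritative)."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    if not lines:
        return "empty"
    if lines == ["OK"]:
        # Assumed to be what the endpoint returns when metrics are not enabled.
        return "placeholder"
    if any(line.startswith(("# HELP", "# TYPE")) for line in lines):
        return "metrics"
    return "unknown"


print(classify_metrics_body("OK\n"))
print(classify_metrics_body("# TYPE some_metric counter\nsome_metric 1.0\n"))
```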

The /api_metrics endpoint does map to the director web process, so in theory it would have access to this data. However, I'm not sure how accurate the data is. The transition from Thin to Puma might have introduced separate datasets for each of the forked processes Puma creates. So it does return data, but I haven't had a chance yet to verify that the data is actually correct.
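The concern about forked workers can be illustrated with a small simulation. This is not Puma and not the director's code: each `Worker` object stands in for one forked process with its own in-memory counters, and the round-robin dispatch is an assumption made only to keep the example deterministic.

```python
from collections import Counter
from itertools import cycle


class Worker:
    """Stands in for one forked web worker with its own in-memory registry."""

    def __init__(self, name):
        self.name = name
        self.requests = Counter()  # process-local: not shared with other workers

    def handle(self, path):
        self.requests[path] += 1

    def scrape(self):
        # A scrape answered by this worker only sees this worker's counts.
        return dict(self.requests)


def simulate(paths, worker_count=2):
    """Distribute requests across workers and return the workers."""
    workers = [Worker(f"worker-{i}") for i in range(worker_count)]
    # Round-robin dispatch is an assumption; real load balancing may differ.
    for worker, path in zip(cycle(workers), paths):
        worker.handle(path)
    return workers


def aggregate(workers):
    """Sum per-worker counts, which is what an accurate total would require."""
    total = Counter()
    for worker in workers:
        total.update(worker.requests)
    return dict(total)


workers = simulate(["/deployments", "/vms", "/deployments", "/tasks"])
print(workers[0].scrape())   # partial view from a single worker
print(aggregate(workers))    # full view across all workers
```

If the real workers behave like this, a single scrape would return plausible-looking but incomplete numbers, which is why returning data is not the same as returning correct data. Whether the director is actually affected is not verified in this thread.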

Beyhan Veli · Answer 5 · Fri Sep 22 2023 20:14:27 GMT+0800 (China Standard Time)

Good catch, @jpalermo. This is the commit which introduced the change. It looks like /api_metrics was initially called /director_metrics, but yes, it redirects to the director process, which I missed.

> The transition from thin to puma might have introduced separate datasets for each of the forked processes puma creates. So it does return data, but I didn't have a chance yet to verify that the data is actually correct.

This is a good question. It looks to me like they should be accurate, because the director's internal metrics are gathered from the DB data, but you "never know" :-)