cloudfoundry / bosh

Cloud Foundry BOSH is an open source tool chain for release engineering, deployment and lifecycle management of large scale distributed services.

Home Page: https://bosh.io

`/metrics` and `/api_metrics` endpoints do not show the generic API metrics for the director's endpoints

Malsourie opened this issue

Describe the bug
According to the documentation, the /metrics and /api_metrics endpoints expose the generic API metrics for the director's endpoints, including the number of requests and the response time. But when we call those endpoints:

  1. /metrics only exposes the metrics for the /metrics and /api_metrics endpoints
  2. /api_metrics only returns OK

To Reproduce

  1. Deploy a bosh director
  2. curl some endpoints of bosh, e.g. vms, deployments, etc.
  3. curl http://<bosh_ip>:9092/metrics
  4. curl http://<bosh_ip>:9092/api_metrics

Expected behavior
Both endpoints should return metrics for the bosh director's API endpoints.
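The reproduction steps and the expected behavior could be checked with a small script along these lines. This is only a sketch: the `path` label name and the helper names (`paths_in_metrics`, `fetch`, `check_director`) are assumptions made for illustration, not something the director is confirmed to expose.

```python
import re
import urllib.request

# Assumption: request metrics carry a label named "path" in the Prometheus
# text exposition format. Adjust the label name if the director uses another.
PATH_LABEL = re.compile(r'path="([^"]*)"')


def paths_in_metrics(text):
    """Return the set of path label values found in non-comment lines."""
    paths = set()
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        paths.update(PATH_LABEL.findall(line))
    return paths


def fetch(url, timeout=5):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def check_director(bosh_ip, port=9092):
    """Print which paths, other than the metrics endpoints, are reported."""
    for endpoint in ("/metrics", "/api_metrics"):
        body = fetch(f"http://{bosh_ip}:{port}{endpoint}")
        others = paths_in_metrics(body) - {"/metrics", "/api_metrics"}
        print(endpoint, "reports other paths:", sorted(others) or "none")
```

Calling `check_director("<bosh_ip>")` after hitting a few director endpoints would print "none" for both endpoints if the behavior described in this issue is present.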

Hi,
As a showcase, I implemented a tiny Puma and prometheus-client integration that serves the expected web server access metrics: https://github.com/mvach/PumaMetricsExample.

Sadly, I don't see how it differs from the current director implementation right now.

I'm not sure how the generic API metrics are supposed to work at all because:

  • As defined in the director job, the metrics-server is started as a separate process. You can also check this on a director VM with netstat -tulpn | grep 9092 and then ps -aux | grep <pid-from-previous-command>.
  • bosh-director-metrics-server starts and registers the Prometheus collector for itself.
  • The metrics_collector collects only the bosh metrics and no generic API metrics.

You only see metrics for the /metrics endpoint because it is the only endpoint you call on the metrics server. Maybe I'm missing something here, but this is my current understanding.

:-) @beyhan,
I just noticed that right now and wanted to update the issue.

I was able to get API metrics from /api_metrics. I did have to enable the metrics, and I think I got the OK response when the metrics were NOT enabled, so that might have been part of the problem.

The /api_metrics endpoint does map to the director web process, so in theory it would have access to this data. However, I'm not sure how accurate the data is. The transition from thin to puma might have introduced separate datasets for each of the forked processes puma creates. So it does return data, but I didn't have a chance yet to verify that the data is actually correct.
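To illustrate the concern about forked processes, here is a minimal sketch of how an in-memory counter behaves across a fork. It is a generic demonstration of fork semantics on a Unix-like system, not Puma or director code, and the function name `simulate_forked_workers` is hypothetical. Whether the director is actually affected is not verified here.

```python
import os


def simulate_forked_workers(requests_per_worker):
    """Fork one child per entry; each child counts its own requests in memory.

    Returns (counts reported by each child, count seen by the parent).
    """
    counter = {"requests_total": 0}  # process memory, copied on fork
    results = []
    for handled in requests_per_worker:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: stand-in for a single forked worker
            os.close(read_fd)
            for _ in range(handled):
                counter["requests_total"] += 1
            os.write(write_fd, str(counter["requests_total"]).encode())
            os.close(write_fd)
            os._exit(0)  # leave without running parent cleanup code
        # parent: read what the child counted, then reap it
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            results.append(int(reader.read()))
        os.waitpid(pid, 0)
    return results, counter["requests_total"]
```

With `simulate_forked_workers([3, 5, 2])`, each child reports only its own count and the parent's counter stays at 0, so no single process holds the total of 10. A scrape answered by one such worker would see only that worker's share, unless the metrics are stored somewhere shared.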

Good catch @jpalermo. This is the commit which introduced the change. It looks like /api_metrics was initially called /director_metrics, but yes, it redirects to the director process, which I missed.

> The transition from thin to puma might have introduced separate datasets for each of the forked processes puma creates. So it does return data, but I didn't have a chance yet to verify that the data is actually correct.

This is a good question. It looks to me like they should be accurate, because the director's internal metrics are gathered from the DB data, but you "never know" :-)