prometheus-operator / prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes

Home Page: https://prometheus-operator.dev

Integrate OpenTelemetry Target Allocator with Prometheus Operator

nicolastakashi opened this issue

Component(s)

Prometheus, PrometheusAgent

What is missing? Please describe.

The sharding feature in the Prometheus Operator uses the hashmod action of Prometheus relabelling to distribute targets across Prometheus instances.

But a known issue with this hash mod strategy is uneven target distribution: because it does not use consistent hashing, one Prometheus instance may end up with more targets than the others.
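To make the hash mod behaviour concrete, here is a minimal Python sketch of how a target could be mapped to a shard. It is an approximation, not the operator's code: the function names, the sample addresses, and the choice of folding the MD5 digest to 64 bits via its last 8 bytes are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def hashmod_shard(address: str, shards: int) -> int:
    """Map a target address to a shard index in the style of a hashmod relabel rule."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    # Assumption: fold the 128-bit digest to 64 bits using its last 8 bytes.
    return int.from_bytes(digest[-8:], "big") % shards


def distribution(targets, shards: int) -> Counter:
    """Count how many targets land on each shard."""
    return Counter(hashmod_shard(t, shards) for t in targets)


def moved_fraction(targets, old_shards: int, new_shards: int) -> float:
    """Fraction of targets whose shard changes when the shard count changes."""
    moved = sum(
        hashmod_shard(t, old_shards) != hashmod_shard(t, new_shards)
        for t in targets
    )
    return moved / len(targets)


# Hypothetical target addresses, for illustration only.
targets = [f"10.0.{i // 256}.{i % 256}:9100" for i in range(1000)]

if __name__ == "__main__":
    print("targets per shard:", dict(distribution(targets, 3)))
    print("moved when going from 3 to 4 shards:", moved_fraction(targets, 3, 4))
```

Running it prints the per-shard counts and the share of targets that change shard when a fourth shard is added, which is where the instability of a plain modulus shows up.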

The idea behind this issue is to provide a different solution to achieve a more stable and consistent sharding strategy.

Describe alternatives you've considered.

The OpenTelemetry project has a service named TargetAllocator, which the OpenTelemetry Operator uses to distribute targets across the existing collector instances, ensuring an even distribution of targets through consistent hashing.

For more information please check: https://github.com/open-telemetry/opentelemetry-operator/blob/main/cmd/otel-allocator/README.md
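As a rough illustration of the consistent hashing idea, the sketch below builds a hash ring with virtual nodes and assigns each target to the next point on the ring. This is not the Target Allocator's implementation: the `HashRing` class, the collector names, the MD5-based hash, and the number of virtual nodes are all assumptions chosen to keep the example small.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Hash a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, collectors, vnodes: int = 100):
        # Each collector is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in collectors
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def owner(self, target: str) -> str:
        """Return the collector owning the first ring point after the target's hash."""
        idx = bisect.bisect_right(self._keys, _hash(target)) % len(self._keys)
        return self._points[idx][1]


# Hypothetical collector names and target addresses.
targets = [f"10.0.{i // 256}.{i % 256}:9100" for i in range(1000)]
old_ring = HashRing(["collector-0", "collector-1", "collector-2"])
new_ring = HashRing(["collector-0", "collector-1", "collector-2", "collector-3"])
moved = [t for t in targets if old_ring.owner(t) != new_ring.owner(t)]

if __name__ == "__main__":
    print("targets that moved after adding collector-3:", len(moved))
```

In this sketch, adding a collector only moves targets onto the new collector; assignments between the existing collectors stay put, which is the stability property the issue is after.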

Environment Information.

N/A

Working on this on the sidelines.

Really cool that you're trying this! The Target Allocator has practically no Otel-specific logic in it, so this looks very doable.

Here are some possible sharp edges that I hope to spare you from experiencing:

  1. Authorization in scrape configs is not solved: open-telemetry/opentelemetry-operator#1669. Serialized scrape configs don't include secrets, and even if they did, we can't send them over plain HTTP. In the medium term we want to solve this by using client-side TLS and custom serialization to include secrets in the payload.

  2. We have EndpointSlice support enabled in our config generation (https://github.com/open-telemetry/opentelemetry-operator/blob/dab898f6bb45d654fb138eb6c4860e15ee5eb59b/cmd/otel-allocator/watcher/promOperator.go#L86), while prometheus-operator didn't, last I checked: #3862. You can run into trouble if you use the operator-managed kubelet Service alongside the target allocator in a large cluster.

  3. There are some config serialization woes in general: open-telemetry/opentelemetry-operator#2793

  4. This isn't a problem per se, but be aware that the target allocator performs part of the target relabeling itself and only gives the client the targets that weren't removed by relabeling. We actually want to experiment with moving all target relabeling to the allocator, thus avoiding sending massive label sets to the client.
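To illustrate the last point, here is a small Python sketch of filtering targets with keep/drop relabel rules before handing them to a client. It follows the documented Prometheus semantics for these two actions (source label values joined with a separator, regex anchored at both ends), but it is not the allocator's code: the function names, the rule dictionaries, and the sample labels are hypothetical, and Python's `re` only approximates the RE2 syntax Prometheus uses.

```python
import re


def survives(labels: dict, rules: list) -> bool:
    """Return True if a target with these labels is kept by all keep/drop rules."""
    for rule in rules:
        separator = rule.get("separator", ";")
        # Missing labels contribute an empty string, as in Prometheus relabelling.
        value = separator.join(labels.get(name, "") for name in rule["source_labels"])
        matched = re.fullmatch(rule.get("regex", "(.*)"), value) is not None
        if rule["action"] == "keep" and not matched:
            return False
        if rule["action"] == "drop" and matched:
            return False
    return True


def filter_targets(targets: list, rules: list) -> list:
    """Keep only the targets that are not removed by relabeling."""
    return [t for t in targets if survives(t, rules)]


# Hypothetical rules and discovered targets, for illustration only.
rules = [
    {"source_labels": ["__meta_kubernetes_pod_label_app"], "regex": "api", "action": "keep"},
    {"source_labels": ["__meta_kubernetes_pod_phase"], "regex": "Pending|Succeeded|Failed", "action": "drop"},
]
discovered = [
    {"__address__": "10.0.0.1:8080", "__meta_kubernetes_pod_label_app": "api", "__meta_kubernetes_pod_phase": "Running"},
    {"__address__": "10.0.0.2:8080", "__meta_kubernetes_pod_label_app": "api", "__meta_kubernetes_pod_phase": "Pending"},
    {"__address__": "10.0.0.3:8080", "__meta_kubernetes_pod_label_app": "web", "__meta_kubernetes_pod_phase": "Running"},
]

if __name__ == "__main__":
    for target in filter_targets(discovered, rules):
        print(target["__address__"])
```

Only targets that pass every rule would be handed to a client, so dropped targets never reach the scraping instance.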