prometheus / jmx_exporter

A process for exposing JMX Beans via HTTP for Prometheus consumption

Support multi-target in http server mode

KumKeeHyun opened this issue · comments

Currently, the HTTP server only allows collecting metrics from a single target specified by the hostPort or jmxUrl configuration options.
However, with Prometheus supporting the multi-target pattern and jmx_exporter reaching version 1.0, JmxCollector, which now implements the MultiCollector interface, seems capable of supporting multiple targets.

public class JmxCollector implements MultiCollector {

I propose introducing a new endpoint, /metrics?target={hostPort}, that allows dynamically specifying the jmxUrl for the JmxScraper to collect from.

JmxScraper scraper =
    new JmxScraper(
        config.jmxUrl,
        config.username,
        config.password,
        config.ssl,
        config.includeObjectNames,
        config.excludeObjectNames,
        config.objectNameAttributeFilter,
        receiver,
        jmxMBeanPropertyCache);

If a request arrives at /metrics without a target parameter, the collector would collect metrics from the configured hostPort or jmxUrl, or from an empty string (agent mode only).
This approach would not break existing behavior in HTTP server mode, as specifying hostPort or jmxUrl remains mandatory. In agent mode, an empty string would be passed, aligning with the original design.
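
A minimal sketch of the target-resolution logic I have in mind is below (the class and method names are hypothetical, not jmx_exporter's actual API; only the standard service:jmx:rmi URL form is assumed):

// Illustrative sketch only: how the scrape handler could decide which JMX URL to use.
// TargetResolution and resolveJmxUrl are hypothetical names, not jmx_exporter's API.
class TargetResolution {
    static String resolveJmxUrl(String targetParam, String configuredJmxUrl, String configuredHostPort) {
        if (targetParam != null && !targetParam.isEmpty()) {
            // /metrics?target=host:port -> build the remote JMX service URL dynamically
            return "service:jmx:rmi:///jndi/rmi://" + targetParam + "/jmxrmi";
        }
        if (!configuredJmxUrl.isEmpty()) {
            return configuredJmxUrl;   // explicit jmxUrl from the configuration file
        }
        if (!configuredHostPort.isEmpty()) {
            return "service:jmx:rmi:///jndi/rmi://" + configuredHostPort + "/jmxrmi";
        }
        return "";                     // agent mode: empty string scrapes the local JVM
    }
}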

A limitation is the inability to set authentication and SSL for each target individually. However, similar to the redis exporter, the exporter could be restricted to using the same authentication for all targets.

While I understand the JMX exporter was initially designed with agent mode (local JVM) in mind, I believe leveraging the HTTP server's advantages would be beneficial. Although remotely collecting JMX metrics might incur network overhead, careful configuration of includeObjectNames/excludeObjectNames can mitigate this concern.
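
For illustration, a narrow include pattern bounds what is enumerated over the remote connection; the Kafka ObjectName pattern and class name below are examples only:

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectInstance;
import javax.management.ObjectName;

// Illustrative only: a narrow include pattern limits which MBeans are enumerated
// (and therefore which attributes are read) over the remote connection.
class IncludeFilterExample {
    static Set<ObjectInstance> queryIncluded(MBeanServerConnection connection) throws Exception {
        // Example pattern; in practice this would come from includeObjectNames in the config.
        ObjectName include = new ObjectName("kafka.server:type=BrokerTopicMetrics,*");
        return connection.queryMBeans(include, null);
    }
}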

I have already tested this functionality by modifying the source code and achieved satisfactory results. If you are open to supporting this feature, I would be happy to submit a pull request for review.

Thanks for your time.

@KumKeeHyun The standalone exporter has configuration values (rules, hostPort/jmxUrl, potentially JMX authentication/SSL, etc.) that are specific to the target JVM/MBeans that are being scraped.

How are you handling configuration in the functionality that you have tested?

@dhoard

Apologies for the lack of detailed explanations regarding the settings.

The multi-target feature was suggested under the assumption that several servers using the same Rules, Authentication, and SSL are specified as targets.

The tests were conducted as follows:

  • Running a standalone exporter with a rules configuration file for Kafka
  • Kafka cluster of 10 brokers
  • Collecting metrics from multiple brokers through /metrics?target=broker-01:9999, /metrics?target=broker-02:9999, ..., /metrics?target=broker-10:9999 endpoints
  • When creating a JmxScraper in JmxCollector, the jmxUrl was set based on the address received through the target parameter.
    • rules, authentication, and SSL use the standalone exporter's configuration
    • JmxMBeanPropertyCache and MatchedRulesCache were created and managed per target, since they may be affected by a change of target (see the sketch below)

The hostPort in the standalone exporter's config was chosen arbitrarily from one of the brokers.
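
To make the per-target cache handling concrete, below is a minimal sketch of the approach I tested, with hypothetical names; plain maps stand in for the real JmxMBeanPropertyCache/MatchedRulesCache types:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch with hypothetical names: one cache bundle per distinct ?target= value,
// so MBean-property and matched-rule state is never shared between brokers.
class TargetCaches {
    final Map<String, Object> mbeanPropertyCache = new ConcurrentHashMap<>(); // stands in for JmxMBeanPropertyCache
    final Map<String, Object> matchedRulesCache = new ConcurrentHashMap<>();  // stands in for MatchedRulesCache
}

class TargetCacheRegistry {
    private final Map<String, TargetCaches> byTarget = new ConcurrentHashMap<>();

    TargetCaches forTarget(String hostPort) {
        // Lazily create and reuse a separate cache bundle per target.
        return byTarget.computeIfAbsent(hostPort, t -> new TargetCaches());
    }
}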

@KumKeeHyun I feel this usage scenario has already been solved by using a reverse proxy (e.g., Nginx) as a router to the correct Kafka server/exporter agent.

@dhoard

I understand the reverse-proxy use case to be as follows.

flowchart LR
	prometheus -- "/metrics?target=node-01:9999" --> nginx
	nginx -- "/metrics" --> jmx-exporter-01
	nginx --> jmx-exporter-02
	nginx --> jmx-exporter-03
	subgraph kafka-cluster
	subgraph node-01
	jmx-exporter-01 --> kafka-01
	end
	subgraph node-02
	jmx-exporter-02 --> kafka-02
	end
	subgraph node-03
	jmx-exporter-03 --> kafka-03
	end
	end

There is no functional difference between this use case and multi-target. However, I think multi-target has significant operational benefits. If jmx-exporter supports multi-target, the overall configuration would be as follows.

flowchart LR
	prometheus -- "/metrics?target=node-01:9999" --> jmx-exporter
	jmx-exporter -- "JmxScrape" --> kafka-01
	jmx-exporter --> kafka-02
	jmx-exporter --> kafka-03
	subgraph kafka-cluster
	subgraph node-01
	kafka-01
	end
	subgraph node-02
	kafka-02
	end
	subgraph node-03
	kafka-03
	end
	end

This approach allows jmx-exporter instances to be deployed independently. When there is a deployment task such as upgrading the jmx-exporter version or changing the rules configuration, we don't need to touch every node where jmx-exporter is installed, only the independently deployed jmx-exporter.

@KumKeeHyun I agree it changes the update domain (standalone exporter vs. Java agent).


Standalone exporter concerns: More infrastructure but no application restarts

To deploy the standalone exporter properly, you need three instances plus a load balancer for high availability/fault tolerance.

Why three standalone exporter instances...

Example:

Standalone exporter instances (1, 2, 3)

You upgrade instance 1

While instance 1 is being upgraded, instance 2 crashes for some reason (bug, infrastructure, etc.)

This leaves you with instance 3 to service requests.

If you only have 2 exporter instances, you have an outage.

If you have a single instance, you have an outage during the upgrade.

Because you need multiple instances of the standalone exporter for high availability/fault tolerance, you need a high availability load balancer.

Some people try to use DNS in place of a load balancer, but DNS caching can cause failures.

Based on my experience working with enterprises, DNS changes typically require a change ticket that is implemented by another team.


Java agent exporter: Less infrastructure but application restarts

If you are using the Java agent exporter, an update of the jar and application restart would be required.

Not ideal from a Kafka perspective, since it will cause producer/consumer errors as well as leader elections. Properly implemented Kafka applications should implement retries, which will mitigate the restart impact.

YAML configuration updates can be handled via automation, so there is no availability impact.


Security concerns:

Allowing Prometheus (or another collecting application) to provide a dynamic value to identify the scrape target is typically considered insecure.

The implementation would need extra configuration to map an id to a jmxUrl/hostPort.
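
As a rough sketch of that idea (class and field names are hypothetical; the id-to-hostPort entries would come from the exporter's YAML configuration):

import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: only pre-declared target ids may be scraped, so the
// collecting application never supplies an arbitrary host:port directly.
class TargetResolver {
    private final Map<String, String> allowedTargets; // e.g. "broker-01" -> "broker-01:9999"

    TargetResolver(Map<String, String> allowedTargets) {
        this.allowedTargets = allowedTargets;
    }

    Optional<String> resolve(String id) {
        return Optional.ofNullable(allowedTargets.get(id));
    }
}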


I'm not opposed to the functionality, but we have to make sure it doesn't break existing users (risk management).

Thank you for explaining in detail. I understand the concerns with each approach.
I will close this issue now. Thanks :)