cep21 / circuit

An efficient and feature-complete, Hystrix-like Go implementation of the circuit breaker pattern.

Add prometheus metrics collector

on99 opened this issue

Is it possible to expose the metrics to Prometheus?

Yes, it is! I would start by copying the implementation at https://github.com/cep21/circuit/blob/master/metrics/statsdmetrics/statsd.go, but with Prometheus as the destination.

You can start it yourself, and I can help review and clean up the code. I can do it myself, but it will take some time before I get to it, and I don't have a Prometheus cluster to test it against.

@cep21 Thank you. I think I'll wait, since I'm not familiar with Prometheus metrics.

@cep21 Any progress on this? Let me know if I can help.

Hi,

No progress has been made. I think it would work well as a separate repository that provides implementations of func(string) circuit.Config.

I would follow the pattern of https://github.com/cep21/circuit/blob/v3.0.1/v3/metrics/statsdmetrics/statsd.go#L182

Create a CommandFactory type with a method func (c *CommandFactory) CommandProperties(circuitName string) circuit.Config. It would be used something like this:

	f := prometheusmetrics.CommandFactory{
		// .... add some configuration here
	}

	// Wire the prometheus factory into the circuit manager
	h := circuit.Manager{
		DefaultCircuitProperties: []circuit.CommandPropertiesConstructor{f.CommandProperties},
	}

Help wanted. If you implement this, I would be glad to link to your repository from the README.md.

Update.
I recently adopted circuit in my project and will test it over the next few weeks. If everything works fine, I will add Prometheus metrics collector support.

Hi,

I think it was a mistake to include the statsd exporter inside the circuit library like I did with https://github.com/cep21/circuit/tree/master/metrics/statsdmetrics, but I'm stuck with it for backwards compatibility. I think it's better to put features in their own repository and link them from circuit, like I did for https://github.com/cep21/aimdcloser. That keeps the circuit library smaller. If you make a prometheus exporter, can you host it on your personal github? I can review the code and I'll link to it from the README.md of cep21/circuit.

@cep21 That sounds reasonable to me; circuit should keep its core small.
Thanks for your effort on this useful library.

Hi, I'll get to reviewing your repo this week.

Thanks for your time.

FYI, I have been running it in production for several days, and everything works as expected.

The code looks great. Here are some comments:

Can you add a LICENSE file, similar to the one here, directly to the repository?
Can you run it through a static linter, such as golint? This repository has an example of doing that on Travis; you could also use CircleCI.
Can you add godoc comments to CommandFactory and GetFactory?

Otherwise it looks good. I'll update this repository with a link to the exporter.

@jiacai2050
Actually, for circuitMetricsCollector.Opened and circuitMetricsCollector.Closed, can you also add a gauge-style metric that is either 0 or 1? It would be the same metric for every call. That way, your dashboard knows not just how many times a circuit opened or closed, but also whether it is currently open or closed.

Adding a LICENSE and linting sound reasonable to me; I will address both in the next release.

Currently I use the following PromQL to get the status of a circuit:

(rate(circuit_opened_total{env="$env"}[1m])) - on (name, instance) (rate(circuit_closed_total{env="$env"}[1m])) 

If the result is > 0, the circuit is open; otherwise it is closed.
The drawback of deriving the state from the two counters alone is that the query needs a binary operator, but it uses fewer resources in the user's application.

I'm fine with adding a gauge metric that is incremented when Opened is called and decremented when Closed is called; I just want to make the tradeoff clear.

Will that query work if the circuit hasn't opened or closed in the past hour, but is currently closed?

Yes, I think that query works in that situation.

How can a query work if it never got any metrics? If the circuit is idle, neither the opened nor the closed counter is incremented.

Well, in this case, no metric will have any value, including a gauge.

What happens if:

The circuit gets normal traffic and opens and closes as usual.
Then, the circuit does not open or close for 3 hours.

If you use a gauge, you will still know whether the circuit is closed or open.

If you use the rate PromQL above and there have been no open or close events in the past hour, both rates are 0. You only know that nothing happened, not whether the circuit is currently open or closed.

It seems that adding a gauge makes things clearer.

I pushed a commit that fixes the issues discussed here, including adding a gauge.
Please take another look; I will tag v0.0.4 if everything is OK.

Looks great! Thanks so much for this.