pinterest / elixometer

A light Elixir wrapper around exometer.

Home Page: https://hexdocs.pm/elixometer

Question: why does elixometer need a GenServer in front of exometer?

aerosol opened this issue

commented

What is the rationale behind a process gateway in front of exometer?

I'm scratching my head trying to figure out why everything is proxied through a single gen server instance.

I can see an ETS table that keeps track of the subscriptions/metrics defined, and a periodic tick that resets counters. That somewhat justifies having a dedicated process; however, am I wrong in thinking that exometer_core does that on its own anyway? 🤔

I'm assuming that you're talking about the pobox stuff; Elixometer also has a GenServer that serializes config changes. None of the metric updates go through that process.

The reason Elixometer has pobox is to shed load if metrics are updated en masse. We experienced a pretty significant performance bottleneck with exometer when we threw many metrics updates at it at once. Adding pobox fixed this completely by dropping updates after the process mailbox reaches 1000 messages, which is a large backlog. If you're under the limit, you won't be affected, and if you're over the limit, I think discarding metrics is better than slowing your app down.
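To make the load-shedding idea above concrete, here is a minimal, language-neutral sketch in Python. It is not pobox's API and not Elixometer's code: the `BoundedMailbox` class, its method names, and the choice to drop the newest updates (rather than the oldest) are all assumptions made for illustration. Only the limit of 1000 comes from the comment above.

```python
from collections import deque


class BoundedMailbox:
    """Hypothetical buffer that sheds load once `limit` updates are queued."""

    def __init__(self, limit=1000):
        self.limit = limit
        self._queue = deque()

    def post(self, update):
        """Enqueue without blocking; return False if the update was shed."""
        if len(self._queue) >= self.limit:
            return False  # over the limit: discard instead of slowing the caller
        self._queue.append(update)
        return True

    def drain(self):
        """Hand every buffered update to the consumer and empty the buffer."""
        batch = list(self._queue)
        self._queue.clear()
        return batch


mailbox = BoundedMailbox(limit=1000)
# Burst of 1500 updates: the first 1000 are kept, the rest are dropped.
accepted = sum(mailbox.post(("requests", 1)) for _ in range(1500))
delivered = mailbox.drain()
print(accepted, len(delivered))
```

The point of the design is that `post` never blocks: a producer under the limit sees no difference, and a producer over the limit loses some updates but keeps its own latency.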

commented

Hey @scohen, thanks for the reply! ❤️

> The reason Elixometer has pobox is to shed load if metrics are updated en masse. We experienced a pretty significant performance bottleneck with exometer when we threw many metrics updates at it at once.

Interesting! Did you find out which of exometer's processes was the culprit? Was it the reporter itself, which is only capable of receiving and handling one metric/datapoint message at a time? Or was it the underlying gen_tcp setup?

> after the process mailbox reaches 1000

Do you recall what flush interval I could use to replicate that?

The origin of my question: I'm very new to exometer and don't fully grok it yet. I'm looking for a drop-in replacement for my folsom/folsomite setup, which simply blows up at around 3k total unique metrics (due to an ETS leak). I'm dumping the aggregated metrics to hostedgraphite every 40 seconds. I know I can fix this by just bumping ERL_MAX_ETS_TABLES, but that won't scale in the long run, and I feel there should be a way to do better without dropping overflow metrics on the floor. So I guess what I'm looking for is a test case that'll allow me to benchmark this properly before I make the production switch (staging is a lie).

@aerosol, the problem wasn't the flush interval; it was that exometer uses :gen_server.call somewhere in its stack. We were flooding the exometer process with tens of thousands of messages at once, and the calls were synchronous, and it would take 30ms or so for the process to clear its queue.
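A rough back-of-the-envelope model of why synchronous calls hurt here: a single process handles its queue one message at a time, so the last caller in a burst waits for everything ahead of it. The sketch below is only arithmetic; the function name is made up, and the per-message cost of 1 microsecond and the burst size of 30,000 are assumed figures chosen to land near the "30ms or so" mentioned above, not measurements.

```python
def worst_case_wait_ms(queue_depth, service_time_us):
    """Time until the last queued synchronous call is answered, in milliseconds.

    Assumes one consumer process that handles messages strictly in order.
    """
    return queue_depth * service_time_us / 1000.0


# Hypothetical burst: 30,000 queued calls at an assumed 1 microsecond each.
wait = worst_case_wait_ms(30_000, 1.0)
print(wait)  # 30.0
```

With asynchronous sends the producers would not block on that wait, but the backlog would still exist in the receiver's mailbox, which is where a bounded buffer comes in.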

I don't think having tens of thousands of messages will be a problem, but I'm not extremely familiar with exometer's internals. You might want to ask Ulf about this. The problem we fixed was due to many, many messages being created in a very short time.

commented

@scohen awesome.

> We were flooding the exometer process with tens of thousands of messages at once, and the calls were synchronous

How many unique metrics (as in "metric name") did you have?

commented

BTW, do you think it'd be worthwhile to extend elixometer with a metric for the number of messages dropped?

> How many unique metrics (as in "metric name") did you have?

Per app? Maybe 10 - 50.

> BTW, do you think it'd be worthwhile to extend elixometer with a metric for the number of messages dropped?

In our world, dropping doesn't happen often, and I didn't think it was that important in the grand scheme of things. Exometer can process tens of thousands of messages per second, and we've only ever had a single application cross this threshold to where exometer became a bottleneck.
The pobox library handles dropping of messages, so we'd have to instrument that somehow.
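For illustration only, one way such instrumentation could look in the abstract is a thin wrapper that counts rejected posts. This Python sketch assumes the underlying buffer tells the caller whether an update was accepted, which may not be how pobox behaves; `DropCounter` and `bounded_post` are hypothetical names, and the limit of 3 is just to keep the example small.

```python
class DropCounter:
    """Hypothetical wrapper that counts how many posts the buffer rejected."""

    def __init__(self, post):
        self._post = post
        self.dropped = 0

    def __call__(self, update):
        accepted = self._post(update)
        if not accepted:
            self.dropped += 1  # this count could itself be reported as a metric
        return accepted


buffer = []


def bounded_post(update, limit=3):
    """Stand-in for a bounded mailbox: reject once `limit` items are buffered."""
    if len(buffer) >= limit:
        return False
    buffer.append(update)
    return True


post = DropCounter(bounded_post)
for i in range(5):
    post(i)
print(post.dropped)  # 2
```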

commented

Thank you, this helps a lot! One final question and we can close this: which reporter plugin did you use?

We use exometer_report_opentsdb exclusively.

@aerosol I'm going to close this; feel free to ask more questions if any come up.