SwissDataScienceCenter / amalthea

A kubernetes operator for spawning and exposing jupyter servers

Do not cache so many prometheus metrics

olevski opened this issue · comments

Amalthea has a lot of metrics because it creates a new metric for each combination of user and project id.

I did not know this, but the Prometheus client caches all of these metrics even after they have been scraped, until .remove is called on the metric with the label values that should be dropped from the cache.

Without this removal the cache keeps growing, and at some point Amalthea was publishing 100k-200k metrics at a time.

So we should periodically call .remove on metrics that we no longer need. The easiest way is to do this 1 minute after a session is killed: once a session is removed, we call remove for that session's user id and project id combination. If we called remove immediately, Prometheus would not get a chance to scrape the metrics that were generated, hence the delay. The delay should be configurable, with a default of 1 minute.
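
A rough sketch of the delayed removal, not Amalthea's actual implementation. It assumes the metrics are objects exposing remove(*labelvalues), as labelled Counter and Gauge objects in the Python prometheus_client library do. The names schedule_metric_removal, remove_label_values and delay_seconds are made up for illustration, and threading.Timer is only one possible way to get the delay.

```python
import threading
from typing import Iterable, Protocol, Sequence


class LabelledMetric(Protocol):
    """Anything exposing remove(*labelvalues), e.g. a labelled prometheus_client metric."""

    def remove(self, *labelvalues: str) -> None: ...


def remove_label_values(
    metrics: Iterable[LabelledMetric], label_values: Sequence[str]
) -> None:
    """Drop the cached child for label_values from every metric."""
    for metric in metrics:
        try:
            metric.remove(*label_values)
        except KeyError:
            # This label combination was never recorded on this metric,
            # or it has already been removed. Nothing to clean up.
            pass


def schedule_metric_removal(
    metrics: Iterable[LabelledMetric],
    label_values: Sequence[str],
    delay_seconds: float = 60.0,  # hypothetical configurable delay, default 1 minute
) -> threading.Timer:
    """Remove label_values from metrics after delay_seconds.

    The delay leaves time for one more scrape of the final values
    before the label combination disappears from the exported metrics.
    """
    timer = threading.Timer(
        delay_seconds,
        remove_label_values,
        args=(list(metrics), tuple(label_values)),
    )
    timer.daemon = True  # do not keep the process alive just for cleanup
    timer.start()
    return timer
```

A session-deletion handler could then call something like schedule_metric_removal(all_session_metrics, (user_id, project_id)), where all_session_metrics is a hypothetical list of the labelled metrics. The label values must be passed in the same order as the label names the metrics were declared with.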