medic / cht-watchdog

Configuration for deploying a monitoring/alerting stack for CHT

data from cht_connected_users does not give useful insights for lower granularity time slices

derickl opened this issue

We are scraping every 5 minutes for users who have connected over the last 30 days. It might be useful to drop this to daily or to introduce a new widget that supports lower-granularity time slices.

Thanks for the ticket @derickl !

While we are querying for "cht_connected_users over last 30 days" every five minutes, a value that clearly won't change that frequently, we're also querying every other metric on the /api/v2/monitoring endpoint in that same request. Many of these metrics change every minute and are critical to measure frequently, with 5 minutes being a sane value.
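
To illustrate why the scrape interval is shared, here is a rough sketch of how one monitoring response could fan out into several metrics at once. The payload below is made up for illustration: its field names and the resulting metric names are assumptions, not the actual /api/v2/monitoring schema or the names CHT Watchdog exports.

```python
from typing import Any, Dict

# Hypothetical sample of ONE monitoring response. Field names are
# illustrative assumptions, not the real /api/v2/monitoring schema.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "connected_users": {"count": 42},  # 30-day window, changes slowly
    "outbound_push": {"backlog": 3},   # can change minute to minute
    "feedback": {"count": 7},
}


def flatten_metrics(payload: Dict[str, Any], prefix: str = "cht") -> Dict[str, float]:
    """Flatten nested numeric fields into metric-style names.

    The naming scheme (prefix plus underscore-joined keys) is a
    hypothetical convention chosen for this sketch only.
    """
    flat: Dict[str, float] = {}
    for key, value in payload.items():
        name = f"{prefix}_{key}"
        if isinstance(value, dict):
            # Recurse into nested sections of the same response.
            flat.update(flatten_metrics(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = float(value)
    return flat


# Every metric below comes from the same single response, so all of them
# are necessarily sampled at the same interval.
metrics = flatten_metrics(SAMPLE_RESPONSE)
for name, value in sorted(metrics.items()):
    print(name, value)
```

Because the slow-moving value and the fast-moving ones arrive together, scraping less often for one of them would mean scraping less often for all of them.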

As well, the monitoring endpoint is optimized to be low to no impact on production instances. Medic's own production CHT Watchdog instance has been monitoring over 50 CHT instances for 2+ months with no demonstrable impact.

Imma close this ticket as there's no near-term plan to change the way the monitoring API works, but feel free to comment or ask questions if I missed anything!