Request time metrics and health check
andrus opened this issue · comments
Been playing with the existing metrics with a real app under load. They all look great (thread pool utilization, request queue). There's another metric that is extremely useful for estimating the overall health of the app: response time. We can build a histogram of response times over time, and build a health check with thresholds for the 99th percentile.
A challenge is that in some apps there's a big variation between "normal" response times across the different kinds of services (and service parameters). Though hopefully averages should be pretty stable (?) and this metric can serve as a proxy for the overall app performance as a function of external load.
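To make the idea concrete, here is a rough sketch of a response time tracker with a 99th percentile health check. It is not tied to any metrics library: the names `ResponseTimeTracker` and `check_health`, the sliding window of the last 1000 samples, and the 500 ms / 2000 ms thresholds are all made up for illustration. A real implementation would more likely use the histogram from whatever metrics library the app already has.

```python
import math
from collections import deque


class ResponseTimeTracker:
    """Hypothetical tracker: keeps the most recent response times (ms)."""

    def __init__(self, window_size=1000):
        # Sliding window; the oldest samples drop off once it is full.
        self._samples = deque(maxlen=window_size)

    def record(self, millis):
        self._samples.append(millis)

    def percentile(self, p):
        """Nearest-rank percentile of the window, or None if it is empty."""
        if not self._samples:
            return None
        ordered = sorted(self._samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


def check_health(tracker, warn_ms=500, critical_ms=2000):
    """Map the 99th percentile to a status; thresholds are assumed values."""
    p99 = tracker.percentile(99)
    if p99 is None:
        return "UNKNOWN"
    if p99 >= critical_ms:
        return "CRITICAL"
    if p99 >= warn_ms:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    tracker = ResponseTimeTracker()
    for _ in range(95):
        tracker.record(20)      # typical fast requests
    for _ in range(5):
        tracker.record(3000)    # a slow tail
    print(tracker.percentile(99), check_health(tracker))
```

With this nearest-rank definition, a single slow request out of 100 does not move the 99th percentile, but a slow tail of a few percent does, which is roughly the behavior we'd want from a health check.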
Also see #93 for more advanced request time metering scenarios.