grafana / synthetic-monitoring-agent

Synthetic Monitoring Agent

Home Page: https://grafana.com/docs/grafana-cloud/how-do-i/synthetic-monitoring/

A timeout in the middle of an HTTP request results in strange metrics

mem opened this issue · comments

If the check times out in the middle of an HTTP request (e.g. after name resolution is done but before the transfer is complete), the context is canceled and some code paths produce negative durations.

For example, if "start" is set to the current time and "transfer complete" is still 0 because that phase never finished, then "transfer complete" - "start" is a negative duration.

The issue is possibly in BBE (blackbox_exporter), but we are the ones making it visible.

It is possibly also present in other checks that compute phases.

commented

Thank you for raising this issue. I think it was opened because of the support query that I (and perhaps others) raised, since the Grafana Cloud support team pointed me to it. I'm seeing a negative duration of 292 years. I can understand asynchronous events producing a slightly negative value, but if this is what I experienced, then I would expect an invalid value or conversion somewhere.

Sorry, I think I understand what you are saying now. The start time is initialised to the current timestamp and transfer complete is initialised to zero, so the "duration" becomes the negated value of the start timestamp.

Hopefully this can either be resolved, e.g. by initialising transfer complete to the start time or by checking whether transfer complete is 0 (although it's probably not that simple), or worked around, e.g. by clamping negative durations to -1 (to expose the problem without breaking graphs) or to 0 (to hide the problem). I've had this problem occur on Grafana Cloud a few times.