"context canceled" is Added as a Span Event on `cortex.ingester/QueryStream` Trace
kennytrytek-wf opened this issue
Describe the bug
When a QueryStream operation ends because its context was canceled, the `context canceled` error is added to the span as a span event.
To Reproduce
Steps to reproduce the behavior:
- Start Cortex with tracing enabled.
- Run a query that the distributor sends to more than one ingester.
Expected behavior
If an ingester context is canceled during the query (which I understand is normal operation of Cortex?), the operation should result in an OK span status with no attached span event.
Environment:
- Infrastructure: Kubernetes
- Deployment tool: Helm
Additional Context
The span in question:
cortex/pkg/ingester/ingester.go
Lines 1744 to 1745 in ab3ca0a
Similar issue from the past:
#1279
How it was fixed in WeaveWorks:
https://github.com/weaveworks/common/pull/148/files
An example of a failing trace. In this example, there were five parallel query streams, and the one that was canceled was the slowest.
The span `/cortex.Ingester/QueryStream` was instrumented automatically by the gRPC tracing middleware, I think. It is not the span at
cortex/pkg/ingester/ingester.go
Lines 1744 to 1745 in ab3ca0a
We would need to change the gRPC middleware library's behavior so that it ignores the context canceled error. I am not sure that is something we can do easily.
As a workaround, I added a transform processor to our Mimir OpenTelemetry collector that watches for this case and sets the span status to OK.
processors:
  transform/cortexquerycontextcanceledspanevent:
    error_mode: ignore
    trace_statements:
      - context: spanevent
        statements:
          - set(span.status.code, 1) where (span.name == "/cortex.Ingester/QueryStream" and attributes["message"] == "context canceled")