Missing measurements in telemetry events
danschultzer opened this issue · comments
Hey @mtrudel,
We're looking to update opentelemetry_bandit
to support for Bandit 1.4 breaking changes in open-telemetry/opentelemetry-erlang-contrib#311 and measurements are missing from some [:bandit, :request, :stop]
telemetry events.
This happened for 404s with a Phoenix app.
Looking up Bandit.Telemetry.stop_span
calls there are two places where measurements are not set:
Line 54 in 4f15f02
Line 74 in 4f15f02
In't this incorrect as Bandit.Telemetry
explicitly states that there will be the above measurements in [:bandit, :request: stop]
events?
bandit/lib/bandit/telemetry.ex
Lines 30 to 59 in 4f15f02
I haven't verified whether the 404 is getting into the latter exception handling, but I suspect it does.
Actually we don't see the exceptions happening in our use case. It's weird, it seems that the conn/conn.adapter doesn't persist with the measurement, even though send_resp
gets called so it should have been set. It happens for the 404, but haven't had the time to grok the whole code flow. Bottomline, when stop_span
gets called in the Bandit.Pipeline
the measurement is lost.
The error case1 telemetry events won't have many / most of the measurements, this is ~more or less by design since it simplifies the mainline code so much. That being said, there should be some there; this may well be a bug.
Footnotes
-
Error case meaning anything that Bandit returns internally (stream errors, semantic errors, etc). Any response that your plug sets on its own is considered 'normal' by Bandit, regardless of its response code. ↩
Yeah, it makes sense there is not really any measurements for errors since it means it halted somewhere halfway so what are those measurements even good for. But shouldn't that be a [:bandit, :request, :exception]
instead of [:bandit, :request, :stop]
event then?
In any case that wouldn't address the issue we see in the otel library though, that's a 404 that is handled by Phoenix (no exception raised to Bandit), but somehow the measurements get lost. I'll see if I can make a reproducable example for you when I have some free time.
I'd love a repro for this if possible @danschultzer !
Looking into the pipeline code in more detail now that the post-1.4 world has settled down a bit and I think the only open question may be whether protocol errors should be exposed as a stop event or an exception event here:
Line 209 in 661596f
I'm open to either option, and am happy to take guidance from the telemetry folks on this
Closing due to inactivity. As I mentioned above, I'm happy to hew telemetry for these cases based on whatever otel folks want. Once / if someone has a concrete proposal just open a ticket up and I'll do my best to make it happen