open-telemetry / opentelemetry-android

OpenTelemetry Tooling for Android

Feature: Link traces between them

GerardPaligot opened this issue · comments

Today, if I enable OpenTelemetry in my Android project to observe HTTP requests and Android lifecycle components (for example), I can't link the traces generated by the OkHttp interceptor with the trace generated by the Android lifecycle observer.

It would be great to have a feature like this to determine on which screen an HTTP request was executed, or to correlate it with other features available in this project, such as ANRs, crashes, etc.

Hey @GerardPaligot thanks for the issue. Can you clarify the following:

  • are you using this android library or just the vanilla upstream otel sdk?
  • which okhttp otel instrumentation are you using?

The current expectation is that the SessionIdSpanAppender will put the session id into an attribute on all spans that are generated. The session is then what allows those okhttp spans to be associated. Furthermore, there is Activity and Fragment instrumentation that can be used to keep track of which "screen" the user is on at a given time. Tracking that allows you to answer the question of what screen initiated which http request.

It may not be trivial to get it all set up yet, so it would be great to hear more details about how it's going.

are you using this android library or just the vanilla upstream otel sdk?

For now, I'm using opentelemetry-android 0.1.0.

which okhttp otel instrumentation are you using?

Deprecated one:

  @Deprecated
  public Interceptor newInterceptor() {
    return new TracingInterceptor(instrumenter, propagators);
  }

And yes, the session id injected into all spans is a good start for knowing what the user did during their session, but this filter is too broad. The session can tell me everything the user interacted with during the session, across all Android components and HTTP requests. However, because trace ids differ between Android components and HTTP requests executed at the same time, I can't tell which HTTP requests were executed during a given activity and/or fragment, so I can't get this kind of view in my observability backend:

[Image: observability (1)]

(Of course, it is a simplified version)

I hope that's clearer now.

If I understood correctly, you would like to group signals within a context that's narrower than a session. The first thing that comes to mind is to create long-running spans that start when an Activity/Fragment starts and end when it gets destroyed. Every signal created on the main thread in between would then get that "screen"'s trace-id, which is what the diagram you shared seems to describe. However, that approach raises a few concerns that I'd like to touch on to make sure we come up with a robust solution:

  • There's no guarantee that Activity.onDestroy will always be called, and even if it is, it might not run to completion if too much work is done there, which could cause some spans to never end.
  • The Activity/Fragment span context will only be available on the main thread, so unless proper context propagation is in place, some signals created on secondary threads will not get the Activity/Fragment trace-id.
  • It's a bit tricky to define a "screen" in Android in a way that works for every app: some use an Activity per screen, others use Fragments instead, and there are also Composables, etc. So I think it would be too opinionated for this lib to automatically define when a screen starts and ends.
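The second concern above (context that lives only on the thread that set it) can be illustrated with a small sketch. This is a plain-Java model, not the OpenTelemetry API: `ScreenContext`, `PropagationDemo`, and the trace id `screen-trace-1` are hypothetical names chosen for illustration. It assumes the screen's trace id is kept in a `ThreadLocal`, and shows that a task submitted to another thread only sees it when the task is explicitly wrapped.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a tracing context; NOT the OpenTelemetry API.
final class ScreenContext {
    // Holds the screen's trace id for the current thread only.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String traceId) { CURRENT.set(traceId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }

    // Captures the caller's trace id now and restores it around the task,
    // so work running on another thread sees the same screen trace.
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = CURRENT.get();
        return () -> {
            String previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                return task.call();
            } finally {
                if (previous == null) CURRENT.remove(); else CURRENT.set(previous);
            }
        };
    }
}

final class PropagationDemo {
    // Returns {trace id seen without propagation, trace id seen with propagation}.
    static String[] run() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            ScreenContext.set("screen-trace-1"); // e.g. when the screen span starts
            Callable<String> readTraceId = ScreenContext::get;
            String plain = worker.submit(readTraceId).get();
            String wrapped = worker.submit(ScreenContext.wrap(readTraceId)).get();
            return new String[] { plain, wrapped };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ScreenContext.clear();
            worker.shutdown();
        }
    }
}
```

In this model the unwrapped task sees no trace id on the worker thread, while the wrapped one sees the id set by the caller. Any real setup would need an equivalent wrapping step wherever work hops between threads.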

Based on that, I'm thinking an alternative approach may be to create a global attribute similar to session.id, except that this one would have to be set manually by the user of this lib whenever they believe a screen has started or ended within their app. For example, let's say the attr is called screen.name. As a user of this lib, I could set it to an activity's name when that activity is created, set it to another activity's name when another one is created, or set it to null if I don't want to send that attr anymore. When set, that attr would be sent in every span and log, just like session.id.
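A rough sketch of that idea, written as a plain-Java model rather than against the real SDK: `GlobalAttributes`, `put`, and `decorate` are hypothetical names, and signals are represented as simple string maps. The assumption is that whatever emits a span or log merges the current global attributes into it at creation time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for app-wide attributes; NOT part of opentelemetry-android.
final class GlobalAttributes {
    private static final Map<String, String> VALUES = new ConcurrentHashMap<>();

    // Sets the attribute, or stops sending it when value is null.
    static void put(String key, String value) {
        if (value == null) {
            VALUES.remove(key);
        } else {
            VALUES.put(key, value);
        }
    }

    // Returns a copy of the signal's own attributes plus the global ones
    // as they are at the moment the signal is created.
    static Map<String, String> decorate(Map<String, String> own) {
        Map<String, String> merged = new LinkedHashMap<>(own);
        merged.putAll(VALUES);
        return merged;
    }
}
```

With this model, calling `GlobalAttributes.put("screen.name", "CheckoutActivity")` when a screen starts means every signal decorated afterwards carries `screen.name`, and `GlobalAttributes.put("screen.name", null)` stops that. Nothing depends on a lifecycle callback firing, so a missed onDestroy only leaves a stale value rather than an unfinished span.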

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment.

I don't think this can be done automatically without grouping a lot of things that should not be grouped, and vice versa.
The only solution here is a manual API, I agree, and I'd say this is already possible using the current manual APIs. Maybe the improvement here is docs and examples of how to properly trace known components such as Activities, Fragments, etc.

For example, creating a trace during activity/fragment creation, and showing when to finish that trace.
All the HTTP requests made under the current trace (the one created above) would then be linked automatically.

Side effect: if you have background jobs running that also create spans, they will be linked to the trace as well, although maybe they should not be.
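To make the manual approach and its side effect concrete, here is a deliberately simplified, single-threaded model in plain Java. It does not use the OpenTelemetry API; `ScreenTracer`, `SpanRecord`, and the generated trace ids are hypothetical. It assumes a screen trace is opened and closed by hand (for example from onCreate and onDestroy) and that any span recorded in between joins it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical span model: just a name and the trace it belongs to.
final class SpanRecord {
    final String name;
    final String traceId;

    SpanRecord(String name, String traceId) {
        this.name = name;
        this.traceId = traceId;
    }
}

// Hypothetical manual tracer; NOT the OpenTelemetry API and not thread-safe.
final class ScreenTracer {
    private String activeTraceId; // null when no screen trace is open
    private int nextId = 1;
    final List<SpanRecord> finished = new ArrayList<>();

    // Call when the screen starts: opens a trace that later spans will join.
    void startScreen(String screenName) {
        activeTraceId = "trace-" + nextId++;
        finished.add(new SpanRecord(screenName, activeTraceId));
    }

    // Call when the screen ends: spans recorded afterwards get their own trace.
    void endScreen() {
        activeTraceId = null;
    }

    // Records a span, joining the open screen trace if there is one.
    SpanRecord record(String name) {
        String traceId = (activeTraceId != null) ? activeTraceId : "trace-" + nextId++;
        SpanRecord span = new SpanRecord(name, traceId);
        finished.add(span);
        return span;
    }
}
```

In this model, an HTTP span recorded between `startScreen` and `endScreen` shares the screen's trace id, which is the linking the issue asks for. A background job that records a span in the same window is linked too, since the tracer has no way to tell the two apart without extra information from the caller.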
