syndicated-media / sn-spec

Analytics reporting

farski opened this issue · comments

Right now apps have better insight into listening behavior for a show than that show's own producers, except in cases where an app (Google Play, Spotify) provides a dashboard.

It would be great if a feed could define a standard way for an app to report back to the feed's owner.

This exact design isn't a good idea, but it gets the point across (one could imagine requests being made to these URLs with GUIDs from the <item>, etc.):

<channel>
    <endpoints>
       <event action="start" url="http://my.podcast.com/metrics/start" />
       <event action="heartbeat" url="http://my.podcast.com/metrics/heartbeat" />
       <event action="complete" url="http://my.podcast.com/metrics/complete" />
    </endpoints>
</channel>
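To make the sketch above concrete, here is a minimal, hypothetical client-side counterpart: building a POST to one of the feed-declared endpoints with the item's GUID. The URL, payload shape, and function name are all illustrative, not part of any spec.

```python
import json
import urllib.request

# Hypothetical sketch: build a "start" event request for the endpoint
# declared in the feed. The payload shape is made up for illustration.
def build_event_request(endpoint_url, item_guid):
    payload = json.dumps({"guid": item_guid}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_event_request(
    "http://my.podcast.com/metrics/start",
    "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
)
# Actually sending it is just urllib.request.urlopen(req) once a collector exists.
```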

For most people, these would be handled at the distributor level (libsyn, NPR, soundcloud, etc.); it's not like every podcaster out there would have to spin up their own collection service.

Big +1 for this. Podcasters are hungry for more granular statistics. How frequent do you envision this heartbeat being? I imagine the client stacking timestamped events (relative to the podcast duration) like 'start', 'stop', and 'seek', and then sending them over in bulk in a periodic request or on completion.

Edit: I think we would also want to provide some sort of response that lets the client know that the stats were successfully recorded and that it can clear out the event stack and not resend a request until there are more events to log.
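The queue-and-acknowledge behavior described above can be sketched in a few lines. This is a hypothetical client-side structure, not anything from a spec; the class and method names are made up.

```python
# Sketch of the client-side event stack described above: events are queued
# with positions relative to the episode, and cleared only after the
# collector acknowledges the batch, so nothing is lost or double-sent.
class EventStack:
    def __init__(self):
        self._events = []

    def record(self, action, position_secs):
        self._events.append({"action": action, "position": position_secs})

    def flush(self, send):
        """send(batch) should return True only on a successful (2xx) response."""
        if not self._events:
            return True  # nothing to send, nothing to resend
        batch = list(self._events)
        if send(batch):
            self._events.clear()  # acknowledged: safe to drop local copies
            return True
        return False  # keep events queued for the next attempt

stack = EventStack()
stack.record("start", 0)
stack.record("seek", 95)
stack.flush(lambda batch: False)  # failed send: events are retained
stack.flush(lambda batch: True)   # acknowledged: stack is cleared
```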

I wonder if it would be possible to completely replace download/server statistics with this, so also adding actions like download started, and download failed or completed. But doing so would require some sort of "state", so maybe a UUID? The client could then choose whether that should be unique per episode, per feed, or per person.

I'm envisioning POST https://example.org/track?action=DOWNLOAD-START&uuid=b76901186d328371&time=1472376924&feedid=xxx&entryid=yyy with the appropriate HTTP status code response.

So the feed would only have one endpoint (https://example.org/track). But I do like the actions being listed in the feed so the client wouldn't send unwanted actions; a bit of transparency.
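Assembling that single-endpoint request is straightforward; a minimal sketch, where the parameter names simply mirror the example above and are not part of any spec:

```python
from urllib.parse import urlencode

# Illustrative only: build the single-endpoint tracking URL sketched above.
def tracking_url(base, action, uuid, time, feedid, entryid):
    query = urlencode({
        "action": action,
        "uuid": uuid,
        "time": time,
        "feedid": feedid,
        "entryid": entryid,
    })
    return f"{base}?{query}"

url = tracking_url(
    "https://example.org/track",
    "DOWNLOAD-START", "b76901186d328371", 1472376924, "xxx", "yyy",
)
# The client would POST to this URL and check the HTTP status code.
```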

@farski worth bringing a few publishers into the conversation here deliberately. I suspect they won't need this much; it could be good enough to serve an endpoint like the one that one show had us add to their app.

@farski @Cj-Malone @chrisrhoden @joshontheweb We just open-sourced our Open Podcast Analytics (OPA) spec/protocol, and we found you all while looking for people and organizations that could help provide feedback.

Backtracks is an audio analytics and podcast analytics startup. OPA works across apps/clients, players, and publishers (it works equally well on mobile and desktop). We'd love to work with you, and OPA fits well with what you're doing on the RSS side of things.

A great deal of thought, time, and iteration went into OPA internally, and we're opening it up and looking for feedback, as it's really a spec/protocol that the podcast ecosystem could benefit from.

Think we could do some really great things together and now that we've found each other, that's possible!

👍

@jgill333 is there an explanation of the networkState and readyState values? Also, as there can be multiple events in a single submission, is it also valid to save all the events until the end of playback and push a single submission?

How is the endpoint disclosed? An RSS/Atom extension?

And what's the use case for author, duration, explicit, publisher and title as part of an event object? Surely you already know these.

And I don't know a single podcast client that has a loop feature, but it's required; wouldn't it be better for it to be optional and presumed false?

@jgill333 Welcome! This looks like an interesting idea.

I think that for a mobile analytics protocol it's very verbose, with lots of data that, at least in a podcast setting, the client is not the source of truth for (@Cj-Malone pointed out several of these). I'd like to see every effort made (at the expense of human readability) to shrink the protocol as much as possible. Having "props": { ... } in every event adds 14-15 bytes to every event after base64 encoding. If we're thinking dozens of events will be triggered by each play, this adds up very quickly in a mobile context.
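The 14-15 byte figure can be sanity-checked with a bit of arithmetic. A rough sketch, assuming the per-event framing is roughly `"props":{}` plus a separating comma (about 11 raw bytes) and that base64 inflates payloads by a factor of 4/3:

```python
# Rough arithmetic behind the 14-15 byte estimate: the extra JSON framing
# per event is about 11 raw bytes, and base64 expands data by ~4/3.
wrapper = b'"props":{},'
raw = len(wrapper)          # 11 bytes of framing per event
after_base64 = raw * 4 / 3  # roughly 14.7 bytes on the wire
assert 14 < after_base64 < 15
```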

There's also required stuff that should have a sane default so as to save the bytes over the wire, and a few things that are stated as mirroring HTML5 properties. At least one property is mirroring APIs that were designed 15+ years ago, and may be worth reconsidering in a circumstance where we don't even have those APIs and would need to emulate them.

The current version of the spec doesn't seem to take into consideration the realities of dynamic ad serving, which is increasingly a concern for publishers. There are small tweaks one could make to get there, but something worth considering.

I guess my last question for now would be, how does the client get the media ids? You've outlined several ways for publishers to issue or be issued media ids, but not expressed how the client is supposed to associate those.

@Cj-Malone Thanks for the feedback.

  • Is there an explanation of the networkState and readyState values?
  • Also, as there can be multiple events in a single submission, is it also valid to save all the events until the end of playback and push a single submission?
    • Yes, it is entirely possible to queue and submit later, send one event, and/or batch events periodically.
  • How is the endpoint disclosed? An RSS/Atom extension?
    • The endpoint for the analytics would be exposed in an RSS/Atom extension. Looking back on this thread, you all have some good suggestions (it makes more sense to us to have one endpoint vs. defining one for every event, though). Also, depending on the use case, an app/client/player could choose to implement sending events without this information/change.
  • And what's the use case for author, duration, explicit, publisher and title as part of an event object?
    • The reason is that analytics services would not need to rely on reading RSS/Atom or other data; all the data they need to perform their part of the work in the ecosystem is sent in the events themselves. The structure of the spec/protocol allows applications/clients, publishers, and analytics providers to build or use analytics solutions without those solutions needing to know anything in particular about RSS/Atom. So it enables use cases where the world has not yet extended the RSS/Atom spec for changing/adding analytics URLs, but an app/player/etc. chooses to record its event analytics in OPA.
  • And I don't know a single podcast client that has a loop feature, but it's required; wouldn't it be better for it to be optional and presumed false?
    • Yes, you are correct. We'll likely add default values for omission of data.

@chrisrhoden Thanks for the suggestions. We'll look at how we can reduce the payload size even more including adding defaults.

In relation to ads:
We're hoping there is some consensus on how ad-related events should be recorded and represented, but for now these types of events could be represented as custom events in OPA. If there is an emerging and/or known set of data that is needed for these events, it would be great to have some info on that, so we can all share a known set of events/types for it.

In relation to obtaining the media ids:
The source of that data is not defined, as there is no uniform source of truth for it and it differs slightly by use case. The media ids can be the GUID of a podcast episode, a hash of a URL, known publisher-generated ids, commercial tracking ids like barcodes, etc. The optional (but useful) media_ids array is purposefully open to multiple sources of data and values. Based on your feedback, I think we should add some text on possible sources for this information as a guide for an application/client.

@chrisrhoden @jgill333 I would drop the requirement for the POST body to be base64ed; then you don't have to worry about a few bytes here and there, as HTTP compression is good enough. Sane defaults and minimal requirements are still a good thing. However, if you're base64ing it into the URL, compression won't do anything, so maybe you should explicitly say that POST is preferred.
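The size trade-off is easy to demonstrate. A quick sketch, with an entirely made-up event batch: gzip (what HTTP compression would apply to a POST body) shrinks repetitive JSON considerably, while base64 always inflates it by roughly a third.

```python
import base64
import gzip
import json

# Illustrative size comparison; the event shape here is invented.
events = [{"event": "play", "position": i, "session": "abc123"} for i in range(100)]
body = json.dumps(events).encode("utf-8")

compressed = gzip.compress(body)     # what Content-Encoding: gzip would send
encoded = base64.b64encode(body)     # what base64-in-the-URL would send

# Repetitive JSON compresses well; base64 only ever makes it bigger.
assert len(compressed) < len(body) < len(encoded)
```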

As for dynamic ads: I think a hash obtained via Media RSS, or the client hashing the file itself, could become a media_ids entry with a type of hash_md5, hash_sha1, hash_sha256, etc. That way the server can adjust the time-related properties correctly.
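The hashing idea above can be sketched as follows. The type names mirror the comment and are not from the OPA spec; the function name is hypothetical.

```python
import hashlib

# Sketch: hash the downloaded media bytes and report the digests as
# media_ids entries, so the server can identify the exact file variant.
def media_ids_from_bytes(data):
    return [
        {"type": "hash_md5", "value": hashlib.md5(data).hexdigest()},
        {"type": "hash_sha1", "value": hashlib.sha1(data).hexdigest()},
        {"type": "hash_sha256", "value": hashlib.sha256(data).hexdigest()},
    ]

ids = media_ids_from_bytes(b"fake episode bytes")
# A real client would hash the file in chunks via hashlib's update() method
# rather than loading the whole episode into memory.
```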

I think this spec is a bit vague, and that's how you've designed/wanted it, but I don't think that's good in the current podcast client environment. Most clients don't even support all of the RSS spec, let alone namespaces; the client devs are mostly in it for designing UX. So you could probably improve it by being more defined: more predefined media id types and properties with documented purposes.

Thanks @Cj-Malone, will take this advice into account.