w3c / websub

WebSub Spec in Social Web Working Group

Home Page: https://w3c.github.io/websub/

No connection between subscription request and verification from hub

aaronpk opened this issue · comments

If the subscriber is subscribing to the same topic at multiple hubs, there is no way to tell which hub is making the verification request. Section 4.3.1 says that the subscriber MUST confirm that the topic corresponds to a pending subscription. This means that if there are two pending subscriptions for the topic (at different hubs), it's ambiguous which one the verification request corresponds to.

Potential solutions:

  • Add a parameter to the verification request where the hub sends its own URL in the request so that the subscriber can match the verification request to its subscription request.

It might be nice to have a link header with rel=hub, just like for content distribution; however, where content distribution has link rel=self to identify the topic (and it is not clear which URL should be supplied there; see #38), here we have a hub.topic parameter (with a clear meaning: the same URL supplied in the subscription request). So perhaps a parameter, um, hub.hub, yuck, might be less surprising.

I just realized that by using a unique URL per subscription, I can establish that link myself. I am currently attempting this with a query parameter in the callback URL.

That worked well. I tested both Switchboard and Superfeedr with this method, and both worked fine. In my implementation, I used callback URLs such as https://example.com/callback?token=xxxx to both identify and authenticate each request to the callback endpoint.
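To illustrate the unique-callback idea, here is a minimal sketch, not the actual implementation described above. It assumes an in-memory `pending` store and the hypothetical helper names `make_callback_url` and `match_verification`, and it relies on the hub appending `hub.topic` and the other verification parameters to the callback URL's query string:

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical in-memory store: token -> (hub URL, topic URL).
pending = {}

def make_callback_url(base, hub, topic):
    """Create a unique callback URL and remember which subscription it belongs to."""
    token = secrets.token_urlsafe(32)
    pending[token] = (hub, topic)
    return f"{base}?{urlencode({'token': token})}"

def match_verification(request_url):
    """Return the (hub, topic) pair a verification request belongs to, or None."""
    query = parse_qs(urlparse(request_url).query)
    token = query.get("token", [None])[0]
    if token is None or token not in pending:
        return None  # unknown or missing token: reject the request
    hub, topic = pending[token]
    # The hub.topic sent by the hub must match the topic we asked for.
    if query.get("hub.topic", [None])[0] != topic:
        return None
    return hub, topic
```

Because the token is random and unguessable, it both identifies the subscription and acts as a weak form of authentication for requests to the callback endpoint.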

I think the text should be updated to more strongly recommend using unique callback URLs per subscription. Currently it says:

It is considered good practice to use a unique callback URL for each subscription.

Perhaps that could be upgraded to a SHOULD? Or at the very least, we could add some text to the Security Considerations section that points out the benefits of using unique callback URLs.

The content-distribution stage includes enough info to determine the (hub, topic) pair; shouldn't the two be consistent? If we can do without the info here, we can probably also do without it at the content-distribution stage, no?

Some things I come to think of:

First: This from section 4.1:

Hubs MUST allow subscribers to re-request subscriptions that are already activated. Each subsequent request to a hub to subscribe or unsubscribe MUST override the previous subscription state for a specific topic URL and callback URL combination once the action is verified. Any failures to confirm the subscription action MUST leave the subscription state unchanged. This is required so subscribers can renew their subscriptions before the lease seconds period is over without any interruption.

So one needs to ensure that the callback URL used with a hub is always the same for a given topic URL, or risk subscribing multiple times to the same topic URL at the same hub.

Secondly: Security-wise, the most important thing is probably not to construct the callback in such a way that it can be probed to check which feeds a subscriber has subscribed to. Having hub-unique secret callback URLs would stop this.

All in all: I would lean towards a hub-unique callback URL rather than a subscription unique callback URL. This makes it possible for the hub to know that all the subscriptions are from the same source and makes it harder to accidentally subscribe twice to the same URL while still achieving the same purpose.

@tonyg: The two can't be combined, as the rel=hub and rel=self links describe a relation between the distributed content and its feed and hub, while the subscription verification step is not content distribution but rather a verification of the subscription request itself. The rel=hub and rel=self in a content distribution can also change over time to allow hubs and feeds to change their URLs, while the verification of a subscription intent will always happen with the exact topic and hub combination that was used in the subscription (when the hub is derived from a hub-unique callback URL). See e.g. https://github.com/pubsubhubbub/PubSubHubbub/wiki/Moving-Feeds-or-changing-Hubs

I don't think hub-specific URLs are a good recommendation, as that wouldn't solve the original problem I had.

What I ended up with was creating a unique token for each topic + hub combination. Any time my subscriber wants to activate that subscription at that hub, it uses the same callback URL. This solves the accidental multiple subscription problem, and also lets me uniquely identify the particular subscription when I get the notifications.
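One possible way to get a stable token per topic + hub combination is to derive it from the pair with an HMAC, so that a renewal always produces the same callback URL without a lookup table. This is only a sketch under assumptions: the comment above does not say how the token is generated, and `SECRET_KEY`, `subscription_token`, and `stable_callback_url` are hypothetical names.

```python
import hashlib
import hmac

# Assumption: a private key known only to the subscriber.
SECRET_KEY = b"replace-with-a-private-random-key"

def subscription_token(hub, topic):
    """Derive a deterministic, hard-to-guess token for one (hub, topic) pair."""
    message = f"{hub}\n{topic}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def stable_callback_url(base, hub, topic):
    """Build the callback URL; the same pair always yields the same URL."""
    return f"{base}?token={subscription_token(hub, topic)}"
```

Since the URL is the same on every renewal, a re-request overrides the existing subscription at the hub instead of creating a second one.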

@aaronpk Why isn't the hub.topic parameter together with a hub-specific URL enough to identify the subscription? You would have the same data at hand with that approach as with your current approach, and could check whether you have a pending subscription for that hub + topic combo.

I suppose you're right. I guess I don't see a particular advantage to doing that over what I'm doing with unique hub+topic URLs.
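For comparison, the hub-specific variant discussed above can be sketched as follows. It is a rough illustration only: the secret path segments, the `hub_by_secret` mapping, the `pending_pairs` set, and the function name are all assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical: one secret path segment per hub the subscriber uses.
hub_by_secret = {
    "s3cr3t-a": "https://hub-a.example/",
    "s3cr3t-b": "https://hub-b.example/",
}

# Hypothetical set of pending (hub, topic) subscription requests.
pending_pairs = {
    ("https://hub-a.example/", "https://example.com/feed"),
    ("https://hub-b.example/", "https://example.com/feed"),
}

def match_hub_specific(request_url):
    """Identify a subscription from a hub-specific callback URL plus hub.topic."""
    parsed = urlparse(request_url)
    secret = parsed.path.rsplit("/", 1)[-1]  # last path segment names the hub
    hub = hub_by_secret.get(secret)
    topic = parse_qs(parsed.query).get("hub.topic", [None])[0]
    if hub is None or (hub, topic) not in pending_pairs:
        return None
    return hub, topic
```

The callback URL identifies the hub and `hub.topic` identifies the topic, so together they carry the same information as a per-subscription token.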

@voxpelli: Re the links in content distribution headers: So these may change over time, as the upstream topic alters its metadata? It becomes very important then to say clearly in the spec that subscribers MUST NOT rely on the link headers to determine which subscription's content is being delivered to them! Instead, they must arrange some other means (such as a unique URL) to identify the context of the delivery. If I understand you correctly, the link headers relate to the content, and not at all to the subscription.

@tonyg: The data is related to the content, and the content is what one subscribes to, but yes, one needs to be able to recognize changes in topic and/or hub to update one's subscription records and possibly resubscribe to the feed. A unique hub+topic URL like @aaronpk's would be able to do so, and a hub-unique URL would be able to detect at least a hub move.

The scenarios are described in https://github.com/pubsubhubbub/PubSubHubbub/wiki/Moving-Feeds-or-changing-Hubs and are only relevant if one wants to do a real time redirect.

If one doesn't need a real time redirect, then one can just ignore any pings whose hub and/or topic URLs one doesn't recognize, and rely on refetching the topic itself every now and then to detect whether it has moved or changed hub. One would then lose the real time updates between the move and one's recognition of it, but apart from that it would all work fine.
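The "ignore unrecognized pings" option could look roughly like this. It is a simplified sketch: the regular expression handles only plain `<url>; rel="..."` entries rather than the full Link header grammar, and `parse_links`, `should_accept`, and the `known` set are hypothetical names.

```python
import re

# Simplified matcher for entries like: <https://hub.example/>; rel="hub"
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')

def parse_links(header):
    """Map each rel value to the list of URLs carrying it."""
    links = {}
    for url, rels in LINK_RE.findall(header):
        for rel in rels.split():
            links.setdefault(rel.lower(), []).append(url)
    return links

def should_accept(header, known):
    """Accept a ping only if its (hub, topic) pair is one we subscribed to."""
    links = parse_links(header)
    hubs = links.get("hub", [])
    topics = links.get("self", [])
    return any((hub, topic) in known for hub in hubs for topic in topics)
```

A ping rejected this way is simply dropped; the move is picked up later when the subscriber refetches the topic and sees the new links.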

Thanks @voxpelli! I'll update #38.

(A hub-unique URL wouldn't necessarily be able to detect a hub move, since there could be multiple chained hubs before reaching the topic source itself.)

I agree with @voxpelli's approach in which the subscriber is in charge of building its own callback URLs and should differentiate callback URLs based on the hub they're subscribing to if the resource points to multiple hubs.