Replace rel=self with rel=canonical

cweiske opened this issue 8 years ago

WebSub requires each resource to be delivered with a rel=self link:

Link Headers [RFC5988]: the publisher SHOULD include at least one Link Header [RFC5988] with rel=hub (a hub link header) as well as exactly one Link Header [RFC5988] with rel=self (the self link header)
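
To make the requirement quoted above concrete, here is a minimal Python sketch of how a subscriber might pull the hub and self links out of HTTP Link headers. The parser is deliberately simplified (it assumes URIs contain no commas or `>` characters and does not handle escaped quotes), and the function names are hypothetical. Rejecting a response with no hub, or with anything other than exactly one self link, is an assumption about how strict a subscriber wants to be, since the spec phrases this as a SHOULD for publishers.

```python
import re

# Simplified Link header grammar: <uri> followed by ;-separated params.
# Assumes URIs contain no ',' or '>' and params contain no quoted ',' or ';'.
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def parse_link_header(value: str) -> list[tuple[str, set[str]]]:
    """Return (target URI, rel values) pairs from one Link header value."""
    links = []
    for match in LINK_RE.finditer(value):
        target, params = match.group(1), match.group(2)
        rels: set[str] = set()
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # A rel value may list several space-separated relation types.
                rels.update(val.strip().strip('"').lower().split())
        links.append((target, rels))
    return links


def websub_links(link_headers: list[str]) -> tuple[list[str], str]:
    """Collect all rel=hub targets and the single rel=self target."""
    hubs, selves = [], []
    for header in link_headers:
        for target, rels in parse_link_header(header):
            if "hub" in rels:
                hubs.append(target)
            if "self" in rels:
                selves.append(target)
    # This strictness is a subscriber-side choice, not spec text.
    if not hubs:
        raise ValueError("no rel=hub link found")
    if len(selves) != 1:
        raise ValueError(f"expected exactly one rel=self link, found {len(selves)}")
    return hubs, selves[0]


hubs, topic = websub_links(
    ['<https://hub.example/>; rel="hub", <https://example.com/feed>; rel="self"']
)
print(hubs, topic)  # ['https://hub.example/'] https://example.com/feed
```

The URLs `https://hub.example/` and `https://example.com/feed` are placeholders; a production subscriber would more likely use a full RFC 8288 parser than this regex.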

HTML pages on the web already commonly include a link to the resource itself: the rel=canonical link, which major search engines have supported since 2009.

I do not see a reason to add a second link that has the same meaning. Please drop rel=self and replace it with rel=canonical.

See https://chat.indieweb.org/2015-05-22#t1432330810734000 for a discussion of this in the #indieweb channel:

tantek: however, this "even if the same resource is served at http, https, /, and /index.html, only one of those URLs actually works as the push topic" - is an already solved problem
search engines have the same problem
and previously solved it with rel=canonical
thus if that really is a problem for PuSH as well, PuSH should build upon the pre-existing rel=canonical for the page, rather than require rel=self

Christian Weiske · Answer 1 · Wed Nov 23 2016 20:03:00 GMT+0800 (China Standard Time)

rel=self was defined by the Atom RFC (https://tools.ietf.org/html/rfc4287) in 2005, but rel=canonical is more widely used in HTML.

Matthias Pfefferle · Answer 2 · Wed Nov 23 2016 20:05:50 GMT+0800 (China Standard Time)

It is also mentioned in the Web Linking spec: https://tools.ietf.org/html/rfc5988#section-6.2.2

Pelle Wessman · Answer 3 · Wed Nov 23 2016 20:09:15 GMT+0800 (China Standard Time)

Are the two really semantically identical?

self seems to be defined as:

Conveys an identifier for the link's context.

More elaborately from the Atom spec:

The value "self" signifies that the IRI in the value of the href attribute identifies a resource equivalent to the containing element.

While canonical seems to be defined as:

Designates the preferred version of a resource (the IRI and its contents).

The spec for canonical can be found here: https://tools.ietf.org/html/rfc6596

Perhaps an alternative to dropping rel=self would be to also accept rel=canonical: rel=self for Atom and rel=canonical for HTML?

Christian Weiske · Answer 4 · Thu Nov 24 2016 13:12:22 GMT+0800 (China Standard Time)

Yes, it makes no sense to use rel=canonical on Atom feeds.

So the rules could be:

In the Link header, look for either rel=self or rel=canonical
In Atom content, look for rel=self
In HTML content, look for rel=canonical or rel=self
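
Looking for rel=canonical or rel=self in HTML content means reading `<link>` elements from the page. A rough sketch using Python's standard-library `html.parser`; it only inspects `<link>` elements (not `<a>`), does not resolve relative `href` values, and the names `LinkRelCollector` and `html_link_rels` are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkRelCollector(HTMLParser):
    """Record the href of every <link> element, keyed by each of its rel values."""

    def __init__(self) -> None:
        super().__init__()
        self.rels: dict[str, list[str]] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list, e.g. rel="canonical self".
        for rel in (attributes.get("rel") or "").lower().split():
            self.rels.setdefault(rel, []).append(href)


def html_link_rels(html: str) -> dict[str, list[str]]:
    parser = LinkRelCollector()
    parser.feed(html)
    parser.close()
    return parser.rels


page = (
    "<html><head>"
    '<link rel="canonical" href="https://example.com/post">'
    '<link rel="hub" href="https://hub.example/" />'
    "</head><body>...</body></html>"
)
print(html_link_rels(page))
# {'canonical': ['https://example.com/post'], 'hub': ['https://hub.example/']}
```

Self-closing `<link ... />` tags work too, because `HTMLParser` routes them through `handle_starttag` by default.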

Julien Genestoux · Answer 5 · Wed Nov 30 2016 02:16:56 GMT+0800 (China Standard Time)

Meh. Not sure this actually brings anything while it breaks a lot of existing implementations.

Christian Weiske · Answer 6 · Wed Nov 30 2016 05:42:54 GMT+0800 (China Standard Time)

I would rather not require adding a new tag to HTML pages that already have rel=canonical; many pages use it already. But I see that we should not look for it in the HTTP Link headers.

New suggestion:

In the Link header, look for rel=self
In Atom content, look for rel=self
In HTML content, look for rel=canonical or rel=self
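
The revised suggestion could be expressed as a small decision function. This is a sketch of the proposal in this thread, not of the WebSub spec (which only recognises rel=self). It assumes the Link header and the body have already been parsed into `rel -> [href]` mappings, that a Link header rel=self takes precedence over anything in the body, and that in HTML rel=canonical is tried before rel=self, in the order the proposal lists them. All of these are illustrative assumptions, and `proposed_topic_url` is a hypothetical name.

```python
from typing import Optional


def proposed_topic_url(
    content_type: str,
    header_rels: dict[str, list[str]],
    body_rels: dict[str, list[str]],
) -> Optional[str]:
    """Pick a topic URL under the rules proposed in this thread (not the spec)."""
    # Rule 1: in the Link header, only rel=self counts.
    if header_rels.get("self"):
        return header_rels["self"][0]
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/atom+xml":
        candidates = ["self"]  # Rule 2: Atom content uses rel=self only.
    elif media_type == "text/html":
        candidates = ["canonical", "self"]  # Rule 3: HTML accepts either.
    else:
        candidates = []  # The proposal says nothing about other formats.
    for rel in candidates:
        if body_rels.get(rel):
            return body_rels[rel][0]
    return None


print(proposed_topic_url("text/html; charset=utf-8", {}, {"canonical": ["https://example.com/post"]}))
# https://example.com/post
```

Note that an Atom feed carrying only rel=canonical yields `None` here, matching the point that rel=canonical makes no sense on Atom feeds.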

Julien Genestoux · Answer 7 · Wed Dec 07 2016 02:41:26 GMT+0800 (China Standard Time)

I think there is some confusion here: self does not mean canonical. This is indeed confusing, but I'll add a note to the spec to clarify it.

Julien Genestoux · Answer 8 · Wed Dec 07 2016 02:43:00 GMT+0800 (China Standard Time)

Not having a "self-pointing" link exposes us to silent failure (the subscriber subscribes to a URL that is never actually pinged to the hub). This is frequent with "silent" query strings.
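
The silent-failure case can be illustrated from the subscriber's side: if a page was fetched with an extra query string but advertises a clean rel=self URL, the subscriber should subscribe to the advertised URL rather than the one it fetched. A minimal Python sketch, with hypothetical names and `example.com` URLs, that also reports which URL components differ:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class TopicChoice(NamedTuple):
    topic: str
    differs_in: list[str]  # URL components where the fetched URL and topic differ


def choose_topic(fetched_url: str, self_url: Optional[str]) -> TopicChoice:
    if self_url is None:
        # No rel=self: the subscriber can only fall back to the fetched URL,
        # which is the case that risks a silent failure. An empty differs_in
        # here does NOT mean the subscription is safe.
        return TopicChoice(fetched_url, [])
    fetched, advertised = urlsplit(fetched_url), urlsplit(self_url)
    differs = [
        part
        for part in ("scheme", "netloc", "path", "query")
        if getattr(fetched, part) != getattr(advertised, part)
    ]
    return TopicChoice(self_url, differs)


print(choose_topic("https://example.com/blog?utm_source=newsletter", "https://example.com/blog"))
# TopicChoice(topic='https://example.com/blog', differs_in=['query'])
```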

Julien Genestoux · Answer 9 · Wed Dec 07 2016 02:45:39 GMT+0800 (China Standard Time)

Another example of why this is important is for URLs with redirects such as the 'today' URL for the IRC log in this very group. (@aaronpk can say it better!)

Aaron Parecki · Answer 10 · Wed Dec 07 2016 02:51:44 GMT+0800 (China Standard Time)

An example of when rel=canonical wouldn't work is the IRC logs for #indieweb and #social. The URL we tell people to bookmark is https://chat.indieweb.org/today, however that URL will always redirect to the current day's permalink, such as https://chat.indieweb.org/2016-12-06. That day's page would have a rel=canonical of itself, https://chat.indieweb.org/2016-12-06, but a subscriber would need to use a topic URL of https://chat.indieweb.org/today in order to receive updates.

The rel=self provides a way to advertise the topic URL to use, which may be different from the canonical URL. It probably would have been better to call it rel=topic, but I believe the term came from Atom's use of rel=self.
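
To make this example concrete, the response for the current day's log page could carry both relations, with rel=self pointing at the stable topic URL and rel=canonical at the dated permalink. The thread does not show the headers chat.indieweb.org actually sends, so this Python sketch is an assumption, and the hub URL `https://hub.example/` is hypothetical.

```python
from datetime import date

BASE = "https://chat.indieweb.org"
HUB = "https://hub.example/"  # hypothetical hub URL, for illustration only


def today_page_link_header(today: date) -> str:
    """Link header value for the current day's log page (assumed layout)."""
    permalink = f"{BASE}/{today.isoformat()}"
    return ", ".join(
        [
            f'<{HUB}>; rel="hub"',
            f'<{BASE}/today>; rel="self"',  # topic URL subscribers should use
            f'<{permalink}>; rel="canonical"',  # preferred URL for this content
        ]
    )


print(today_page_link_header(date(2016, 12, 6)))
```

A subscriber that starts at `/today`, follows the redirect, and then reads rel=self ends up subscribed to `/today`; one that used rel=canonical would subscribe to a single day's page that stops changing after that day.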

Rob Sanderson · Answer 11 · Fri Jan 13 2017 02:10:06 GMT+0800 (China Standard Time)

👍 to rel=topic. If there are aggregate resources that share a hub, then rel=self would not follow the established semantics. For example, imagine subscribing to all of Wikipedia rather than to each individual page: a rel=self link from https://en.wikipedia.org/wiki/PubSubHubbub to https://en.wikipedia.org/websub/all would not be correct (I believe).

Aaron Parecki · Answer 12 · Fri Jan 13 2017 02:14:19 GMT+0800 (China Standard Time)

Just to be clear, I wasn't actually suggesting changing it to rel=topic. This is not a new spec and we would much rather not break every existing implementation just for aesthetic reasons.

The example you provided sounds completely fabricated to me. I don't think anyone on the https://en.wikipedia.org/wiki/PubSubHubbub page would expect to be able to subscribe to updates for all wikipedia articles by just clicking a button on that page. Instead, they would actually navigate to the home page (or some other feed page) and subscribe there, which would have the appropriate rel-self link.

Rob Sanderson · Answer 13 · Fri Jan 13 2017 02:32:03 GMT+0800 (China Standard Time)

That example is fabricated of course, but the situation is one that we're facing today in several environments. The usage is machine to machine, rather than a human clicking a button.

As an example, the Getty Museum has a collection of some 100k objects. Each description changes VERY rarely, but the changes are typically also VERY important to propagate as they reflect significant changes in state. It would be ridiculous to require systems to subscribe to each object individually. So from the description of the object, we would want to have systems subscribe to the general hub for all objects' changes. If the required pattern is to have an intermediary resource to which each description refers (the "navigate to the home page and subscribe" approach), then what is the link rel for that interaction so that machines can perform it?

The same occurs in the IIIF community. Changes to a particular image are also very rare, but there are millions of them at each organization. Or in scholarly communication -- aggregating preprint journal articles at a subject level.

Existing proposed uses in those two environments:

If you're saying that we shouldn't use websub for those use cases, that would indeed be good to know!

Aaron Parecki · Answer 14 · Fri Jan 13 2017 02:36:53 GMT+0800 (China Standard Time)

Thanks for clarifying, that use case makes sense now. It does seem to be something different from the scope of WebSub which is "subscribe to updates of this resource". It sounds kind of like the "RecentChanges" feed in MediaWiki, which is linked from every page. I think a standard way of finding that master feed would be a useful thing, and then WebSub would be used to subscribe to changes of that feed.

Erik Wilde · Answer 15 · Fri Jan 13 2017 10:09:04 GMT+0800 (China Standard Time)

On 2017-01-12 10:36, Aaron Parecki wrote: [...] I think a standard way of finding that master feed would be a useful thing, and then WebSub would be used to subscribe to changes of that feed.

That sounds like following a "collection" link (to find the collection this resource belongs to) and then subscribing to the collection might do the trick? See http://webconcepts.info/concepts/link-relation/collection
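
This suggestion can be sketched as a two-step discovery: use the resource's own hub and self links if it has them, otherwise follow its rel=collection link and subscribe to the collection. In this Python sketch the HTTP layer is replaced by a hypothetical in-memory table of already-parsed Link headers, the `museum.example` and `hub.example` URLs are invented, and only one level of collection is followed.

```python
from typing import Callable, Optional

# Hypothetical stand-in for HTTP: URL -> {rel: href} parsed from Link headers.
FAKE_LINKS: dict[str, dict[str, str]] = {
    "https://museum.example/object/123": {
        "collection": "https://museum.example/objects",
    },
    "https://museum.example/objects": {
        "hub": "https://hub.example/",
        "self": "https://museum.example/objects",
    },
}


def _hub_and_self(links: dict[str, str]) -> Optional[tuple[str, str]]:
    if "hub" in links and "self" in links:
        return links["hub"], links["self"]
    return None


def find_subscription(
    resource_url: str,
    fetch_links: Callable[[str], Optional[dict[str, str]]] = FAKE_LINKS.get,
) -> Optional[tuple[str, str]]:
    """Return (hub, topic) for the resource itself or, failing that, its collection."""
    links = fetch_links(resource_url) or {}
    direct = _hub_and_self(links)
    if direct is not None:
        return direct
    collection = links.get("collection")
    if collection is None:
        return None
    # Follow a single rel=collection hop and subscribe to the collection.
    return _hub_and_self(fetch_links(collection) or {})


print(find_subscription("https://museum.example/object/123"))
# ('https://hub.example/', 'https://museum.example/objects')
```

Under this approach WebSub itself is unchanged: rel=self still names the topic, and the collection link only tells the machine client which topic to use.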

Julien Genestoux · Answer 16 · Sat Feb 04 2017 11:35:46 GMT+0800 (China Standard Time)

I don't think we made any new progress on this issue so I suggest closing it.

Aaron Parecki · Answer 17 · Tue Feb 14 2017 05:27:46 GMT+0800 (China Standard Time)

I agree. I believe the example in use that I mentioned in my earlier comment on #68 illustrates the need for a separate rel value.