citation-style-language / schema

Citation Style Language schema

Home Page: https://citationstyles.org/

add ability to remove protocol from URL

eroux opened this issue

Is your feature request related to a problem? Please describe.

The MLA Handbook, 8th edition, says in section 2.5.2, "URLs and DOIs":

When giving a URL, copy it in full from your Web browser, but omit http:// or https://.

but modern-language-association.csl can only output the URL exactly as it is given in the item data.

see

Describe the solution you'd like

Add an npurl to the list of link renderings:

https://github.com/citation-style-language/documentation/blob/master/specification.rst#appendix-vi-links

that would remove the protocol from a URL.
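For illustration, here is a minimal Python sketch of the transformation being requested. It assumes the rendering only drops a leading http:// or https:// (the two schemes the MLA rule names) and leaves everything else untouched. `strip_protocol` is a hypothetical helper, not part of CSL or any existing processor.

```python
import re

# A leading "http://" or "https://", matched case-insensitively.
# These are the only two schemes the MLA rule says to omit.
_HTTP_SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def strip_protocol(url: str) -> str:
    """Return url without a leading http:// or https:// scheme.

    Other schemes (ftp://, mailto:, ...) are left alone, since the
    MLA rule quoted above only mentions http and https.
    """
    return _HTTP_SCHEME.sub("", url, count=1)


print(strip_protocol("https://style.mla.org/citing-twitter/"))  # style.mla.org/citing-twitter/
print(strip_protocol("ftp://example.org/file"))                 # ftp://example.org/file
```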

For changes to the CSL RNC schemas, this should include links to citation style guides. It's better to include links to more than one style guide, and no more than five, for us to assess how widely needed this feature might be.

MLA Handbook 8th edition (published book, not openly available online)

Thanks for the suggestion!

This is a quirky rule that is unique to MLA as far as I am aware.

I personally don't really understand why MLA bothers with it, and my general preference is for more clarity rather than less. For example, many publishers are named "something.com", and including the protocol more clearly distinguishes what is an actual URL where the item is located.

Given the rarity of this rule, I'm not sure it's worth the additional complexity to ask citation processors to include an option to strip the protocol from a web link.

What do you think @denismaier @bdarcus @adam3smith ?

@eroux

Add an npurl to the list of link renderings:

So you are not proposing schema changes, but instead for this to be handled in the data?

If so, that seems less than ideal, given that it's style-dependent?

And if there were schema changes, I think an open question is still whether it would be a global parameter/attribute (probably my preference if we do this, since it fits better with the direction we were going with URL rendering in general) or something lower-level.

What do you think @denismaier @bdarcus @adam3smith?

IDK. As you are aware, I'm not fond of many citation rules.

I'm also skeptical: I have searched around a bit and haven't found a single example of this in a citation style other than MLA.

Well, don't count on me to defend the idea of removing the protocol; I find it quite silly, to be honest... but for better or worse, MLA is quite popular in the humanities, so this would still be helpful to a number of users.

BTW, they promote this on Twitter; here is a recent tweet pointing to it:

https://style.mla.org/citing-twitter/

@andras-simonyi - since you've been dealing with URLs lately, any feedback?

And while I'm at it, any feedback/suggested changes on our related recommendations?

https://github.com/citation-style-language/documentation/blob/master/specification.rst#appendix-vi-links

Well, as for adding something like an attribute to remove the protocol from rendered URLs, I don't see big problems from the implementor's point of view (I wouldn't tinker with the data model). OTOH, I agree that hiding information (which is already in a condensed format) is not a good policy, and it can even have security implications: e.g., it does matter whether the protocol is https or http.

Regarding the 1.0.2 appendix on links, I ran into an issue when implementing external linking according to the recommendations: what should be done when the style dictates rendering a URL prefix different from the one recommended for linking, e.g., for DOIs? All options can be problematic: if the recommendation is followed to the letter, then an explicitly rendered URL will be linked to a different one; the alternative is to link to the rendered URL, which might even be inaccessible.

A more general thought on linking: wouldn't it be more elegant and extensible to handle external linking explicitly in the underlying/implied rich-text model (say, with a href attribute) and leave the whole business of specifying what to link and how to the individual styles? When I implemented the recommendations I had to extend the rich-text model with href anyway.

Well, as for adding something like an attribute to remove the protocol from rendered URLs, I don't see big problems from the implementor's point of view (I wouldn't tinker with the data model).

What would that look like? IIUC, we'd have to introduce a new sibling to cs:text (cs:url or similar), because otherwise the new attribute would be usable in absurd situations. E.g. <text variable="url" form="no-protocol"/> or <text variable="url" url-form="no-protocol"/> makes sense, but what about <text variable="title" form="no-protocol"/> or <text variable="title" url-form="no-protocol"/>?

Or could we allow a new value on @form or a new attribute url-form only when the rendered variable is url?
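A rough sketch of the co-occurrence rule the second option implies, assuming a hypothetical url-form attribute on cs:text. In the actual schema this would presumably be expressed in the RNC grammar rather than in code; the Python below (using the standard library's xml.etree.ElementTree) only illustrates which combinations would be rejected.

```python
import xml.etree.ElementTree as ET

# Namespace used by CSL style files.
CSL_NS = "http://purl.org/net/xbiblio/csl"


def check_url_form(style_xml: str) -> list[str]:
    """Report cs:text elements that use the hypothetical url-form
    attribute with a variable other than url."""
    root = ET.fromstring(style_xml)
    problems = []
    for text in root.iter(f"{{{CSL_NS}}}text"):
        if "url-form" in text.attrib and text.get("variable") != "url":
            problems.append(f'url-form used with variable="{text.get("variable")}"')
    return problems


example = """<style xmlns="http://purl.org/net/xbiblio/csl">
  <text variable="url" url-form="no-protocol"/>
  <text variable="title" url-form="no-protocol"/>
</style>"""
print(check_url_form(example))  # ['url-form used with variable="title"']
```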

This is a quirky rule that is unique to MLA as far as I am aware.

There surely are some institutional styles, even unrelated to the MLA, with this requirement (I know of one :)).

And, of course, Austrian legal styles require funny things: either no protocol at all, or no protocol unless it is https.
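To make the difference between these requirements concrete, here is a sketch with three modes: keep the URL as given, strip http:// and https:// as MLA asks, or strip only http:// as in the Austrian variant. The mode names and the `render_url` function are made up for illustration and are not proposed CSL values.

```python
from enum import Enum


class ProtocolMode(Enum):
    """Hypothetical rendering modes covering the cases mentioned in this thread."""
    KEEP = "keep"                              # render the URL as given
    STRIP = "strip"                            # MLA: omit http:// and https://
    STRIP_UNLESS_HTTPS = "strip-unless-https"  # Austrian variant: keep https://


def render_url(url: str, mode: ProtocolMode) -> str:
    lower = url.lower()
    if mode is ProtocolMode.KEEP:
        return url
    if lower.startswith("https://"):
        # Only STRIP drops https://; STRIP_UNLESS_HTTPS keeps it.
        return url[len("https://"):] if mode is ProtocolMode.STRIP else url
    if lower.startswith("http://"):
        # Both stripping modes drop plain http://.
        return url[len("http://"):]
    return url


print(render_url("https://example.org/a", ProtocolMode.STRIP_UNLESS_HTTPS))  # https://example.org/a
print(render_url("http://example.org/a", ProtocolMode.STRIP_UNLESS_HTTPS))   # example.org/a
```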

[Edit: Every time I come up with such examples, I fear you must be thinking I’m making this all up :D]

@bdarcus And while I'm at it, any feedback/suggested changes on our related recommendations?

Yeah. I think the "if no URL is rendered in the bib entry, add the anchor to the title" fallback goes too far. In a nutshell, I don't think anyone wants titles to be blue/underlined, even though link styling is fine for all the other elements, and that is quite tricky to resolve at the CSL processor level. More broadly, departing from the convention that every URL is visible in textual form is a bad idea. Think of incorrect links included by accident, which someone writing a paper might not even notice have made it in. Think Sci-Hub.

I can bang out a proper issue for that if you want, and we can discuss it there.

I think we should just remove that part. Maybe a PR?

What would that look like?

What I was asking about earlier was likely a global boolean.

<style url-format="strip-protocol">
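A sketch of how a processor might honor such a style-level switch, assuming the attribute name and value proposed above (url-format="strip-protocol"), which are not part of any released CSL version. The setting is read once when the style is loaded, instead of being checked on every cs:text.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

# A leading "http://" or "https://", matched case-insensitively.
_HTTP_SCHEME = re.compile(r"^https?://", re.IGNORECASE)


def url_renderer(style_xml: str) -> Callable[[str], str]:
    """Build a URL-rendering function from the (proposed) url-format
    attribute on the root cs:style element."""
    root = ET.fromstring(style_xml)
    # Unprefixed attributes are not namespaced, so a plain get() works.
    strip = root.get("url-format") == "strip-protocol"

    def render(url: str) -> str:
        return _HTTP_SCHEME.sub("", url, count=1) if strip else url

    return render


mla_like = '<style xmlns="http://purl.org/net/xbiblio/csl" url-format="strip-protocol"/>'
plain = '<style xmlns="http://purl.org/net/xbiblio/csl"/>'
print(url_renderer(mla_like)("https://citationstyles.org/"))  # citationstyles.org/
print(url_renderer(plain)("https://citationstyles.org/"))     # https://citationstyles.org/
```

Keeping it global means a style opts in with a single attribute, and no nonsensical combinations with non-URL variables can arise.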

Ok, much better than an attribute on cs:text...

Okay, given that we have found at least a few examples, I think a style attribute makes sense. Could someone open a PR?

How should this be handled for rendering of DOI or other identifiers? I suppose that the style would just specify an appropriate prefix in that case.
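One way to read this, as a sketch: the style supplies the displayed prefix (e.g. "doi.org/" for an MLA-like style), while the link target stays the full resolver URL. The https://doi.org/ resolver constant, the `RenderedLink` type and `render_doi` are assumptions for illustration. This corresponds to the "follow the linking recommendation to the letter" option described earlier, where the rendered text and the link target can differ.

```python
from dataclasses import dataclass

# Assumed resolver used for the link target; the rendered text is
# controlled separately by the style's prefix.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class RenderedLink:
    text: str  # what the reader sees in the bibliography
    href: str  # where the link actually points


def render_doi(doi: str, style_prefix: str) -> RenderedLink:
    """Render a DOI with whatever prefix the style asks for, while
    always linking to the full https resolver URL."""
    return RenderedLink(text=f"{style_prefix}{doi}", href=f"{DOI_RESOLVER}{doi}")


link = render_doi("10.1000/182", "doi.org/")
print(link.text)  # doi.org/10.1000/182
print(link.href)  # https://doi.org/10.1000/182
```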

With respect to @georgd's example about https://, I think the correct behavior in that case would be to just set the style to not strip protocols; no site should be using bare http:// anymore, so that issue should go away by attrition.

[Edit: Every time I come up with such examples, I fear you must be thinking I’m making this all up :D]

I've spent enough time working on German and UK university styles to know what you say is plausible haha!

How should this be handled for rendering of DOI or other identifiers? I suppose that the style would just specify an appropriate prefix in that case.

That's why I mentioned the spec recommendation section.

Was thinking we could rely on this simple attribute plus those recommendations first, and see if and how requirements evolve.