remarkjs / remark-gfm

remark plugin to support GFM (autolink literals, footnotes, strikethrough, tables, tasklists)

Home Page: https://remark.js.org

Tildes in link URLs being escaped

bertramakers opened this issue

Affected packages and versions

3.0.1 and earlier

Link to runnable example

https://codesandbox.io/s/late-architecture-btzgc4?file=/src/index.js

Steps to reproduce

Configure remark to use remark-gfm and process a Markdown file containing for example:

[link text](https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm)
<https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm>

It will be written as:

[link text](https://www.ics.uci.edu/\~fielding/pubs/dissertation/rest_arch_style.htm)
<https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm>

Note that the tilde in the first link URL is escaped, which does not happen without remark-gfm. In the autolink, the tilde is not escaped.

This is an issue for us, because on https://stoplight.io/ we sometimes have to create links that look like:

[retrieving pass](/reference/uitpas.json/paths/~1passes~1{uitpasNumber}/get)

This is a link to a specific operation inside an OpenAPI file. Stoplight replaces it with a permalink that they generate for the endpoint before converting the Markdown to HTML. However, this conversion to a permalink does not work correctly when the tildes are escaped, and the link won't work.

Expected behavior

Tildes in link URLs are never escaped.

Actual behavior

Tildes in resource links are being escaped.

Runtime

No response

Package manager

No response

OS

No response

Build and bundle tools

No response

commented

This is not an issue. Character escapes (and character references) work in destinations (the URL part):

[a](b~c)
[a](b\~c)

Yields:

<p><a href="b~c">a</a>
<a href="b~c">a</a></p>

Rendered here:

a
a
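
As a rough illustration of why both forms give the same href: CommonMark allows any ASCII punctuation character to be backslash-escaped, and the escape is resolved when the destination is parsed. The sketch below is a simplified stand-in for that step, not remark's actual code. The name `resolveBackslashEscapes` is hypothetical, and character references are deliberately ignored.

```javascript
// Simplified model: drop the backslash in front of any ASCII punctuation
// character, which is how a CommonMark parser treats escapes in a link
// destination. Character references (such as &amp;) are NOT handled here.
function resolveBackslashEscapes(destination) {
  // The ranges cover ASCII punctuation: ! to /, : to @, [ to `, { to ~
  return destination.replace(/\\([!-\/:-@\[-`{-~])/g, '$1')
}

console.log(resolveBackslashEscapes('b~c'))   // b~c
console.log(resolveBackslashEscapes('b\\~c')) // b~c
```

A consumer that compares the raw Markdown source of the destination, instead of the parsed URL, would see two different strings here.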


the link won't work.

Where does it not work?

Hi @wooorm , thanks for the quick reply.

We write docs for our APIs in Markdown files, which are hosted on https://stoplight.io. Our OpenAPI files are hosted on Stoplight as well. When we want to link from our Markdown files to specific API endpoints in the OpenAPI docs, we need to follow the steps documented here: https://meta.stoplight.io/docs/platform/hosted-docs/stoplight-urls#link-to-api-elements

From their example the link should look like:

[Display text](../reference/your-api-file.yaml/paths/~1verification-requests/post)

And Stoplight will automatically convert it to something like this:

[Display text](https://your-organization.stoplight.io/docs/your-project-name/a28a853fc0a4f-verification-request)

This works fine when we use remark with various plugins to lint the syntax and fix any issues in our Markdown automatically (like changing the marker of unordered lists from - to * for consistency). However, we'd also like to add the remark-gfm plugin, because Stoplight supports the GFM spec, and without remark-gfm, remark reports warnings on e.g. task lists like * [x] done.

However the problem we're having now with remark-gfm is that it escapes the tildes in those links I mentioned, like this:

[Display text](../reference/your-api-file.yaml/paths/\~1verification-requests/post)

While this may technically be correct Markdown and could convert to correct link URLs in HTML, the Stoplight platform does not seem to be able to handle it correctly because the link never works with that escaped tilde on their hosted docs. I cannot give you a real-world example of that because it would mean I'd need to break links in our public-facing API docs.

Note again that we do not convert the Markdown to HTML with remark, we only use remark to lint and fix our Markdown files before we push them to Stoplight.
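
For a lint-and-fix pipeline like this, one possible stopgap is to post-process remark's output and undo the escape inside link destinations only. This is a minimal sketch under stated assumptions, not something remark provides: `unescapeTildesInLinkDestinations` is a hypothetical helper, and it only handles plain inline links whose destination has no spaces, parentheses, angle brackets, or title.

```javascript
// Hypothetical helper (not part of remark): turn `\~` back into `~`,
// but only inside the destination of inline links like [text](destination).
function unescapeTildesInLinkDestinations(markdown) {
  // Matches `](destination)` where the destination contains no whitespace
  // and no parentheses. Anything more complex is left untouched.
  return markdown.replace(/\]\(([^()\s]*)\)/g, (match, destination) => {
    return '](' + destination.replace(/\\~/g, '~') + ')'
  })
}

const serialized =
  '[Display text](../reference/your-api-file.yaml/paths/\\~1verification-requests/post)'

console.log(unescapeTildesInLinkDestinations(serialized))
// [Display text](../reference/your-api-file.yaml/paths/~1verification-requests/post)
```

Escaped tildes outside link destinations are kept, since there the escape may be what prevents accidental strikethrough.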

I can understand that this is probably not "your problem" and Stoplight might be able to fix it on their end.

However I don't really understand why remark-gfm would do this tilde escaping as it does not seem related to GFM. And if there's a good reason to do it, why only do it for resource links (like [text](link)) and not autolinks (like <link>)?


Edit: I can give you an example of where we use a link like this to illustrate how it works.

In this Markdown file on our repo with API documentation, we link to an API endpoint described in an OpenAPI file: https://github.com/cultuurnet/apidocs/blob/main/projects/uitpas/docs/creating-rewards.md?plain=1#L18

This specific Markdown file is published as HTML by Stoplight on https://docs.publiq.be/docs/uitpas/a28a853fc0a4f-creating-rewards. When you now click this "create rewards" link underneath the "Authentication" header on that page, it links to https://docs.publiq.be/docs/uitpas/db7606f7c881a-create-new-reward instead of the original URL with the tildes.

This is the same link as before, but converted by Stoplight to a permalink. It's links like these that don't work anymore when the tildes in the URL in Markdown are escaped, because Stoplight does not do the (expected) conversion to the permalinks then, probably because the escaped tilde causes issues in their conversion code.

commented

the Stoplight platform does not seem to be able to handle it correctly because the link never works with that escaped tilde on their hosted docs

Yep, this is a problem on their platform then

However I don't really understand why remark-gfm would do this tilde escaping as it does not seem related to GFM. And if there's a good reason to do it, why only do it for resource links (like [text](link)) and not autolinks (like <link>)?

We have an AST (abstract, so it only contains the meaningful info). Plugins inject arbitrary things into that AST, e.g., the string *ha* in a text node. (note: if they wanted emphasis, they should inject a node, not a string). Users could also have written such text with \*ha* (or even a character reference, &xyz;).

Now we have to serialize that AST as a string of markdown. We need to look for a very complex set of conditions of characters that can’t occur in things (reference: https://github.com/syntax-tree/mdast-util-to-markdown/blob/main/lib/unsafe.js). We need to escape those asterisks because otherwise it would create emphasis.

Tildes, in normal markdown, are fine (except when they could create fenced code, or inside the fenced code meta/info parts). But with GFM, they can also create strikethrough/delete (reference: https://github.com/syntax-tree/mdast-util-gfm-strikethrough/blob/24d6765e2321cae4f86b86f1416b6ac01ec0e30f/index.js#L21). Which is basically the same as emphasis, but with tildes, for <del>.
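
To make the scoping concrete, here is a simplified model of how a serializer could decide whether an unsafe pattern applies to the position it is writing. It is loosely modelled on the scope check in mdast-util-to-markdown but is not that code, and the construct stacks are assumptions chosen for illustration.

```javascript
// Does any construct in `list` appear on the current stack?
// `none` is the answer to give when the pattern does not specify a list.
function listInScope(stack, list, none) {
  if (typeof list === 'string') list = [list]
  if (!list || list.length === 0) return none
  return list.some((construct) => stack.includes(construct))
}

// A pattern applies when we are inside one of its `inConstruct` entries
// and not inside any of its `notInConstruct` entries.
function patternInScope(stack, pattern) {
  return (
    listInScope(stack, pattern.inConstruct, true) &&
    !listInScope(stack, pattern.notInConstruct, false)
  )
}

// The tilde pattern without any exclusions: unsafe anywhere in phrasing.
const tilde = {character: '~', inConstruct: 'phrasing'}

// Assumed stacks: plain paragraph text, and a link destination inside a paragraph.
const inParagraphText = ['paragraph', 'phrasing']
const inLinkDestination = ['paragraph', 'phrasing', 'link', 'destinationRaw']

console.log(patternInScope(inParagraphText, tilde))   // true
console.log(patternInScope(inLinkDestination, tilde)) // true, hence the `\~` in the URL
```

Because the destination is nested inside phrasing, a pattern that only says "unsafe in phrasing" also matches there.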


So, escaping is complex. And we’ve had this issue before with asterisks/underscores being escaped in a couple of places. It wasn’t wrong, but those escapes were useless, because asterisks/underscores do not need escaping there. And the same is true for tildes.
Here’s the fix: syntax-tree/mdast-util-to-markdown@7b381da (L131 on the fixed side).
It could be applied here: https://github.com/syntax-tree/mdast-util-gfm-strikethrough/blob/24d6765e2321cae4f86b86f1416b6ac01ec0e30f/index.js#L21.

Interested in working on a PR?

commented

probably because the escaped tilde causes issues in their conversion code.

You should probably raise this with them too

Interested in working on a PR?

I can give it a try, but I'm very new to remark and ASTs. Looking at the fix for asterisks and underscores, I think I only understand about half of it 😄 But I'll give it a go and open a draft PR in mdast-util-gfm-strikethrough if I get stuck and have questions, and I'll ping you there then.

Thanks for the very detailed information by the way!

commented

You only need to pass the array in (copy/paste), as notInConstruct!
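
A sketch of what that could look like, assuming the array meant here is the list of constructs that the linked mdast-util-to-markdown commit excludes for asterisks and underscores. The construct names below should be checked against that package's source, and `applies` is a hypothetical, simplified stand-in for the serializer's scope check.

```javascript
// Constructs in which a tilde cannot start strikethrough, so it needs no escape.
// Assumption: this mirrors the list used for asterisks/underscores upstream.
const constructsWithoutStrikethrough = [
  'autolink',
  'destinationLiteral',
  'destinationRaw',
  'reference',
  'titleQuote',
  'titleApostrophe'
]

// The unsafe pattern with the exclusion list passed as `notInConstruct`.
const unsafe = [
  {
    character: '~',
    inConstruct: 'phrasing',
    notInConstruct: constructsWithoutStrikethrough
  }
]

// Simplified scope check: inside `inConstruct`, and in none of `notInConstruct`.
function applies(stack, pattern) {
  const inside = stack.includes(pattern.inConstruct)
  const excluded = (pattern.notInConstruct || []).some((c) => stack.includes(c))
  return inside && !excluded
}

console.log(applies(['paragraph', 'phrasing'], unsafe[0])) // true
console.log(applies(['paragraph', 'phrasing', 'link', 'destinationRaw'], unsafe[0])) // false
```

With the exclusion in place, tildes in running text are still escaped where they could form strikethrough, while tildes in link destinations are left alone.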

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

commented

Thanks Bert! :)

@wooorm is it possible this also requires a new release of remark-gfm to depend on the newer version of syntax-tree/mdast-util-gfm-strikethrough? (At least when using yarn like we are)

commented

No, that is not needed. Throw your locks and node_modules away, reinstall, done!

Ah yes, my bad. I was able to update it using yarn upgrade remark-gfm, which also updated its dependencies :)