kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

Backslash in a link

vpetrigo opened this issue

Hello!

Thank you for the awesome formatter. We experienced the following issue:

  • a backslash in a link is removed after running the comrak tool
PS C:\Users\vladi> cat .\temp.md
[Link Text](#1\.-link)
PS C:\Users\vladi> comrak -t commonmark .\temp.md
[Link Text](#1.-link)

[Link Text](#1\.-link) is a valid link to a heading anchor within a wiki, but after formatting the navigation is broken.

Version:

comrak --version
comrak 0.18.0

This is per spec; see example.

Thank you for the example.
That does not explain why the backslash is removed, though, if [Link Text](#1\.-link) and [Link Text](#1.-link) produce exactly the same CommonMark output.

Also the following backslash sequence in a link should produce two different links:

PS C:\Users\vladi> cat .\temp.md
[Link Text](#1\.-link)
[Link Text](#1\\\\.-link)
PS C:\Users\vladi> comrak -t commonmark .\temp.md
[Link Text](#1.-link)
[Link Text](#1.-link)

while comrak reformats them to be the same.

That does not explain why the backslash is removed, though, if [Link Text](#1\.-link) and [Link Text](#1.-link) produce exactly the same CommonMark output.

I don't quite understand what you are getting at here; the backslash is "removed" in the lexical analysis stage, per § 2.4 Backslash escapes — see examples 22 and 23, where the backslash is not included in the output HTML. There are many such examples where different CommonMark inputs produce the same output/represent the same AST.

But this:

PS C:\Users\vladi> cat .\temp.md
[Link Text](#1\.-link)
[Link Text](#1\\\\.-link)
PS C:\Users\vladi> comrak -t commonmark .\temp.md
[Link Text](#1.-link)
[Link Text](#1.-link)

is indeed a bug.

Similarly, [Link Text](#1\\.-link) should roundtrip in CommonMark, and in HTML produce <a href="#1%5C.-link">Link Text</a>.
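
To make the behavior discussed above concrete, here is a minimal Python sketch of backslash-escape handling as described in § 2.4 of the CommonMark spec. It is not Comrak's implementation, and the function names (`unescape_destination`, `escape_destination`, `href_for`) are hypothetical. The re-escaping step is deliberately simplified: it only doubles literal backslashes, whereas a real CommonMark renderer would also need to handle other characters such as parentheses and spaces.

```python
import string
from urllib.parse import quote

# CommonMark's "ASCII punctuation" set; Python's string.punctuation
# happens to contain exactly these 32 characters.
ASCII_PUNCTUATION = set(string.punctuation)


def unescape_destination(dest: str) -> str:
    """Drop a backslash when it precedes ASCII punctuation (spec § 2.4).

    A backslash before any other character is kept as a literal backslash.
    """
    out = []
    i = 0
    while i < len(dest):
        ch = dest[i]
        if ch == "\\" and i + 1 < len(dest) and dest[i + 1] in ASCII_PUNCTUATION:
            out.append(dest[i + 1])  # keep the escaped character only
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape_destination(url: str) -> str:
    """Simplified re-escaping for CommonMark output (assumption: backslashes only)."""
    return url.replace("\\", "\\\\")


def href_for(url: str) -> str:
    """Percent-encode the parsed destination for HTML, keeping '#' as-is."""
    return quote(url, safe="#")


# "\." and "." parse to the same destination, so the backslash is not preserved.
print(unescape_destination(r"#1\.-link"))    # #1.-link
# "\\" is an escaped backslash, so one literal backslash survives parsing...
print(unescape_destination(r"#1\\.-link"))   # #1\.-link
# ...and shows up percent-encoded in the HTML href.
print(href_for(unescape_destination(r"#1\\.-link")))  # #1%5C.-link
# Re-escaping the parsed value restores the original source text.
print(escape_destination(unescape_destination(r"#1\\.-link")))  # #1\\.-link
```

Under these assumptions, `#1\.-link` and `#1.-link` are indistinguishable after parsing, while `#1\\.-link` and `#1\\\\.-link` parse to different destinations and so should not collapse to the same output.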

see examples 22 and 23, where the backslash is not included in the output HTML

Sorry, my bad. I'm still quite unfamiliar with the CommonMark spec. 😅

I'll check what can be done to produce proper output for links with backslashes.

@kivikakk Maybe you could add special comments that wrap Markdown text comrak shouldn't touch, as is done in clang-format (see Disabling Formatting on a Piece of Code) and other code formatting tools.

For example:

<!--- comrak off -->
[Link Text](#1\.-link)
[Link Text](#1\\\\.-link)
<!--- comrak on -->

Comrak will find comments like comrak off/on, remove them, and leave the text inside unformatted.

Because in this particular case, following the spec is a bad thing.

@Nulllix I have no idea where this suggestion comes from, but that doesn't even begin to work. If Comrak didn't touch the text within those delimiters, then no Markdown conversion would happen at all and you'd be left with plain text and no link at all.

Following the spec is a good thing for a tool whose entire point is to follow a spec.

then no Markdown conversion would happen at all and you'd be left with plain text and no link at all.

But that's exactly what we need. Just to let Comrak know that some part of the document does not need to be touched 😌

The reason is that a link of the form
[Link Text](#1.-link)
is not valid in the Azure Wiki. For it to work, you must escape the dot by adding \ in front of it, which is what we do. But comrak removes the \, and that is very unpleasant.

It's just a suggestion, though.

But that's exactly what we need.

It really isn't. The issue here isn't that we just don't want to touch the document; there was (a) a misunderstanding about the spec, which we've resolved, and (b) a separate but related bug.

If we didn't touch that part of the document, then those wouldn't be interpreted as links at all. That's not what is being asked for here, is neither useful here nor generally useful, and would break spec. For someone who wanted it, that functionality would best be implemented at a stage before handing input to Comrak.
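
For anyone who does want the "stage before handing input to Comrak" approach, the following Python sketch shows one possible shape for it. Everything here is an assumption rather than a Comrak feature: the `<!--- comrak off -->` / `<!--- comrak on -->` markers come from the suggestion earlier in this thread, the placeholder format and the function name `format_with_protected_regions` are hypothetical, and the formatter is passed in as a plain callable so that no particular Comrak invocation is assumed. It also assumes the formatter leaves the placeholder HTML comments unchanged.

```python
import re
from typing import Callable, List

# Hypothetical markers taken from the suggestion in this thread;
# Comrak itself does not recognize them.
OFF = "<!--- comrak off -->"
ON = "<!--- comrak on -->"
REGION = re.compile(re.escape(OFF) + r".*?" + re.escape(ON), re.DOTALL)
PLACEHOLDER = re.compile(r"<!-- protected-region-(\d+) -->")


def format_with_protected_regions(text: str, formatter: Callable[[str], str]) -> str:
    """Hide marked regions from the formatter, then put them back verbatim."""
    saved: List[str] = []

    def stash(match: "re.Match[str]") -> str:
        # Keep the whole region, markers included, so a second run
        # protects the same text again.
        saved.append(match.group(0))
        return f"<!-- protected-region-{len(saved) - 1} -->"

    masked = REGION.sub(stash, text)
    formatted = formatter(masked)
    # A function replacement is inserted literally, so backslashes in the
    # saved text are not reinterpreted by the regex engine.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], formatted)


if __name__ == "__main__":
    # Stand-in for a real formatter: it only mimics the backslash removal
    # reported in this thread.
    def fake_formatter(source: str) -> str:
        return source.replace("\\.", ".")

    doc = (
        "[A](#1\\.-a)\n"
        "<!--- comrak off -->\n"
        "[B](#1\\.-b)\n"
        "<!--- comrak on -->\n"
    )
    print(format_with_protected_regions(doc, fake_formatter))
```

Note that the protected text is never parsed as Markdown by the formatter, which is exactly the trade-off described above.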

But that's exactly what we need.

Oh... you responded so quickly. I updated my post above, adding the reason why we have to put \ before the dot. Could you please reread my message?

Thanks for the update. In this case, it's ADO's wiki that's not compliant with the Markdown standard Comrak implements — ADO doesn't actually implement a CommonMark-compliant base, so it makes sense that roundtripping your Markdown with Comrak could produce something that has a different interpretation by ADO's wiki.

I will not respond to any further requests regarding Comrak's interoperability with a specific implementation of some other Markdown-like syntax. Maybe you can find an alternative workaround using entities—I don't know. Or you can implement your suggested solution at the stage before handing input to Comrak. You might reconsider using Comrak for whatever it is you're actually using it for here, because it is not and will not be compatible with ADO's wiki unless they decide to make their implementation CommonMark compliant.

Comrak is a CommonMark implementation.

Thanks for your understanding.


This issue will remain open for #309 (comment):

But this:

PS C:\Users\vladi> cat .\temp.md
[Link Text](#1\.-link)
[Link Text](#1\\\\.-link)
PS C:\Users\vladi> comrak -t commonmark .\temp.md
[Link Text](#1.-link)
[Link Text](#1.-link)

is indeed a bug.

Similarly, [Link Text](#1\\.-link) should roundtrip in CommonMark, and in HTML produce <a href="#1%5C.-link">Link Text</a>.