kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

Home Page:https://hrzn.ee/kivikakk/comrak

Autolink edge cases

digitalmoksha opened this issue

Found a couple of autolink edge cases:

  • See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;

    comrak: <p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
    cmark-gfm: <p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>

  • http://example.com/[abc]

    comrak: <p><a href="http://example.com/%5Babc">http://example.com/[abc</a>]</p>
    cmark-gfm: <p><a href="http://example.com/%5Babc%5D">http://example.com/[abc]</a></p>

Re: the second item

Rinku actually does balancing, like both cmark and comrak do for parentheses.

Looking at the cmark code, they don't consider a bracket as an ending delimiter - comrak does.

And it looks like I probably broke this when I added the relaxed-autolinks option - I added [ and ] to LINK_END_ASSORTMENT. https://github.com/kivikakk/comrak/pull/325/files
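To make the difference concrete, here is a minimal sketch of trailing-delimiter trimming. It is not comrak's or cmark-gfm's actual code: the function `trim_link_end` and both character sets are hypothetical, and the sets only approximate the real tables. It ignores parenthesis balancing, `<` handling and entities.

```rust
// Simplified sketch, not the real implementation in either project.
// Assumption: these sets only approximate the real trailing-delimiter tables.
const CMARK_LIKE: &[char] = &['?', '!', '.', ',', ':', '*', '_', '~', '\'', '"'];
const WITH_BRACKETS: &[char] = &[
    '?', '!', '.', ',', ':', '*', '_', '~', '\'', '"', '[', ']',
];

/// Hypothetical helper: drop trailing characters that appear in `trailing`.
fn trim_link_end<'a>(link: &'a str, trailing: &[char]) -> &'a str {
    link.trim_end_matches(|c: char| trailing.contains(&c))
}

fn main() {
    let candidate = "http://example.com/[abc]";
    // Brackets are not end delimiters: the whole candidate is linked.
    println!("{}", trim_link_end(candidate, CMARK_LIKE));
    // Brackets are end delimiters: the trailing `]` falls outside the link.
    println!("{}", trim_link_end(candidate, WITH_BRACKETS));
}
```

With brackets in the set, the link stops at `http://example.com/[abc`, which matches the comrak output reported above.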

I can either

  • change the code to behave as cmark does, and keep the current behavior (or Rinku style) only when the relaxed-autolinks option is set
  • leave as is
  • make it Rinku style (supporting balanced brackets). So http://example.com[abc]] would give <a href="http://example.com[abc]">http://example.com[abc]</a>]
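For the third option, a rough sketch of what balanced-bracket trimming could look like. The function name `trim_unbalanced_brackets` is hypothetical, and this is neither Rinku's nor comrak's code; it only illustrates the idea of dropping a trailing `]` while closers outnumber openers.

```rust
/// Hypothetical helper: remove trailing `]` characters that have no
/// matching `[` earlier in the link candidate.
fn trim_unbalanced_brackets(link: &str) -> &str {
    let mut end = link.len();
    while link[..end].ends_with(']') {
        let opens = link[..end].matches('[').count();
        let closes = link[..end].matches(']').count();
        if closes > opens {
            // `]` is a single ASCII byte, so stepping back by one is safe.
            end -= 1;
        } else {
            break;
        }
    }
    &link[..end]
}

fn main() {
    // The extra `]` is left outside the link.
    println!("{}", trim_unbalanced_brackets("http://example.com[abc]]"));
    // Balanced brackets are kept whole.
    println!("{}", trim_unbalanced_brackets("http://example.com/[abc]"));
}
```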

re: the first item

It looks like by the time we start trying to detect the autolink, the data has already been decoded, so what we see is <<<http://example.com/>>>: the characters are no longer HTML entities. Not sure what, if anything, can be done about that.
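A small sketch of why the ordering matters, assuming a detector that strips trailing entity references from the raw source. The helper `trim_trailing_entities` is hypothetical and heavily simplified; it is not taken from either code base.

```rust
/// Hypothetical helper: strip trailing `&name;` entity references from a
/// raw (not yet decoded) link candidate.
fn trim_trailing_entities(raw: &str) -> &str {
    let mut link = raw;
    while link.ends_with(';') {
        let body = &link[..link.len() - 1];
        // Count the alphabetic entity name directly before the `;`.
        let name_len = body
            .bytes()
            .rev()
            .take_while(|b| b.is_ascii_alphabetic())
            .count();
        let amp = body.len() - name_len;
        if name_len > 0 && amp > 0 && body.as_bytes()[amp - 1] == b'&' {
            // Cut the link just before the `&`.
            link = &link[..amp - 1];
        } else {
            break;
        }
    }
    link
}

fn main() {
    // Raw source: the trailing entities can be recognised and left outside.
    println!("{}", trim_trailing_entities("http://example.com/&gt;&gt;&gt;"));
    // Already decoded: there is no `&gt;` left to recognise, so `>>>` stays.
    println!("{}", trim_trailing_entities("http://example.com/>>>"));
}
```

Once the text has been decoded, the trailing `>` characters are indistinguishable from literal ones typed into the URL, which is the situation described above.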

My head officially hurts... 🤕

What led me to this is that I'm trying to get rid of a custom auto_link filter that mimics what Rinku does. These are the two tests that are failing.

I may decide it's good enough to switch - these really are edge cases, and I'm not sure how often we see them in the wild.

Yes, indeed; Rinku is some preeeetty antique software by this stage (with no commit from the primary author since 2016, and none from the other maintainer (me!) since 2019), and I imagine the remaining users are pretty few and far between; certainly not at GitHub since the cmark-gfm switch happened, as its own autolink was used from then, which is what Comrak aims to emulate.

Ideally we continue to match cmark-gfm in regular mode — I don't mind what the behaviour is once relaxed-autolinks is specified. Let me know if you want a hand with the former.

Rinku is some preeeetty antique software by this stage

oh yes, very much 😄

Ideally we continue to match cmark-gfm in regular mode

totally agree. Created #386 to address this.

Alright! So we have the second item addressed by #386 — thanks very much — which leaves us with this unpleasantness:

$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | comrak -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | ~/g/archive/cmark-gfm/build/src/cmark-gfm -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>

I might have a look into this in the next couple of days!

I might have a look into this in the next couple of days!

Turned into a couple of months, but I got there!