Autolink edge cases
digitalmoksha opened this issue
Found a couple autolink edge cases:
- See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;
comrak:
<p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
cmark-gfm:
<p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>
- http://example.com/[abc]
comrak:
<p><a href="http://example.com/%5Babc">http://example.com/[abc</a>]</p>
cmark-gfm:
<p><a href="http://example.com/%5Babc%5D">http://example.com/[abc]</a></p>
Re: the second item
Rinku actually does balancing, like both cmark and comrak do for parentheses.
Looking at the cmark code, they don't consider a bracket as an ending delimiter - comrak does.
And it looks like I probably broke this when I added the relaxed-autolinks option: I added [ and ] to LINK_END_ASSORTMENT. https://github.com/kivikakk/comrak/pull/325/files
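To make the effect of that change concrete, here is a minimal sketch of trailing-character trimming. It is not comrak's actual code: `trim_trailing`, `BASE` and the exact character sets are assumptions for illustration, loosely modelled on the idea of a "link end" set. With brackets in the set, the trailing ] is dropped from the URL, which matches the comrak output reported above.

```rust
/// Hypothetical helper: strip trailing characters that appear in `trailing`
/// from a candidate URL. Only the end of the string is examined.
fn trim_trailing<'a>(candidate: &'a str, trailing: &[char]) -> &'a str {
    candidate.trim_end_matches(|c: char| trailing.contains(&c))
}

/// Assumed baseline set of trailing punctuation (illustrative only).
const BASE: [char; 10] = ['?', '!', '.', ',', ':', '*', '_', '~', '\'', '"'];

fn main() {
    // Same set with '[' and ']' appended, as described in the comment above.
    let mut with_brackets = BASE.to_vec();
    with_brackets.extend_from_slice(&['[', ']']);

    let input = "http://example.com/[abc]";
    // Brackets are not trailing delimiters: the whole string is kept.
    println!("{}", trim_trailing(input, &BASE)); // http://example.com/[abc]
    // Brackets are trailing delimiters: the final ']' is trimmed.
    println!("{}", trim_trailing(input, &with_brackets)); // http://example.com/[abc
}
```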
I can do one of the following:
- change the code to make it behave as cmark does, and use the current behavior (or Rinku style) only with the relaxed-autolinks option
- leave as is
- make it Rinku style (supporting balanced brackets), so http://example.com[abc]] would give <a href="http://example.com[abc]">http://example.com[abc]</a>]
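The balanced-bracket option could be sketched roughly as follows. This is only an illustration of the idea, not Rinku's or comrak's implementation; `trim_unbalanced_brackets` is a hypothetical name. A trailing ] is dropped only while there are more closing than opening brackets in the candidate.

```rust
/// Hypothetical helper: drop trailing ']' characters that have no matching
/// '[' earlier in the candidate URL.
fn trim_unbalanced_brackets(candidate: &str) -> &str {
    let mut end = candidate.len();
    while candidate[..end].ends_with(']') {
        let head = &candidate[..end];
        let opens = head.matches('[').count();
        let closes = head.matches(']').count();
        if closes <= opens {
            break; // brackets are balanced, keep the trailing ']'
        }
        end -= 1; // ']' is a single byte in UTF-8
    }
    &candidate[..end]
}

fn main() {
    // One extra ']' is left outside the link.
    println!("{}", trim_unbalanced_brackets("http://example.com[abc]]")); // http://example.com[abc]
    // Already balanced: nothing is trimmed.
    println!("{}", trim_unbalanced_brackets("http://example.com/[abc]")); // http://example.com/[abc]
}
```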
Re: the first item
It looks like by the time we start trying to detect the autolink, the data has already been decoded, meaning it's <<<http://example.com/>>> - they are no longer HTML entities. Not sure what, if anything, can be done about that.
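A small sketch of why the ordering matters, under heavy assumptions: `decode_entities` handles only two entities and `scan_url` is a naive scanner that stops at whitespace or <. Neither reflects comrak's or cmark-gfm's real code. Once the entities are decoded, the trailing > characters look like part of the URL.

```rust
/// Hypothetical, deliberately tiny entity decoder (only &lt; and &gt;).
fn decode_entities(s: &str) -> String {
    s.replace("&lt;", "<").replace("&gt;", ">")
}

/// Hypothetical naive scanner: take everything from "http://" up to the
/// next whitespace or '<'.
fn scan_url(text: &str) -> Option<&str> {
    let start = text.find("http://")?;
    let rest = &text[start..];
    let end = rest
        .find(|c: char| c.is_whitespace() || c == '<')
        .unwrap_or(rest.len());
    Some(&rest[..end])
}

fn main() {
    let raw = "See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;";
    let decoded = decode_entities(raw);
    // After decoding, the scanner cannot tell the '>' characters came from entities.
    println!("{:?}", scan_url(&decoded)); // Some("http://example.com/>>>")
    // Before decoding, the entity text is still visible at the end of the match.
    println!("{:?}", scan_url(raw)); // Some("http://example.com/&gt;&gt;&gt;")
}
```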
My head officially hurts... 🤕
What led me to this is that I'm trying to get rid of a custom auto_link filter that mimics what Rinku does. These are the two tests that are failing.
I may decide it's good enough to switch - I think these really are edge cases that I'm not sure how often we see in the wild.
Yes, indeed; Rinku is some preeeetty antique software by this stage (with no commit from the primary author since 2016, and none from the other maintainer (me!) since 2019), and I imagine the remaining users are pretty few and far between; certainly not at GitHub since the cmark-gfm switch happened, as its own autolink was used from then, which is what Comrak aims to emulate.
Ideally we continue to match cmark-gfm in regular mode — I don't mind what the behaviour is once relaxed-autolinks is specified. Let me know if you want a hand with the former.
> Rinku is some preeeetty antique software by this stage
oh yes, very much 😄
> Ideally we continue to match cmark-gfm in regular mode
totally agree. Created #386 to address this.
Alright! So we have the second item addressed by #386 — thanks very much — which leaves us with this unpleasantness:
$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | comrak -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/%3E%3E%3E">http://example.com/&gt;&gt;&gt;</a></p>
$ echo 'See &lt;&lt;&lt;http://example.com/&gt;&gt;&gt;' | ~/g/archive/cmark-gfm/build/src/cmark-gfm -e autolink
<p>See &lt;&lt;&lt;<a href="http://example.com/">http://example.com/</a>&gt;&gt;&gt;</p>
I might have a look into this in the next couple of days!
> I might have a look into this in the next couple of days!
Turned into a couple of months, but I got there!