micromark / micromark

small, safe, and great commonmark (optionally gfm) compliant markdown parser

Home Page: https://unifiedjs.com

Implementations of autolink and literalAutolink (micromark-extension-gfm-autolink-literal) are inconsistent when handling "@."

DavidAnson opened this issue

Affected packages and versions

micromark 4.0.0, micromark-extension-gfm-autolink-literal 2.0.0

Link to runnable example

No response

Steps to reproduce

user@.com

<user@.com>

<user@e.com>

Expected behavior

Consistent treatment of user@.com by autolink and literalAutolink.

Actual behavior

user@.com and <user@.com> are both emitted as literalAutolink. Expected behavior is observed for <user@e.com> which is emitted as autolink.

This is significant for a linter which can be confused by the current behavior into adding infinite <> wrappers attempting to turn user@.com from literalAutolink into autolink: DavidAnson/markdownlint#1140

I propose that <user@.com> should be treated as autolink, which is seemingly possible if emailAtSignOrDot behaved differently:

function emailAtSignOrDot(code) {
  return asciiAlphanumeric(code) ? emailLabel(code) : nok(code)
}
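
For context, the CommonMark specification defines the email address allowed inside <...> with a regular expression in which every domain label must start with an ASCII alphanumeric. That is the rule emailAtSignOrDot enforces right after an "@" or ".". A minimal sketch, assuming the regular expression as published in the spec; isCommonMarkEmail is a hypothetical helper name, not a micromark API:

```javascript
// Email address pattern from the CommonMark spec's autolink section
// (the same non-normative regex used by the HTML5 spec).
const commonMarkEmail =
  /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

// Hypothetical helper: true when `value` could be the body of `<...>`.
function isCommonMarkEmail(value) {
  return commonMarkEmail.test(value)
}

console.log(isCommonMarkEmail('user@e.com')) // true
console.log(isCommonMarkEmail('user@.com')) // false: "." directly after "@"
```

Under this rule <user@.com> cannot be an autolink, so the angle brackets fall back to plain data and the GFM extension then picks up user@.com as a literal autolink.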

The micromark tokens (when using micromark-extension-gfm-autolink-literal) for parsing the above Markdown are:

content user@.com
  paragraph user@.com
    literalAutolink user@.com
      literalAutolinkEmail user@.com
lineEnding \n
lineEndingBlank \n
content <user@.com>
  paragraph <user@.com>
    data <
    literalAutolink user@.com
      literalAutolinkEmail user@.com
    data >
lineEnding \n
lineEndingBlank \n
content <user@e.com>
  paragraph <user@e.com>
    autolink <user@e.com>
      autolinkMarker <
      autolinkEmail user@e.com
      autolinkMarker >
lineEnding \n

Runtime

Node v16

Package manager

npm v6

OS

macOS

Build and bundle tools

Webpack

Welcome @DavidAnson! 👋
The auto link implementation matches that of GitHub, which has the same behavior.
It is working as expected.

Example


user@.com

<user@.com>

user@e.com

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

I would not say this behavior is "expected" (or even logical), but I see how it is bug-compatible and understand why that is important. :) Thanks for the quick response!

GitHub uses a different heuristic for URLs than CommonMark.
There are edge cases, sure, and that is something you could raise with the GitHub Flavored Markdown team.

The goal of the micromark project(s) is to match the specification.


Related:

Titus has been good at improving the CommonMark spec.
The process to contribute back to GFM is a bit more opaque, and involves opening a support ticket and hoping it makes its way to the platform team.

Rock and a hard place, I get it! I've worked around this inconsistency in markdownlint and will say there are remarkably few times I've had to subvert the micromark parser. So thank you!!

I would really recommend sticking with GH. Folks will want their markdown to work with many tools. GH is so big, and us (and you?) following it exactly, I think, improves the world a bit! :)

To be clear, I am not trying to define a new specification or tell people to do different things. The problem I have is that the current behavior of micromark causes markdownlint to suggest adding angle brackets indefinitely to these invalid email addresses. That is obviously bad and I want to prevent it. The easy way would be for the two specifications to agree on email syntax, but I understand that horse has left the barn. So I have tried to detect this specific scenario in code and stop recommending extra angle brackets. This creates a special case in my code which I would rather not have. I understand now why micromark behaves as it does and you do not need to defend it to me. :)
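
A special case of that kind might look roughly like the following. This is a simplified sketch, not markdownlint's actual code: the token shape (type, startColumn, endColumn as 0-based offsets into the line) and the name shouldSuggestBrackets are assumptions made for illustration.

```javascript
// Hypothetical token shape: {type, startColumn, endColumn}, where the
// columns are 0-based offsets into `line` and `endColumn` is exclusive.
function shouldSuggestBrackets(line, token) {
  if (token.type !== 'literalAutolink') {
    return false
  }
  const before = line[token.startColumn - 1]
  const after = line[token.endColumn]
  // Already wrapped in <>: adding another pair cannot turn the literal
  // autolink into an autolink, so stop suggesting the fix.
  return !(before === '<' && after === '>')
}

const bare = {type: 'literalAutolink', startColumn: 0, endColumn: 9}
const wrapped = {type: 'literalAutolink', startColumn: 1, endColumn: 10}

console.log(shouldSuggestBrackets('user@.com', bare)) // true
console.log(shouldSuggestBrackets('<user@.com>', wrapped)) // false
```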

However, the comment above made me think I had not explained the situation well. Here is the issue for markdownlint which has a little more detail if you are curious: DavidAnson/markdownlint#1140

Missing context on what the underlying lint rule is trying to enforce: https://github.com/DavidAnson/markdownlint/blob/main/doc/md034.md

Right, makes sense!
Personally, I would probably recommend <> autolinks where possible and full []() links otherwise.
Then you can always recommend not using these “bare” GFM URLs (they are so frail and the rules are hard to understand; I think they’re fine for places like this where you type comments, but not for markdown docs that are maintained over time).
Whether <> works is not too difficult to check in CM.

In the formatter we use, the check is like this:

https://github.com/syntax-tree/mdast-util-to-markdown/blob/fd6a508cc619b862f75b762dcf876c6b8315d330/lib/util/format-link-as-autolink.js#L16-L33

The code looks a bit wobbly but we haven’t had issues about it!
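
For readers who do not follow the link, a check of that kind might look like the sketch below. It is a simplified approximation written for this thread, not a copy of the linked file: canFormatAsAutolink is a hypothetical name, and it takes the link text, URL, and title as plain strings rather than an mdast node.

```javascript
// Hypothetical helper: could `[text](url "title")` be written as `<text>`?
function canFormatAsAutolink(text, url, title) {
  return Boolean(
    url &&
      // A title cannot be expressed in an autolink.
      !title &&
      // The visible text must be the URL itself (or an email for `mailto:`).
      (text === url || 'mailto:' + text === url) &&
      // The URL must start with a scheme.
      /^[a-z][a-z+.-]+:/i.test(url) &&
      // No ASCII control characters, spaces, or angle brackets.
      !/[\0- <>\u007F]/.test(url)
  )
}

console.log(canFormatAsAutolink('https://example.com', 'https://example.com')) // true
console.log(canFormatAsAutolink('example', 'https://example.com')) // false
```

Note that this sketch does not validate the email address itself, so an address such as user@.com would still need the separate handling discussed earlier in the thread.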

You just described a newer rule, MD054/link-image-style: https://github.com/DavidAnson/markdownlint/blob/main/doc/md054.md

:)