remarkjs / remark-github

remark plugin to link references to commits, issues, pull-requests, and users, like on GitHub

Home Page: https://remark.js.org

Change request: Make commit hash regex more strict

EvHaus opened this issue

Problem

remark-github detects commit hashes with a regex and converts them to links. The regex essentially treats any hash-looking string of 7 to 40 characters on a new line as a commit hash. This causes problems for our users, who paste lists of other numbers.

Consider the following markdown content:

Hey Steve. Can you please review these customer invoices?

1600100999
1600100960

At the moment, remark-github will truncate and convert those numbers to hash links like this:
[Screenshot, 2022-02-01: the numbers above rendered as truncated commit hash links]

This is not desired behaviour as the numbers aren't actually commit hash links in GitHub in this context.

Solution

My idea is to tweak the regex to require at least one alphabetic character in the string before treating it as a hash (add [a-z]+ to the rule). This means the library will occasionally produce a false negative when a hash really does consist only of digits, but I think a typical Git hash is much more likely to contain at least one letter.

I recognize this is a messy/unsound solution, but I think the tradeoff between UX and soundness might be worth it here.
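As a rough illustration of the proposed rule, here is a minimal standalone sketch. It is not remark-github's actual regex: the helper name `looksLikeCommitHash` and both patterns are assumptions made for this example, and it checks only the hex letters a-f rather than the broader [a-z] mentioned above.

```javascript
// Hypothetical helper, not part of remark-github.
// A string counts as a commit hash only if it is 7-40 hex characters long
// AND contains at least one hex letter (a-f).
const HEX_RUN = /^[0-9a-f]{7,40}$/i;
const HAS_HEX_LETTER = /[a-f]/i;

function looksLikeCommitHash(value) {
  return HEX_RUN.test(value) && HAS_HEX_LETTER.test(value);
}
```

With this rule, the invoice numbers from the example would no longer match, at the cost of also rejecting the rare all-digit abbreviated hash.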

What do you think? Would you consider such a change?

Alternatives

The proper solution here would be to not rely on Regex alone, and instead do some kind of actual query to your GitHub repo to check the validity of hash strings before converting them to links, but that's well outside the scope of this library.

commented
  • You should be able to solve this by passing a buildUrl that returns false?
  • This problem doesn’t align with the goal of this project though: try and match GitHub.

You should be able to solve this by passing a buildUrl that returns false?

That would turn off the feature entirely. I think it would be nice to keep it, but make it a bit more tolerant of values which are not likely to be commit hashes.

This problem doesn’t align with the goal of this project though: try and match GitHub.

Yes, I worry about this too. However, the current Regex-only solution doesn't align with GitHub already. For example, the following lines won't become links in GitHub:

1600100999
1600100960

But they will in remark-github. I was trying to find some solution that would bring the library closer to parity in GitHub for most common scenarios.

commented

buildUrl is a function. That gets different input. And can return different things for different input? 🤔


Indeed, GitHub checks whether issues/PRs/commits/users/etc exist, and this plugin doesn’t access the network.
That’s the one difference between this plugin and them.
I don’t think adding another axis of difference (requiring hex letters in commit SHAs or not) is much of an improvement.

buildUrl is a function. That gets different input. And can return different things for different input? 🤔

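Oh, I see what you're saying. We could just return false from there to prevent a link from being created for values that fail a custom hash check, e.g.

const buildUrl = (values, defaultBuildUrl) => {
    // Skip linking commit-like strings that fail our own check (customHashChecker is our own helper).
    if (values.type === 'commit' && !customHashChecker(values.hash)) return false;
    // Otherwise fall back to the plugin's default URL.
    return defaultBuildUrl(values);
};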

Ok. That's good enough for me. Thanks for the sanity check.
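For anyone landing here later, a self-contained sketch of that workaround might look like the following. `customHashChecker` is a hypothetical predicate (here: hex string with at least one letter), `fakeDefaultBuildUrl` is only a stand-in so the snippet runs without the plugin, and the `user`/`project` fields on `values` are assumed for illustration. It returns false to skip linking, as suggested earlier in the thread.

```javascript
// Hypothetical predicate: 7-40 hex characters with at least one letter (a-f).
function customHashChecker(hash) {
  return /^[0-9a-f]{7,40}$/i.test(hash) && /[a-f]/i.test(hash);
}

// Returning false means "do not create a link" for this match.
function buildUrl(values, defaultBuildUrl) {
  if (values.type === 'commit' && !customHashChecker(values.hash)) return false;
  return defaultBuildUrl(values);
}

// Stand-in for the plugin's default builder, present only so this runs alone.
// The user/project fields are assumptions for this sketch.
function fakeDefaultBuildUrl(values) {
  return `https://github.com/${values.user}/${values.project}/commit/${values.hash}`;
}
```

In a real pipeline the function would presumably be passed as the `buildUrl` option when registering the plugin, with the plugin supplying the real `defaultBuildUrl`.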

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

Hi team! Could you describe why this has been marked as wontfix?

Thanks,
— bb