using linkify-it

Question

SupertigerDev opened this issue 5 years ago

I'm trying to use linkify-it with simple-markdown because links don't fully work the way I want them to.
I tried doing this:

//import linkify-it
import linkify from 'linkify-it'
const linkifyInstance = linkify();

//... later in the code
match: function(source) {
   return linkifyInstance.match(source)
},

But I get this error:
Error in render: "Error: `match` must return a capture starting at index 0 (the current parse index). Did you forget a ^ at the start of the RegExp?"

This is what linkifyInstance.match(source) contains:

Aria Buckles · Answer 1 · Wed Oct 30 2019 06:46:46 GMT+0800 (China Standard Time)

So I think there are two things that will make this tricky, but I think both are solvable:

1. SimpleMarkdown parse rules need to:
   - return a regular-expression .exec style result, i.e. an array that looks like: ["full capture string", "match1", "match2", /* ... */ ]
   - return a match for the start of the input only, and null if the start of the input (at character 0 in the input string) does not match.
2. The text rule needs to not capture any input that might be part of another rule. By default, the text rule will match all of `hi nertivia.tk`, preventing a linkify rule from being able to match `nertivia.tk`.
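
To make the two requirements on `match` concrete, here is a small sketch in plain JavaScript. `isValidCapture` is a hypothetical helper written only for this illustration; it is not part of simple-markdown and does not reproduce the library's own validation.

```javascript
// Hypothetical helper (NOT a simple-markdown API): checks that a value has the
// shape a rule's `match` function is expected to return.
function isValidCapture(capture, source) {
    if (capture === null) return true;           // "no match here" is allowed
    return Array.isArray(capture) &&
        typeof capture[0] === 'string' &&        // first element: full matched text
        source.startsWith(capture[0]);           // and it must begin at index 0
}

const source = 'hi nertivia.tk';

// An anchored RegExp can only match at the start of the input.
const anchored = /^[a-z]+/.exec(source);         // capture for "hi"

// An unanchored RegExp finds "nertivia.tk" at index 3; a rule must not return this.
const unanchored = /[a-z]+\.[a-z]+/.exec(source);
const lateCapture = [unanchored[0]];

// An array of match *objects* (the shape returned in the question) is not a capture either.
const objectArray = [{ index: 3, raw: 'nertivia.tk' }];

console.log(isValidCapture(anchored, source));    // true
console.log(isValidCapture(lateCapture, source)); // false
console.log(isValidCapture(objectArray, source)); // false
console.log(isValidCapture(null, source));        // true
```

The last two cases are the ones that matter for linkify-it: its result has to be translated into a string array, and a link found later in the input has to be reported as "no match yet".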

We can solve problem 1 by adapting the Match array output from linkify-it to simple-markdown's api needs:

    linkify: {
        order: order++,

        match: function(source) {
            let linkifyMatches = linkifyInstance.match(source);

            // if linkify found nothing, or if the first linkify match is at any index past 0, we have no match yet:
            if (
                linkifyMatches == null ||
                linkifyMatches.length === 0 ||
                linkifyMatches[0].index !== 0
            ) {
                return null;
            }

            // translate linkify-it's match to simple-markdown's match/capture format:
            let capture = [
                linkifyMatches[0].raw,  // the first element in the array must be the raw matched text
                // future elements in the result can be anything, so take whatever parts you need here!:
                linkifyMatches[0].text,
                linkifyMatches[0].url,
            ];
            return capture;
        },

        parse: function(capture) {              
            return {
                content: {
                    type: 'text',  // mark that our content is the raw text of the link
                    content: capture[1],
                },
                url: capture[2],
            };
        },

        html: function(node, output, state) {
            return '<a href="' +
                SimpleMarkdown.sanitizeText(SimpleMarkdown.sanitizeUrl(node.url)) +
                '">' +
                output(node.content, state) +
                "</a>";
        },
    },
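
The translation step inside `match` can be tried on its own, without either library installed. The sketch below pulls it out into a hypothetical `toCapture` function and feeds it hand-written objects that only imitate the `index`, `raw`, `text`, and `url` fields used above; the field values are assumptions for illustration, not real linkify-it output.

```javascript
// Hypothetical standalone version of the translation step in `match` above.
// `matches` is either null or an array of objects with index/raw/text/url.
function toCapture(matches) {
    // Nothing found, or the first link starts after index 0: no match yet.
    if (matches == null || matches.length === 0 || matches[0].index !== 0) {
        return null;
    }
    const first = matches[0];
    // Element 0 must be the raw matched text; the rest is passed on to `parse`.
    return [first.raw, first.text, first.url];
}

// Hand-written stand-ins for linkify-it results (assumed values):
const linkAtStart = [
    { index: 0, raw: 'nertivia.tk', text: 'nertivia.tk', url: 'http://nertivia.tk' },
];
const linkLater = [
    { index: 3, raw: 'nertivia.tk', text: 'nertivia.tk', url: 'http://nertivia.tk' },
];

console.log(toCapture(linkAtStart)); // capture array: raw, text, url
console.log(toCapture(linkLater));   // null
console.log(toCapture(null));        // null
```

Returning `null` for `linkLater` is what lets the parser fall through to the text rule and come back to the link once the input in front of it has been consumed.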

We can solve problem 2 in one of two ways:

* Modify the default text rule to break after spaces and punctuation, so that any links after a space or punctuation are properly captured
* Or have our `text` rule run `linkify-it` to make sure it doesn't capture anything that could be a link

The first option is faster, the second option is more thorough. Here's what they would look like:

    // Option 1: make the default text rule break after spaces and punctuation
    text: {
        order: order++,
        match: function(source) {
            // copied and modified from simple-markdown.js
            return /^[\s\S][0-9A-Za-z\u00c0-\uffff]*\s?/.exec(source);
        },
        parse: SimpleMarkdown.defaultRules.text.parse,
        html: SimpleMarkdown.defaultRules.text.html,
    },

    // Option 2:
    text: {
        order: order++,
        match: function(source) {
            // Run linkify and then only match text (using the default text rule) from the non-linkified bits.
            let linkifyMatches = linkifyInstance.match(source);

            // Figure out the index of the next linkify match, or use the end of the source if none were found:
            let indexOfNextLink = source.length;
            if (linkifyMatches && linkifyMatches.length > 0) {
                indexOfNextLink = linkifyMatches[0].index;
            }

            // Then we can re-run the default text match on the subset of the source before the linkify match:
            return SimpleMarkdown.defaultRules.text.match(source.slice(0, indexOfNextLink));
        },
        parse: SimpleMarkdown.defaultRules.text.parse,
        html: SimpleMarkdown.defaultRules.text.html,
    },
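
To see where the Option 1 pattern stops, it can be applied repeatedly to a string in plain JavaScript. This is only a sketch of the text rule in isolation: `textChunks` is a hypothetical helper, and a real parser would try the linkify rule first at each of these boundaries.

```javascript
// The Option 1 text pattern, copied from the rule above.
const TEXT_PATTERN = /^[\s\S][0-9A-Za-z\u00c0-\uffff]*\s?/;

// Hypothetical helper: split `source` into the pieces the text rule would
// consume if it were the only rule that ever matched.
function textChunks(source) {
    const chunks = [];
    while (source.length > 0) {
        // [\s\S] matches any character, so this always consumes at least one.
        const capture = TEXT_PATTERN.exec(source);
        chunks.push(capture[0]);
        source = source.slice(capture[0].length);
    }
    return chunks;
}

console.log(textChunks('hi nertivia.tk'));
// chunks: "hi ", "nertivia", ".tk"

console.log(textChunks('hello, google.com is dope!'));
// chunks: "hello", ", ", "google", ".com ", "is ", "dope", "!"
```

Each chunk boundary is a position where the parser starts over with the remaining input. After `"hi "` the remainder is `nertivia.tk`, and after `", "` it is `google.com is dope!`, so a linkify rule ordered before the text rule gets to see the link at index 0.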

Aria Buckles · Answer 2 · Wed Oct 30 2019 06:57:50 GMT+0800 (China Standard Time)

(oops, there are a couple bugs in the above; i'm cleaning them up now)

Edit: I think I've fixed the basic bugs; see below for putting it all together :)

Aria Buckles · Answer 3 · Wed Oct 30 2019 07:06:14 GMT+0800 (China Standard Time)

Putting that together with what you asked in #77, your custom rules would look something like:

const SimpleMarkdown = require('simple-markdown');
const linkify = require('linkify-it');
const linkifyInstance = linkify();

let order = 0;  // order the below rules as declared below rather than by the original defaultRules order:

let rules = {
    // OPTIONAL: copy the paragraph rule with a new `order`
    paragraph: Object.assign({}, SimpleMarkdown.defaultRules.paragraph, {
        order: order++,
    }),

    linkify: {
        order: order++,

        match: function(source) {
            let linkifyMatches = linkifyInstance.match(source);

            // if linkify found nothing, or if the first linkify match is at any index past 0, we have no match yet:
            if (
                linkifyMatches == null ||
                linkifyMatches.length === 0 ||
                linkifyMatches[0].index !== 0
            ) {
                return null;
            }

            // translate linkify-it's match to simple-markdown's match/capture format:
            let capture = [
                linkifyMatches[0].raw,  // the first element in the array must be the raw matched text
                // future elements in the result can be anything, so take whatever parts you need here!:
                linkifyMatches[0].text,
                linkifyMatches[0].url,
            ];
            return capture;
        },

        parse: function(capture) {
            return {
                content: {
                    type: 'text',
                    content: capture[1],
                },
                url: capture[2],
            };
        },

        html: function(node, output, state) {
            return '<a href="' +
                SimpleMarkdown.sanitizeText(SimpleMarkdown.sanitizeUrl(node.url)) +
                '">' +
                output(node.content, state) +
                "</a>";
        },
    },

    // copy the bold/strong rule with a new `order`
    strong: Object.assign({}, SimpleMarkdown.defaultRules.strong, {
        order: order++,
    }),


    text: {
        order: order++,
        match: function(source) {
            // modified from simple-markdown.js
            // match any character, followed by letter/unicode characters, followed by an optional space
            return /^[\s\S][0-9A-Za-z\u00c0-\uffff]*\s?/.exec(source);
        },
        parse: SimpleMarkdown.defaultRules.text.parse,
        html: SimpleMarkdown.defaultRules.text.html,
    },
};

let parse = SimpleMarkdown.parserFor(rules);
let output = SimpleMarkdown.outputFor(rules, 'html');
// alternatively, if using react:
// let output = SimpleMarkdown.outputFor(rules, 'react');

let markdownToHtml = function(source, state) {
    // default to an empty state so this can be called with just `source`:
    state = state || {};

    // if you don't have a paragraph rule, you probably want to default `state.inline` to true, to
    // indicate to the bold rule that it is parsing inline text:
    if (rules.paragraph == null) state.inline = true;

    let parsedContentTree = parse(source, state);
    return output(parsedContentTree, state);
};

module.exports = markdownToHtml;

Supertiger · Answer 4 · Thu Oct 31 2019 21:22:20 GMT+0800 (China Standard Time)

Ah, thanks a lot, really appreciate you taking time helping 👌

Aria Buckles · Answer 5 · Sat Nov 02 2019 07:23:52 GMT+0800 (China Standard Time)

Happy to! Glad that helped!

Supertiger · Answer 6 · Wed Nov 20 2019 19:55:35 GMT+0800 (China Standard Time)

Hey, so I recently noticed that with that code, it doesn't check properly :( for example:
This works fine:
google.com
but this doesn't:
hello, google.com is dope!

EDIT: never mind, I did not read what you wrote properly 🤦‍♂ Thanks a lot again :D

Supertiger · Answer 7 · Fri Mar 06 2020 04:07:42 GMT+0800 (China Standard Time)

Hey there! I don't know if a new update broke something or what but when typing:
google.com google.com google.com
it seems like only the even ones turn into links 🤔 Any help please?
Thanks
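
One possible explanation, assuming the rules from Answer 3 are used as written: once the first link is consumed, the remaining input starts with a space. The text pattern's leading `[\s\S]` matches that space and then continues through the letters of `google`, so the second link never gets a chance to start at index 0. The sketch below reproduces this in plain JavaScript. `findLinks` is a hypothetical RegExp stand-in for `linkifyInstance.match` (the real linkify-it is far more thorough), `tokenize` is a simplified parse loop, and `ALTERNATIVE_TEXT` is an untested suggestion rather than anything from the thread.

```javascript
// Hypothetical stand-in for linkifyInstance.match(): finds bare "name.tld" tokens.
function findLinks(source) {
    const found = [];
    const pattern = /[a-z0-9-]+\.[a-z]{2,}/gi;
    let m;
    while ((m = pattern.exec(source)) !== null) {
        found.push({ index: m.index, raw: m[0] });
    }
    return found.length > 0 ? found : null;
}

// Link rule: only report a link that starts at index 0 of the remaining input.
function matchLink(source) {
    const links = findLinks(source);
    return links && links[0].index === 0 ? [links[0].raw] : null;
}

// Text pattern from Answer 3, and an assumed alternative that consumes either
// one word (plus an optional space) or a single non-word character.
const ORIGINAL_TEXT = /^[\s\S][0-9A-Za-z\u00c0-\uffff]*\s?/;
const ALTERNATIVE_TEXT = /^(?:[0-9A-Za-z\u00c0-\uffff]+\s?|[\s\S])/;

// Simplified parse loop: try the link rule first, then fall back to text.
function tokenize(source, textPattern) {
    const tokens = [];
    while (source.length > 0) {
        const link = matchLink(source);
        const capture = link || textPattern.exec(source);
        tokens.push({ type: link ? 'link' : 'text', text: capture[0] });
        source = source.slice(capture[0].length);
    }
    return tokens;
}

const input = 'google.com google.com google.com';

console.log(tokenize(input, ORIGINAL_TEXT));
// link "google.com", text " google", text ".com ", link "google.com"

console.log(tokenize(input, ALTERNATIVE_TEXT));
// link "google.com", text " ", link "google.com", text " ", link "google.com"
```

With the original pattern the second `google.com` is split into two text tokens, which matches the "every other link" symptom. The alternative pattern stops after the space, so the link rule is retried at the start of each word. Whether it behaves well with the real linkify-it and the rest of your rules would still need checking.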