wooorm / refractor

Lightweight, robust, elegant virtual syntax highlighting using Prism

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Regex delimiters from `js` not included in `jsx`?

karlhorky opened this issue · comments

Hi @wooorm , sorry for all the issues lately!

Working on link and highlighting tooling a bit right now...

Here's something that I found interesting: the js language returns some regex delimiter tokens, but the jsx language does not... should it be like that? I thought those extra tokens came from js-extras, maybe this is not included with jsx or typescript or tsx?

Ran the following on Runkit:

var refractor = require("refractor")
console.log(JSON.stringify(refractor.highlight('/\bno\-highlight\b/', 'js'), null, 2))
console.log(JSON.stringify(refractor.highlight('/\bno\-highlight\b/', 'jsx'), null, 2))

...and received the following:

JS

[
  {
    "type": "element",
    "tagName": "span",
    "properties": {
      "className": [
        "token",
        "regex"
      ]
    },
    "children": [
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "regex-delimiter"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "/"
          }
        ]
      },
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "language-regex"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "\bno-highlight\b"
          }
        ]
      },
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "regex-delimiter"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "/"
          }
        ]
      }
    ]
  }
]

JSX

[
  {
    "type": "element",
    "tagName": "span",
    "properties": {
      "className": [
        "token",
        "regex"
      ]
    },
    "children": [
      {
        "type": "text",
        "value": "/\bno-highlight\b/"
      }
    ]
  }
]
commented

Refractor is supposed to do what prism does (albeit virtual). Cab you confirm whether prism (does not) result in the same output?

On the test page and on a sandbox, the javascript language results in no extra tokens:

Test Page: https://prismjs.com/test.html#language=javascript
Demo: https://codesandbox.io/s/prismjs-forked-0ygs8?file=/src/index.js

So I suppose if there's a bug here, it's that js adds the extra tokens in refractor but not in Prism.js.

commented

I can confirm, javascript, jsx.

regex-delimiter is added in JavaScript, due to these Prism lines. They don’t include JSX.

The normal build of refractor includes everything, so regex runs, which modifies JavaScript. But that apparently doesn’t happen in these Prism cases? 🤔

commented

This seems to be due to the global state of Prism:

require("prismjs")
require("prismjs/components/prism-jsx")
require("prismjs/components/prism-regex")

console.log('js', Prism.highlight('/\bno\-highlight\b/', Prism.languages.javascript, 'javascript'))
console.log('jsx:', Prism.highlight('/\bno\-highlight\b/', Prism.languages.jsx, 'jsx'))

Yields:

js: <span class=\"token regex\"><span class=\"token regex-deli…pan><span class=\"token regex-delimiter\">/</span></span>
jsx: <span class=\"token regex\">/\bno-highlight\b/</span>

But moving prism-regex to before prism-jsx makes them the same.

The normal build of refractor includes everything

But moving prism-regex to before prism-jsx makes them the same

Ah, so the way to address this will be to enable all regex and other add-on features like js-extras for jsx too?

Wonder if this also affects typescript and tsx...

commented

Unfortunately, this is really hard to solve, because it’s a problem that comes from Prism. Prism works with one whole global object, and has intricate interdependencies. We’re already doing a lot to figure out which languages depend on each other. We used to do regexes, but since this version we’re using Prism-provided information to figure out what depends on what.

Sometimes there are direct dependencies. But in this case, it’s if X is defined, do something to it, otherwise ignore it.
If you load regex, it’ll mutate javascript if it exists (one of the few that always does). If you then load jsx, it uses the mutated JavaScript object. If you do it the other way around, jsx clones javascript before regex extends javascript. 🤔

I’m guessing it also affects ts(x) indeed. And I’m sure there are others.

since this version we’re using Prism-provided information to figure out what depends on what

Ah interesting... couldn't find the commit / PR for this.

Edit: Or do you mean this PR #32 ? With the commit c130059 (#32) ?

@RunDevelopment you wouldn't have any suggestions here would you?

If I'm understanding correctly, refractor wants to be able to load all Prism languages in the correct order for their dependencies, and is currently using some Prism-provided info for this.

Oops. Seems like I messed up with the dependencies here. I'll fix it.

commented

Ohh, that’s great to hear! <3

commented

Can confirm that this is now solved in the latest refractor.

Thanks @RunDevelopment!

Nice, thanks to both of you!

commented

@RunDevelopment Hmm, I’m running into this again. JS is included here by default. Regex isn‘t.

Because optional dependencies are passed as reference (example of regex in JS), the import order matters.

For Prism bundles, y’all are solving it very smartly (dependencies.js).
That makes it impossible (I think) to load a few languages first and more languages later (which is what I do here).

One solution I can think of is for the grammars to use:

'regex-source': {
  pattern: /^(\/)[\s\S]+(?=\/[a-z]*$)/,
  lookbehind: true,
  alias: 'language-regex',
  inside: 'regex'
},

...which is then resolved to Prism.languages.regex at runtime.
Would something like that be acceptable for Prism?

Alternatively, I could make more bundles for refractor: 0, common languages, and all languages. Then I can leave it up to users.

That makes it impossible (I think) to load a few languages first and more languages later (which is what I do here).

No, it's possible. The trick is that loading a language again will overwrite the old version. I.e. if your bundle includes JS but not Regex, you can just load Regex and then load JS again.

This isn't a particularly nice solution but it works. Our loadLanguages function uses exactly this strategy.


You did point out one of the painful consequences of Prism's current load-order-based dependency system. For Prism v2, we plan to adopt ESM into our codebase, at which point we will most likely have to phase out the current solution for optional dependencies and implement something (hopefully) nicer.

commented

Ah, so I could add:

regex.optionalDependants = ['javascript', 'vala']

And then when regex loads it’ll re-initialize javascript and vala when available? 🤔
A bit more complex would be to reload coffeescript and such, which depend on javascript.

Interesting, thanks, I’ll try something along those lines!

For Prism v2, we plan to adopt ESM into our codebase, at which point we will most likely have to phase out the current solution for optional dependencies and implement something (hopefully) nicer.

Nice! Is there an ETA on when that would be ready?

Ah, so I could add:

regex.optionalDependants = ['javascript', 'vala']

Sorry, but I don't know what that is about.

And then when regex loads it’ll re-initialize javascript and vala when available? 🤔
A bit more complex would be to reload coffeescript and such, which depend on javascript.

I suggest using our getLoader API. It will handle all of that for you.

getLoader(components, languagesToLoad, loadedLanguages)
    .load(id => { /* load language with the given id */ });

That's all you have to do. getLoader will ensure that everything works.

It's a little harder for asynchronous operations. Some languages can be loaded in parallel but others cannot. load will also take care of that for you, see the Promise example.

However, it might be a problem that getLoader needs to know all possible dependencies upfront.

Is there an ETA on when that would be ready?

Unfortunately, no.

commented

Thanks!

Sorry, but I don't know what that is about.

I generate custom bundles of each language, so that they can be registered on their own. For example, javascript.js.

However, I was hoping with my previous comment that it would be possible for this case to work:

  1. Assume all languages except for YAML are loaded
  2. Load YAML, which knows which other languages have be reloaded (such as markdown, and many others)

Unfortunately I found that a change in certain languages requires a lot of re-registrations, and resulted in refractor taking 1.3 seconds to load.
The code I used to find the things that would have to be reregistered was this pseudo code:

var list = getLoader(components, [theLanguage], allLanguages).getIds()
var reregister = list.slice(list.indexOf(theLanguage) + 1)