microsoft / monaco-editor

A browser based code editor

Home Page:https://microsoft.github.io/monaco-editor/

Revisit WebAssembly to support TextMate grammars

CitrusFruits opened this issue · comments

From the README:

We can revisit this once WebAssembly gets traction in the major browsers, but we will still need to consider the browser matrix we support, i.e. if we support IE11 and only Edge will add WebAssembly support, what will the experience be in IE11, etc.

From MDN:
[Screenshot, 2020-04-10 10:13:10 AM]

Also from the README:

The Monaco Editor no longer supports IE 11. The last version that was tested on IE 11 is 0.18.1.

IE 11 is no longer supported, and all major browsers support WebAssembly. It seems that now may be the time to revisit using WebAssembly to support TextMate grammars. Are there any other challenges or blockers that would prevent giving this a go?
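
To make the browser-matrix concern concrete, here is a minimal sketch of feature-detecting WebAssembly and falling back to Monarch when it is missing. The names `supportsWebAssembly`, `chooseTokenizer`, and `TokenizerKind` are hypothetical and not part of Monaco; only the standard `WebAssembly.validate` and `WebAssembly.instantiate` globals are assumed.

```typescript
// Hypothetical helper names; nothing here is part of the Monaco API.
type TokenizerKind = "textmate" | "monarch";

// Smallest valid WebAssembly module: the "\0asm" magic number plus version 1.
const EMPTY_WASM_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Returns true only if the given global scope exposes a usable WebAssembly object.
function supportsWebAssembly(scope: any = globalThis): boolean {
  const wasm = scope.WebAssembly;
  if (typeof wasm !== "object" || wasm === null) return false;
  if (typeof wasm.validate !== "function" || typeof wasm.instantiate !== "function") return false;
  try {
    return wasm.validate(EMPTY_WASM_MODULE) === true;
  } catch (e) {
    return false;
  }
}

// Pick TextMate (which needs Oniguruma compiled to WASM) when possible, else Monarch.
function chooseTokenizer(scope: any = globalThis): TokenizerKind {
  return supportsWebAssembly(scope) ? "textmate" : "monarch";
}
```

With a check like this, a browser without WebAssembly (IE 11, for example) would simply keep the existing Monarch experience.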

Come to think of it: what is VS Code Online doing with respect to this problem right now? Is it falling back to Monarch grammars? (For example, are async and await failing to get highlighted as keywords in Python files in VS Code Online today?)

Or have some of these been changed to use the Semantic Tokens Provider API? https://code.visualstudio.com/api/references/vscode-api#DocumentSemanticTokensProvider

I can't comment on how it's being done, but I checked and it looks as though VS Code Online is using more advanced grammars than the ones packaged with Monarch.

[Screenshot, 2020-04-23 8:39:12 AM]

I tried with Python as well, but couldn't get syntax highlighting to work on VS Code Online, even after installing the Python extension. That may not be of much consequence though.

[Screenshot, 2020-04-23 8:43:24 AM]

The web version of VS Code already supports TextMate grammars through https://github.com/NeekSandhu/onigasm (an Oniguruma WebAssembly port), which was adopted in https://github.com/microsoft/vscode-textmate a while ago. I think its performance has proven to be quite good.

I guess the actual question behind this decision is: how to proceed with Monarch support? If it can be dropped entirely, I think there is a good incentive for the VS Code team to switch Monaco over to TextMate grammars, but if the Monarch grammars still have to be supported, it becomes more difficult.

TextMate grammars are more widely used and more flexible, and are basically the standard. It doesn't make sense for Monarch to reinvent the wheel in this regard.

If I were to cast my vote, I'd say: drop Monarch support, add support for TextMate grammars, then provide a migration tool from Monarch language syntax definitions to TextMate grammars. Put it in a point release and write some helpful error handlers for when someone tries to load a Monarch definition.

OK, I did a bit of digging, which may hopefully save @alexdima some typing if he has time to chime in on this thread. I'm not sure if I have all this right because I haven't tried to write any code yet, but:

  • Microsoft/VS Code now does its own build of Oniguruma for WASM: https://github.com/Microsoft/vscode-oniguruma. Note that the NPM module contains a pre-built version of the WASM under release/onig.wasm for convenience. Further, note that this package builds the WASM itself rather than depending on https://github.com/NeekSandhu/onigasm (which seems sensible from a security perspective). From the vscode-oniguruma README:

Oniguruma bindings for VS Code. This library is used in VS Code and is not intended to grow to have general Oniguruma WASM bindings.

It seems like vscode-oniguruma should be sufficient for Monaco's needs, but if you need to do other things with Oniguruma in WASM, you might still need https://github.com/NeekSandhu/onigasm.

  • https://github.com/microsoft/vscode-textmate defines the IGrammar interface:

/**
 * A grammar
 */
export interface IGrammar {
	/**
	 * Tokenize `lineText` using previous line state `prevState`.
	 */
	tokenizeLine(lineText: string, prevState: StackElement | null): ITokenizeLineResult;

	/**
	 * Tokenize `lineText` using previous line state `prevState`.
	 * The result contains the tokens in binary format, resolved with the following information:
	 *  - language
	 *  - token type (regex, string, comment, other)
	 *  - font style
	 *  - foreground color
	 *  - background color
	 * e.g. for getting the languageId: `(metadata & MetadataConsts.LANGUAGEID_MASK) >>> MetadataConsts.LANGUAGEID_OFFSET`
	 */
	tokenizeLine2(lineText: string, prevState: StackElement | null): ITokenizeLineResult2;
}
  • If you look over in the VS Code repo at abstractTextMateService.ts, you can find:
    • The TMTokenization class, which takes an IGrammar and invokes its tokenizeLine2() method from its own tokenize2() method.
    • The TMTokenizationSupport class, which takes an instance of TMTokenization. It implements ITokenizationSupport, though it supports only the tokenize2() method: it throws if you invoke tokenize().
    • The ITokenizationSupport interface is defined in vscode/vs/editor/common/modes.ts:
/**
 * @internal
 */
export interface ITokenizationSupport {

	getInitialState(): IState;

	// add offsetDelta to each of the returned indices
	tokenize(line: string, state: IState, offsetDelta: number): TokenizationResult;

	tokenize2(line: string, state: IState, offsetDelta: number): TokenizationResult2;
}
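
To illustrate the binary format that the tokenizeLine2() doc comment describes, here is a rough sketch of unpacking the metadata word for each token. The mask and offset values are an assumption: they mirror the MetadataConsts layout as I understand it from around the time of this thread, and they may differ in other versions, so check them against your installed vscode-textmate. `DecodedToken`, `decodeToken`, and `decodeTokens` are hypothetical names.

```typescript
// ASSUMPTION: bit layout as of roughly 2020; verify against your vscode-textmate version.
const MetadataConsts = {
  LANGUAGEID_MASK: 0x000000ff, // bits 0-7
  TOKEN_TYPE_MASK: 0x00000700, // bits 8-10
  FONT_STYLE_MASK: 0x00003800, // bits 11-13
  FOREGROUND_MASK: 0x007fc000, // bits 14-22
  BACKGROUND_MASK: 0xff800000, // bits 23-31
  LANGUAGEID_OFFSET: 0,
  TOKEN_TYPE_OFFSET: 8,
  FONT_STYLE_OFFSET: 11,
  FOREGROUND_OFFSET: 14,
  BACKGROUND_OFFSET: 23,
} as const;

interface DecodedToken {
  startIndex: number;
  languageId: number;
  tokenType: number;
  fontStyle: number;
  foreground: number; // index into a color map, not a color value
  background: number; // index into a color map, not a color value
}

// Same pattern as in the doc comment: (metadata & MASK) >>> OFFSET.
function decodeToken(startIndex: number, metadata: number): DecodedToken {
  const c = MetadataConsts;
  return {
    startIndex,
    languageId: (metadata & c.LANGUAGEID_MASK) >>> c.LANGUAGEID_OFFSET,
    tokenType: (metadata & c.TOKEN_TYPE_MASK) >>> c.TOKEN_TYPE_OFFSET,
    fontStyle: (metadata & c.FONT_STYLE_MASK) >>> c.FONT_STYLE_OFFSET,
    foreground: (metadata & c.FOREGROUND_MASK) >>> c.FOREGROUND_OFFSET,
    background: (metadata & c.BACKGROUND_MASK) >>> c.BACKGROUND_OFFSET,
  };
}

// Each token occupies two slots: [startIndex, metadata, startIndex, metadata, ...].
function decodeTokens(tokens: Uint32Array): DecodedToken[] {
  const decoded: DecodedToken[] = [];
  for (let i = 0; i + 1 < tokens.length; i += 2) {
    decoded.push(decodeToken(tokens[i], tokens[i + 1]));
  }
  return decoded;
}
```

Note that foreground and background are indices, which is why a color map (and therefore a theme) is needed before the tokens can be rendered.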

If you explore the signatures of the methods of ITokenizationSupport here, it appears to function as the union of TokensProvider and EncodedTokensProvider in monaco.languages. As such, it seems like it should be possible to take the code that I've referenced here and create an appropriate adapter such that you can make use of monaco.languages.setTokensProvider(EncodedTokensProvider) in standalone Monaco.
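
A minimal sketch of such an adapter might look like the following. To keep it self-contained, it declares local stand-ins (`StackLike`, `GrammarLike`, `EncodedTokensProviderLike`) that mirror only the parts of vscode-textmate's IGrammar/StackElement and Monaco's EncodedTokensProvider that the sketch needs; those names and `createEncodedTokensProvider` are hypothetical, and I haven't verified this against either library.

```typescript
// Stand-in for vscode-textmate's StackElement, which also satisfies Monaco's IState shape.
interface StackLike {
  clone(): StackLike;
  equals(other: StackLike): boolean;
}

// Stand-in for the tokenizeLine2() part of IGrammar.
interface GrammarLike {
  tokenizeLine2(
    lineText: string,
    prevState: StackLike | null
  ): { tokens: Uint32Array; ruleStack: StackLike };
}

// Stand-in for monaco.languages.EncodedTokensProvider.
interface EncodedTokensProviderLike {
  getInitialState(): StackLike;
  tokenizeEncoded(line: string, state: StackLike): { tokens: Uint32Array; endState: StackLike };
}

// Wrap a grammar so that each line is tokenized with the end state of the previous line.
function createEncodedTokensProvider(
  grammar: GrammarLike,
  initialState: StackLike
): EncodedTokensProviderLike {
  return {
    getInitialState: () => initialState,
    tokenizeEncoded(line, state) {
      const result = grammar.tokenizeLine2(line, state);
      // The grammar's rule stack becomes the editor's per-line end state.
      return { tokens: result.tokens, endState: result.ruleStack };
    },
  };
}
```

In real use, I would expect `initialState` to be the INITIAL constant exported by vscode-textmate and the returned object to be passed to monaco.languages.setTokensProvider() for the language in question, but treat both as assumptions until tested.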

Thanks @bolinfest for reporting. Would be great if @alexdima can confirm this is a good way to go before we try it out.
But again @bolinfest thanks for sharing 👍

This is not working correctly yet, but I think it's close: https://github.com/bolinfest/monaco-tm.

OK, if you check out bolinfest/monaco-tm@bcea24a and build and run the demo, you can see things working with Hack, which is a language for which no Monarch grammar exists.

Note that I am currently using these:

https://github.com/NeekSandhu/monaco-textmate
https://github.com/NeekSandhu/monaco-editor-textmate

I believe zikaari/monaco-editor-textmate#11 provides some insight as to why my initial approach is not working.

Of note from https://github.com/NeekSandhu/monaco-textmate#credits:

99% of the code in this repository is extracted straight from vscode-textmate, which is MIT licensed. Other external licenses used can be found in ThirdPartyNotices.txt

Ah, the critical step I was missing was setting theme data. Now I have everything working and I tried to remove all of my scratchwork from earlier iterations:

bolinfest/monaco-tm@ca4e82b
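
One plausible reading of why theme data matters: the encoded tokens carry only color indices, so something has to turn the theme's color map into styles. Here is a rough sketch of that step, assuming the color map is an array of CSS color strings whose index 0 is reserved, and assuming Monaco renders encoded tokens with class names of the form `mtkN`. Both assumptions come from my reading of the VS Code sources and are not checked against the commit above; `cssForColorMap` is a hypothetical name.

```typescript
// ASSUMPTION: index 0 of the color map is reserved and carries no color.
// ASSUMPTION: Monaco styles a token with foreground index N via the class "mtkN".
function cssForColorMap(colorMap: ReadonlyArray<string | null | undefined>): string {
  const rules: string[] = [];
  for (let i = 1; i < colorMap.length; i++) {
    const color = colorMap[i];
    if (!color) continue; // skip holes rather than emit an invalid rule
    rules.push(`.mtk${i} { color: ${color}; }`);
  }
  return rules.join("\n");
}
```

The resulting string could then be placed in a style element on the page so that the indices in the token metadata map to visible colors.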

Would love to see this as well. I found this blog post from the creator of CodeSandbox where they got this working, and now support VSCode themes in Monaco: https://medium.com/@compuives/introducing-themes-e6818088bfc2.

I fixed a number of bugs and have cleaned up the code in my repo:

https://github.com/bolinfest/monaco-tm

I can run the Webpack config in dev mode just fine and see things working. Unfortunately, I get all sorts of errors when I try to run Webpack in prod, and I would like a prod build so I can publish a demo to GitHub Pages. I've already sunk more time into debugging Webpack than I care to, so if you're a Webpack expert and would like me to publish to GitHub Pages, I would gladly take help in fixing up my Webpack config to make this possible.

Hi @alexdima
Thanks for the great work on 0.21.0!

How should we interpret the documentation tag on this issue?
I haven't seen any reference to new TM grammar support in 0.21.0, so I'm assuming this is already possible and you will create a demo or write a post about it?

If so, is the solution similar to @bolinfest's attempt? Or can it also be achieved in the AMD version without Webpack?

I'm very eager, to say the least, to get this working in production :)

👍 Congrats @bolinfest for putting all the puzzle pieces together in the correct way :) !

There is not a lot more to show.

Both vscode-oniguruma and vscode-textmate ship in UMD format and are therefore straightforward to load via AMD as well.

Variations could be made where monaco-editor-core is used directly, or a monaco-editor is created without monaco-languages (by forking this repo and dropping it from metadata.js), but the essence of the technique is captured in @bolinfest's example. I have therefore updated the README to point to it and it can serve as an example on how to do it.