Pre-render using `mathjax-node-page` to avoid text reflows

Question

danmackinlay opened this issue 5 years ago

I'm floating this idea to see if it is of interest to anyone...

I'm a big fan of @goessner's decision to use KaTeX as the math renderer for mdmath, since MathJax is unbearably slow.

However even KaTeX scales badly and becomes slow in long enough documents. (I'm previewing dissertation chapters 😰)

I wonder if it is worth trying a pre-render approach? There are a number of packages outside of the VS Code world now which use 'headless' methods of processing the HTML to render the mathematics.

Specifically, you can render MathJax to single-page HTML without a browser using node.js, which might reduce the intensive text reflows that choke up the markdown previews.

The tool du jour for this is mathjax-node-page; it even claims to output math as SVG, through some dark magic. See also mathmd, which leverages mdmath, and ReLaXed, which produces PDFs by this approach.

I don't know if this would be faster overall, or an appropriate development to explore for mdmath in particular (surely something would break if we put a major change like this in) but I'm definitely curious to find out if this would reduce the proportion of my life spent waiting for mathematics to render. 😉

Peter Wone · Answer 1 · Tue Apr 09 2019 07:15:22 GMT+0800 (China Standard Time)

What you are asking for is essentially an incremental compiler instead of an interpreter.

This won't make the first render any faster but it will make things a lot more responsive if you edit side-by-side with a preview as most people do.

The trick is to be able to associate each block with its cache entry. You can't just hash the block and use that as a lookup key because the block gets edited. You also can't embed metadata, and the only thing left is ordinal position.

That means when the file is loaded you find all the math blocks and their positions in the file, and as editing occurs you maintain a DOM with two kinds of node, math and not-math. Once you have that, you can tell when a math block is edited and you can set a dirty flag for the block.
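As a rough sketch of that bookkeeping: the function names below are hypothetical, and the assumption that math is delimited only by `$$ ... $$` is a simplification for illustration, not how mdmath actually tokenizes.

```javascript
// Hypothetical helper: split markdown source into "math" and "text" nodes,
// recording each node's character range in the file.
// Assumes display math is delimited by $$ ... $$ only; real delimiters vary.
function splitMathBlocks(source) {
  const result = [];
  const pattern = /\$\$[\s\S]+?\$\$/g;
  let last = 0;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    if (match.index > last) {
      // Plain text between the previous math block and this one.
      result.push({ type: 'text', start: last, end: match.index, src: source.slice(last, match.index) });
    }
    result.push({ type: 'math', start: match.index, end: pattern.lastIndex, src: match[0] });
    last = pattern.lastIndex;
  }
  if (last < source.length) {
    result.push({ type: 'text', start: last, end: source.length, src: source.slice(last) });
  }
  return result;
}

// Given the character offset of an edit, find the node that contains it.
// If that node is a math node, it is the one to mark dirty.
function nodeAt(nodeList, offset) {
  return nodeList.find((n) => offset >= n.start && offset < n.end);
}
```

The ordinal position of a math block is then just its index among the `math` nodes. A real implementation would also have to shift the ranges of later nodes after each edit, which this sketch leaves out.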

The markdown rendering pipeline code that currently passes a math block for rendering should check a dirty flag and if necessary call render before returning the cache value for that block. Rendering resets the dirty flag. In JS the dirty flag can be implicit like this

return block.html || (block.html = block.render());

Setting the implicit dirty flag in this approach just means setting block.html to null or undefined or empty string (anything falsy).
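To make that concrete, here is a minimal sketch. `MathBlock` and `fakeRender` are made-up names, and `fakeRender` only stands in for the real renderer (for example a KaTeX call) so the number of render calls can be counted.

```javascript
// Sketch of a math block whose cached HTML doubles as the dirty flag.
class MathBlock {
  constructor(src, renderFn) {
    this.src = src;
    this.renderFn = renderFn; // stand-in for the real math renderer
    this.html = null;         // falsy means "dirty": nothing cached yet
  }

  // Editing the block sets the implicit dirty flag by clearing the cache.
  edit(newSrc) {
    this.src = newSrc;
    this.html = null;
  }

  render() {
    return this.renderFn(this.src);
  }

  // Render only if dirty, otherwise return the cached value.
  getHtml() {
    return this.html || (this.html = this.render());
  }
}

let renderCalls = 0;
const fakeRender = (src) => {
  renderCalls += 1;
  return `<span class="math">${src}</span>`;
};

// Blocks are identified by ordinal position (their index in this array).
const blocks = ['a^2', 'b^2'].map((src) => new MathBlock(src, fakeRender));
blocks.forEach((b) => b.getHtml()); // first pass renders both blocks
blocks[1].edit('c^2');              // only the second block becomes dirty
blocks.forEach((b) => b.getHtml()); // second pass re-renders just that one
```

One caveat with the implicit flag: a block whose rendered output is legitimately an empty string would be re-rendered on every pass, since an empty string is falsy.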

You could avoid heavy parsing on load if you persist the DOM in a temp file and start by loading it. Some people don't like metadata files proliferating but I guess this could be a setting. If you did this then the DOM would build up as you create your document and you'd never experience a slow start.
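A sketch of what persisting could look like, using only Node's standard library. The file location (the OS temp directory), the `.mathcache.json` suffix and the function names are all assumptions made for illustration; nothing like this exists in mdmath today.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const crypto = require('crypto');

// Hypothetical location: one cache file per document in the OS temp directory,
// named after a hash of the document's absolute path.
function cachePathFor(docPath) {
  const id = crypto.createHash('sha256').update(path.resolve(docPath)).digest('hex');
  return path.join(os.tmpdir(), `${id}.mathcache.json`);
}

// Store each block's source and rendered HTML, in document order.
function saveCache(docPath, blockList) {
  const entries = blockList.map(({ src, html }) => ({ src, html }));
  fs.writeFileSync(cachePathFor(docPath), JSON.stringify(entries), 'utf8');
}

// Load the stored entries, or an empty list if there is no usable cache.
function loadCache(docPath) {
  try {
    return JSON.parse(fs.readFileSync(cachePathFor(docPath), 'utf8'));
  } catch (err) {
    return []; // missing or unreadable cache: fall back to a full render
  }
}
```

Keeping the file in a temp directory rather than next to the document is one way to sidestep the metadata-file objection, at the cost of losing the cache whenever the temp directory is cleaned.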

I don't have time for this right now but if you're lucky I'll come back to it. If you want it now then, well, open source right? :)

dan mackinlay · Answer 2 · Tue Apr 09 2019 07:39:31 GMT+0800 (China Standard Time)

Interesting points @PeterWone .

If I understand correctly you are talking specifically about how mdmath would do something similar to mathjax-node-page but adapted as a markdown extension to the architecture of the markdown-it markdown renderer and, further, extended to include caching?

My original suggestion, mathjax-node-page, applies AFAICS to the HTML output of the markdown processor, so would be outside of the markdown pipeline. That might also benefit from caching of course, if rendering were not fast enough. It might also be easier to cache; if I understand correctly you are discussing the problem with content-hashing markdown that might be further processed in the markdown pipeline; I don't really know how that works but I will take your word on it. 😄 But AFAICT the HTML should avoid such problems. OTOH mathjax-node-page doesn't seem to actually cache, so that would still need to be implemented. Or perhaps not? Perhaps it is already fast enough not to need caching and we are getting ahead of ourselves? 😉
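For what it's worth, a content-keyed cache at that level might look roughly like the sketch below. `makeHashedRenderer` is a made-up name and `renderFn` stands in for whatever actually renders an expression; this is not part of mathjax-node-page.

```javascript
const crypto = require('crypto');

// Sketch: cache rendered output keyed by a hash of the expression source.
// An edited expression simply misses the cache and is rendered again;
// unchanged expressions are served from the cache.
function makeHashedRenderer(renderFn) {
  const cache = new Map();
  return function render(tex) {
    const key = crypto.createHash('sha256').update(tex).digest('hex');
    if (!cache.has(key)) {
      cache.set(key, renderFn(tex));
    }
    return cache.get(key);
  };
}
```

Entries for expressions that no longer appear in the document are never evicted here, so a long editing session would need some cleanup policy on top of this.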

However, I'm just spitballing here - it sounds like neither of us has the time in the immediate future for such things...

Peter Wone · Answer 3 · Tue Apr 09 2019 12:00:57 GMT+0800 (China Standard Time)

I mentioned content hashing to dismiss it because the use case I had in mind was rendering in a preview side by side with markdown being actively edited. I thought this was what you were talking about because it's the case most sensitive to rendering delays.

I don't know whether you use Visual Studio Code but if you do there's an extension for it that supports KaTeX in markdown, and the editor knows how to do a vertical split with editable source on one side and the rendered markdown on the other.

dan mackinlay · Answer 4 · Tue Apr 09 2019 16:42:10 GMT+0800 (China Standard Time)

Yes, I use said VS code extension heavily. Unless I'm mistaken it is @goessner 's mdmath extension, which is precisely the repo upon which we are now commenting, right? 😉

Although KaTeX is much speedier than MathJax, it still does not scale well enough in VS Code - I'm editing a document at the moment which is short, as far as mathematical documents go - as a PDF it is 9 A4 pages with about 200 mathematical expressions, but the redraws are painfully slow.

Peter Wone · Answer 5 · Tue Apr 09 2019 17:16:02 GMT+0800 (China Standard Time)

I must have hit back once more than I realised -- I was just looking at another related work and didn't realise this was the extension repo (or I wouldn't have made an ass of myself).

Well then I suspect my proposed strategy is very likely to help.

Do you have any material I can use as a benchmark? None of my stuff is complex enough to make this problem evident.

Stefan Goessner · Answer 6 · Wed Oct 09 2019 00:12:46 GMT+0800 (China Standard Time)

Three points here:

1. Every keystroke in vscode does a complete rendering of the preview html. So your idea of prerendering the markdown in parallel strikes somehow.
2. markdown-it-texmath is able to precompile the markdown containing math on node.js. But I know of no reliable way to inject the precompiled HTML into vscode preview window.
3. If I have some time again in future, I would like to have a closer look into the performance of the regular expressions used with markdown-it-texmath. Maybe there is some optimization possible. This is all I can offer.

thanks