siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework

Space after argument-less macro should be absorbed

Evan-Zhao opened this issue · comments

Hi @siefkenj, thank you for the library! I'm working with the parser and noticed that it produces extraneous whitespace in the following case:

{\em 5 stars}

which in LaTeX should render as 5 stars without any space before it.
I had to look it up; this SO question confirms that in such positions (after a macro that takes no arguments and is not followed by an empty group {}) the following space is absorbed and doesn't render.
Currently the unified-latex parser behaves differently and keeps that whitespace in its output.

Is this intentional? How should I approach and fix it if I'd like to contribute?

Context: I'm using this LaTeX parser to "transpile" a small subset of LaTeX into other typesetting languages such as Typst. Broad LaTeX feature coverage is very hard to achieve, and I'm not really aiming for it. Still, it may prove useful for people who want to migrate their old LaTeX projects and save some manual work :)

Hi @Evan-Zhao -- saw your Context, and responding to that! We are doing something similar over in https://mystmd.org, and it may be helpful to take a look at the packages we are working on there (tex-to-myst and myst-to-typst; demos are in the online docs) -- let me know if you are interested in hearing more about how we are working through those translations with unified-latex!

At the moment, this behavior is intentional, since unified-latex was first written to be a pretty printer (and so intended to preserve formatting of the code, not exact TeX behavior).

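There are a few things you can do. If your macro takes an argument, give it a signature of "m", and the argument will be absorbed, correctly accounting for whitespace. In the case of \em, it is a streaming command which doesn't take an argument. Whitespace after such a macro can't always be dropped from the source: \em 5 is the same as \em5, but \em A is not the same as \emA, because a macro name extends over every letter that follows the backslash, so \emA is read as a different macro.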
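To make the \em 5 versus \em A point concrete, here is a toy model of how TeX reads a control word under the default category codes. `readControlWord` is a hypothetical helper written for this illustration and is not part of unified-latex; it only handles the simple case of a backslash followed by ASCII letters.

```typescript
// Hypothetical helper (not a unified-latex API): a toy model of how TeX
// reads a control word, assuming default category codes.
function readControlWord(src: string): { name: string; rest: string } {
    // A control word is a backslash followed by one or more letters.
    // Spaces and tabs right after it are skipped by TeX, so the regex
    // consumes them as well.
    const match = /^\\([a-zA-Z]+)[ \t]*/.exec(src);
    if (!match) {
        throw new Error("input does not start with a control word");
    }
    return { name: match[1], rest: src.slice(match[0].length) };
}

// "\em 5" and "\em5" both read as the macro "em" followed by "5".
console.log(readControlWord("\\em 5"));
console.log(readControlWord("\\em5"));

// "\em A" reads as "em" followed by "A", but "\emA" reads as a single
// macro named "emA" with nothing after it.
console.log(readControlWord("\\em A"));
console.log(readControlWord("\\emA"));
```

This is why a pretty printer can't blindly delete the space after a macro: doing so is safe before a digit but changes the macro name before a letter.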

In any case, if you want to remove whitespace immediately following specific macros, you can use the replaceNode function and check for whitespace nodes. For each one, look at the item in the containingArray immediately preceding the whitespace and see if it is one of your macros. If so, return null from replaceNode, which removes the whitespace node.
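The check described above can be sketched without the library, on hand-written nodes. Everything here is an assumption for illustration: the `Node` type is a minimal stand-in for the unified-latex node shapes, the AST is typed by hand rather than produced by the parser, and `stripSpaceAfterMacros` is a hypothetical helper that does with plain recursion what the visitor passed to replaceNode would do.

```typescript
// Minimal stand-ins for unified-latex node shapes (an assumption for this
// sketch; the real definitions live in the library's type package).
type Node =
    | { type: "macro"; content: string }
    | { type: "whitespace" }
    | { type: "string"; content: string }
    | { type: "group"; content: Node[] };

// Hypothetical helper: drop any whitespace node whose immediate predecessor
// in the same array is one of the named macros, recursing into groups.
function stripSpaceAfterMacros(nodes: Node[], macroNames: Set<string>): Node[] {
    const out: Node[] = [];
    nodes.forEach((node, i) => {
        const prev = nodes[i - 1]; // undefined for the first node
        if (
            node.type === "whitespace" &&
            prev !== undefined &&
            prev.type === "macro" &&
            macroNames.has(prev.content)
        ) {
            return; // skip it: the analogue of returning null from the visitor
        }
        if (node.type === "group") {
            out.push({
                type: "group",
                content: stripSpaceAfterMacros(node.content, macroNames),
            });
        } else {
            out.push(node);
        }
    });
    return out;
}

// Hand-written approximation of the AST for {\em 5 stars}
const ast: Node[] = [
    {
        type: "group",
        content: [
            { type: "macro", content: "em" },
            { type: "whitespace" },
            { type: "string", content: "5" },
            { type: "whitespace" },
            { type: "string", content: "stars" },
        ],
    },
];

const cleaned = stripSpaceAfterMacros(ast, new Set(["em"]));
console.log(JSON.stringify(cleaned, null, 2));
```

Only the whitespace right after \em is dropped; the space between 5 and stars is kept. When printing the result back to LaTeX source, the caveat about \em A versus \emA still applies, so this kind of stripping fits best when the target is another format.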