siefkenj / unified-latex

Utilities for parsing and manipulating LaTeX ASTs with the Unified.js framework

\\ macro should be parsed differently in some math environments

theseanl opened this issue · comments

Steps to reproduce

  1. Compile the document below with LaTeX, then inspect the output
\documentclass{article}
\usepackage{amsmath}

\begin{document}

1 \\ [10pt] 2

\begin{gather*}
1 \\ [10pt] 2
\end{gather*}

\end{document}
  2. Parse it with unified-latex (e.g. by pasting it into https://siefkenj.github.io/latex-parser-playground/)

Expected behavior

In the first 1 \\ [10pt] 2 line, [10pt] is treated as an argument for the \\ macro.
However, in the second occurrence, [10pt] is rendered directly to the output. It is treated as an argument for \\ only after removing the whitespace between \\ and [10pt].

Thus, the parsed AST should treat the two cases differently.

Actual behavior

unified-latex currently parses [10pt] as an optional argument for \\ in both cases.

It seems that this behavior is specific to the gather environment, because for eqnarray*, [10pt] is always treated as an argument for \\,

\begin{eqnarray*}
1 \\ [10pt] 2
\end{eqnarray*}
% Above and below produce the same output
\begin{eqnarray*}
1 \\[10pt] 2
\end{eqnarray*}

which seems to be sensible behavior. I hit this with TeX code that has Lie brackets right after a line break.
I'm wondering if there's a central source of truth for such subtle parsing behavior: where can I find exactly how the gather* environment modifies it?

Hmmm, I don't completely understand your comment, but I thought I had made it so that whitespace prevented \\ from consuming an optional argument...
Its definition is here:

It seems this behavior is tested here:

it("gobbleSingleArgument won't gobble if whitespace is not permitted", () => {

So I am not sure where it's going wrong...

I found the following relevant excerpt in xparse documentation page 4:


There is one subtlety here due to the difference in handling by TeX of “control symbols”, where the command name is made up of a single character, such as “\\”. Spaces are not ignored by TeX here, and thus it is possible to require an optional argument directly follow such a command. The most common example is the use of \\ in amsmath environments. In xparse terms it has signature

\DeclareDocumentCommand \\ { !s !o } { ... }

According to it, \\'s signature should change from !s o to !s !o when it is inside amsmath environments, which is consistent with my initial observation in the first post, and there may be more macros having similar behaviors.
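
To make the difference between o and !o concrete, here is a minimal sketch of an optional-argument gobbler. It is an illustration only, not unified-latex's actual implementation: the function name `gobbleOptional`, the `allowLeadingSpace` flag, and the `GobbleResult` type are all made up for this example, and nested brackets are not handled.

```typescript
// Hypothetical sketch; NOT unified-latex's real argument parser.
type GobbleResult = { arg: string | null; rest: string };

/**
 * Try to read one optional argument `[...]` from the start of `input`.
 * `allowLeadingSpace = true` models xparse's `o`;
 * `allowLeadingSpace = false` models `!o` (the bracket must follow directly).
 */
function gobbleOptional(input: string, allowLeadingSpace: boolean): GobbleResult {
    let pos = 0;
    if (allowLeadingSpace) {
        // Skip whitespace before the bracket, as `o` permits.
        while (pos < input.length && /\s/.test(input.charAt(pos))) {
            pos++;
        }
    }
    if (input.charAt(pos) !== "[") {
        // No bracket at the permitted position: consume nothing.
        return { arg: null, rest: input };
    }
    const close = input.indexOf("]", pos);
    if (close === -1) {
        // Unterminated bracket: consume nothing (simplification).
        return { arg: null, rest: input };
    }
    return { arg: input.slice(pos + 1, close), rest: input.slice(close + 1) };
}

// The text that follows `\\` in the example from the first post.
const afterLineBreak = " [10pt] 2";
console.log(gobbleOptional(afterLineBreak, true));  // models `o`
console.log(gobbleOptional(afterLineBreak, false)); // models `!o`
```

With `allowLeadingSpace` set to true the sketch picks up 10pt as the argument, matching the plain-text and eqnarray* cases; with it set to false the input is left untouched, so [10pt] stays in the output as observed in gather*.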

Good find! #41 fixes this issue. I'll release a new version when the tests pass.

I am not sure the fix is correct. The signature is not globally !s !o; it is only so inside certain amsmath environments. In all other cases it has to be !s o. It seems that the linked PR changes the signature globally. I would say that the previous behavior is closer to the expected behavior.

The current infrastructure doesn't seem to allow signatures to change based on neighboring environment, so I guess it won't be a simple fix.

Yes, currently macros are defined globally. It's possible to let an environment redefine the macros it uses (see the tikz package), but it's annoying. I think most people don't even know that you can do \\ [4pt] in normal LaTeX, so I don't think it's too much of a loss.

It seems that the tikz code is pretty specialized to that case. In general, macros are only available within an enclosing group, and IMO this is an essential feature that allows basic encapsulation, but unified-latex currently treats every macro as global.
If the tikz code can be adapted to support macro scopes, that would be great.
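
As a rough illustration of what scoped macro signatures could look like, here is a small sketch of a scope stack where an environment pushes its own overrides on entry and pops them on exit. Everything here is hypothetical: `MacroScopes`, `enter`, `leave`, `lookup`, and the `environmentOverrides` table are names invented for this example and do not correspond to anything in unified-latex.

```typescript
// Hypothetical sketch of per-environment signature overrides;
// none of these names exist in unified-latex.
type SignatureTable = Map<string, string>;

class MacroScopes {
    private stack: SignatureTable[];

    constructor(globals: SignatureTable) {
        // The bottom of the stack holds the global definitions.
        this.stack = [globals];
    }

    /** Enter a group or environment, optionally with its own overrides. */
    enter(overrides: SignatureTable = new Map()): void {
        this.stack.push(overrides);
    }

    /** Leave the innermost scope; the global scope is never popped. */
    leave(): void {
        if (this.stack.length > 1) {
            this.stack.pop();
        }
    }

    /** Find the innermost signature for `macro`, falling back outward. */
    lookup(macro: string): string | undefined {
        for (let i = this.stack.length - 1; i >= 0; i--) {
            const table = this.stack[i];
            const signature = table ? table.get(macro) : undefined;
            if (signature !== undefined) {
                return signature;
            }
        }
        return undefined;
    }
}

// "\\\\" is the TypeScript string literal for the two-character macro name \\.
const scopes = new MacroScopes(new Map([["\\\\", "!s o"]]));

// Assumed override table: gather* tightens the signature of \\.
const environmentOverrides = new Map<string, SignatureTable>([
    ["gather*", new Map([["\\\\", "!s !o"]])],
]);

console.log(scopes.lookup("\\\\")); // global signature
scopes.enter(environmentOverrides.get("gather*"));
console.log(scopes.lookup("\\\\")); // signature inside gather*
scopes.leave();
console.log(scopes.lookup("\\\\")); // global signature again
```

The same stack would serve both plain groups (entered with no overrides, then extended by local definitions) and environments (entered with a predefined override table), which is why the two cases might be handled together.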

That's a good point. Would you mind opening a new issue for per-environment macro overrides?

Well, I don't have the resources to work on it, so I'd rather not "own" the issue.
Per-group macro overrides and per-environment macro overrides are actually separate issues, but perhaps they could be dealt with in one go.