CatalaLang / catala

Programming language for literate programming law specification

Home page: https://catala-lang.org

Pygments lexer breaks LaTeX escapeinside

pierregoutagny opened this issue

I'm trying to use the Pygments lexer in `syntax_highlighting/en/pygments`.
With Pygments 2.16.1 (I haven't tried other versions, but this is close to the latest release), running `pygmentize -l 'catala_en' -f latex -P escapeinside='!!' example.catala_en` on the following file:

```catala
declaration scope A:
  input x content integer!\label{line:x}!
```

renders the following:

```latex
\begin{Verbatim}[commandchars=\\\{\},codes={\catcode`\$=3\catcode`\^=7\catcode`\_=8\relax}]
\PY{l+s}{```catala}
\PY{k+kr}{declaration}\PY{l+s}{ }\PY{k+kr}{scope}\PY{l+s}{ }\PY{n+nc}{A}\PY{o}{:}
\PY{l+s}{ }\PY{l+s}{ }\PY{k+kd}{input}\PY{l+s}{ }\PY{n+nv}{x}\PY{l+s}{ }\PY{k+kr}{content}\PY{l+s}{ }\PY{k+kt}{integer}\PY{err}{!}\PY{l+s}{\PYZbs{}}\PY{k+kr}{label}\PY{o}{\PYZob{}}\PY{n+nv}{line}\PY{o}{:}\PY{n+nv}{x}\PY{o}{\PYZcb{}}\PY{err}{!}
\PY{l+s}{```}
\end{Verbatim}
```

The important part is line 4, where the escaped LaTeX code is rendered as `\PY{err}{!}\PY{l+s}{\PYZbs{}}\PY{k+kr}{label}\PY{o}{\PYZob{}}\PY{n+nv}{line}\PY{o}{:}\PY{n+nv}{x}\PY{o}{\PYZcb{}}\PY{err}{!}`. It should instead be `\PY{esc}{\label{line:x}}`, so that it is actually escaped when using, e.g., the minted LaTeX package.

Since the Pygments documentation states that the `escapeinside` option has "no effect in string literals", I suspect that some uses of the `String` token in `lexer.py` are responsible for this behavior.
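To make the suspected mechanism concrete, here is a simplified Python model of the documented rule. It is a sketch, not Pygments' actual source: it assumes escape delimiters are only matched inside runs of consecutive tokens that are neither strings nor comments, so a `String` token in the middle splits a would-be escape in two. The token stream is transcribed from line 4 of the output above, where the backslash is tagged `l+s` (i.e. `Literal.String`); `find_escapes`, `AS_LEXED` and `AS_TEXT` are illustrative names.

```python
# Simplified model of the documented escapeinside rule: delimiters are only
# searched for in runs of tokens that are neither strings nor comments.
# This is an illustrative sketch, not Pygments' actual source code.
from typing import List, Tuple

Tok = Tuple[str, str]  # (token type name, text)

SKIPPED = ("Literal.String", "Comment")  # token types an escape cannot cross


def find_escapes(tokens: List[Tok], left: str = "!", right: str = "!") -> List[str]:
    """Return the LaTeX snippets that would be treated as escaped."""
    # 1. Merge consecutive non-string, non-comment tokens into chunks;
    #    every String or Comment token ends the current chunk.
    chunks: List[str] = []
    buf = ""
    for ttype, value in tokens:
        if ttype.startswith(SKIPPED):
            if buf:
                chunks.append(buf)
            buf = ""
        else:
            buf += value
    if buf:
        chunks.append(buf)

    # 2. Look for left...right pairs inside each chunk independently.
    found: List[str] = []
    for chunk in chunks:
        rest = chunk
        while left in rest:
            _, _, after = rest.partition(left)
            inner, sep, rest = after.partition(right)
            if not sep:
                break  # opening delimiter without a closing one
            found.append(inner)
    return found


# Tokens of `integer!\label{line:x}!` as reported in the LaTeX output
# (k+kt, err, l+s, k+kr, o, n+nv, ...).
AS_LEXED: List[Tok] = [
    ("Keyword.Type", "integer"),
    ("Error", "!"),
    ("Literal.String", "\\"),  # \PY{l+s}{\PYZbs{}}: the backslash is a String
    ("Keyword.Reserved", "label"),
    ("Operator", "{"),
    ("Name.Variable", "line"),
    ("Operator", ":"),
    ("Name.Variable", "x"),
    ("Operator", "}"),
    ("Error", "!"),
]

# Same stream with every Literal.String token retagged as Text.
AS_TEXT: List[Tok] = [
    ("Text", v) if t == "Literal.String" else (t, v) for t, v in AS_LEXED
]

print(find_escapes(AS_LEXED))  # []
print(find_escapes(AS_TEXT))   # ['\\label{line:x}']
```

In this model neither chunk of `AS_LEXED` (`integer!` and `label{line:x}!`) contains a matching pair of `!`, so nothing is escaped and each `!` surfaces on its own. Note that in the output above even the spaces between tokens are tagged `l+s`, so an escape containing whitespace would be split the same way.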

I can reproduce this with Pygments 2.14.0. Looking back at previous papers, it seems `escapeinside` was used in the ICFP paper, but I think the pygmentize scripts have changed since then.

Even for the ICFP paper, getting `!\label{line:x}!` to work with the Catala Pygments lexer was a huge pain. I wouldn't know how to fix it now that our Pygments workflow has changed (@AltGr). As a workaround, I suggest simply hardcoding the line number you want to refer to in the paper...

Yes, that's the current workaround. For text documents that's not too bad, but for Beamer it's nice to be able to insert tikzmarks.

Following up on my earlier remark about the `String` token being a possible culprit, I tried simply replacing it with `Text` everywhere in the lexer, and it seems to work.
I don't know whether this breaks anything else or whether the visual result is exactly as intended (for example, I think the ```catala line no longer has the same color), but it is good enough in my environment for now, and less painful than writing line numbers by hand, at the small price of a text substitution.
I haven't tried it with Beamer, but I would expect it to work.
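The `String` to `Text` substitution described in the last comment can be scripted. The following is a hypothetical Python sketch: the names `string_tokens_to_text`, `patch_file` and `patch_lexer.py` are illustrative, and it assumes the lexer refers to the token type by the bare name `String` (including subtypes such as `String.Char`). It is a blind textual rewrite, so it also touches comments and docstrings in the file, and it shares the caveats above about the rendering possibly changing.

```python
# Hypothetical helper: rewrite a Pygments lexer source file so that every
# bare `String` token type becomes `Text`. Blind textual substitution:
# it also rewrites `String` inside comments and docstrings.
import re
import sys
from pathlib import Path

# \b on both sides: `String` and `String.Char` match, `MyString` does not.
STRING_NAME = re.compile(r"\bString\b")


def string_tokens_to_text(source: str) -> str:
    """Return `source` with each standalone `String` replaced by `Text`."""
    return STRING_NAME.sub("Text", source)


def patch_file(path: Path) -> None:
    """Apply the substitution in place to the lexer file at `path`."""
    text = path.read_text(encoding="utf-8")
    path.write_text(string_tokens_to_text(text), encoding="utf-8")


if __name__ == "__main__":
    # Usage (hypothetical script name): python patch_lexer.py path/to/lexer.py
    for arg in sys.argv[1:]:
        patch_file(Path(arg))
```

An import such as `from pygments.token import Text, String` becomes `from pygments.token import Text, Text`, which is still valid Python, and `String.Char` becomes `Text.Char`, which Pygments accepts as a subtype of `Text`.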