CatalaLang / catala

Programming language for literate programming law specification

Home Page: https://catala-lang.org


Explicitly choose a Markdown flavor? (maybe CommonMark?)

rprimet opened this issue · comments

For now I think the flavor of Markdown that Catala uses for its source files is defined as Pandoc's.

Which is fine, but it could be better, because Pandoc has some limitations: for instance, if I'm not mistaken, its AST does not include source file information (spans) for elements.

Should we adopt CommonMark as the official flavor of Markdown? (As far as I can tell, this is what Pandoc uses under the hood.)

Some CommonMark parsers seem to surface source file info in the AST (for instance, https://github.com/executablebooks/markdown-it-py seems to do it).

Actually, a follow-up question: can the Catala compiler itself emit the relevant AST (including Markdown-related info) as JSON?

I'm in favor of adopting CommonMark rigorously. So far I haven't pinned an exact Markdown flavor when implementing the compiler.

Currently the Catala compiler "parses" the Markdown, but only very lightly. It only looks at the header structure and does not parse links, tables, etc. It uses the header structure to augment source code position information. So the Catala compiler cannot produce a fully-fledged Markdown AST, and can't output one as JSON. I don't think this will change in the future: I don't want us to reimplement a Markdown parser in OCaml in the Catala compiler.
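To make "parsing only the header structure" concrete, here is a minimal sketch of that kind of light scan. It is not the Catala compiler's actual code (which is written in OCaml); it is in Python only because markdown-it-py was mentioned above, and the names `Heading`, `scan_headings` and `heading_path` are made up for illustration. It assumes ATX-style (`#`) headings and backtick-fenced code blocks only, so it is deliberately far from a CommonMark-compliant parser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Simplifying assumptions: ATX headings only (no setext, no indentation),
# and code blocks delimited by runs of three or more backticks.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = re.compile(r"^(`{3,})")


@dataclass
class Heading:
    level: int  # number of leading '#' characters
    title: str
    line: int   # 1-indexed line number in the source file


def scan_headings(source: str) -> list[Heading]:
    """Collect headings, skipping anything inside a fenced code block."""
    headings = []
    open_fence = None  # length of the backtick run that opened the block
    for lineno, text in enumerate(source.splitlines(), start=1):
        fence = FENCE.match(text)
        if fence:
            run = len(fence.group(1))
            if open_fence is None:
                open_fence = run  # opening fence (may carry an info string)
            elif run >= open_fence and text.strip("`").strip() == "":
                open_fence = None  # closing fence: backticks only
            continue
        if open_fence is not None:
            continue  # inside code, a leading '#' is not a heading
        match = HEADING.match(text)
        if match:
            headings.append(Heading(len(match.group(1)), match.group(2), lineno))
    return headings


def heading_path(headings: list[Heading], line: int) -> list[str]:
    """Return the titles of the headings that enclose the given line."""
    path = []
    for h in headings:
        if h.line > line:
            break
        # A new heading closes every open heading at the same or deeper level.
        while path and path[-1].level >= h.level:
            path.pop()
        path.append(h)
    return [h.title for h in path]
```

Given a source position at some line, `heading_path(scan_headings(source), line)` returns the chain of enclosing section titles, which is the kind of information that can be attached to a code position without building a full Markdown AST.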

This is the reason why I think catala-devtools-fr should re-parse the Markdown on its own, without taking a dependency on the Catala compiler. The goal of catala-devtools-fr is to handle all things law-related and ignore the Catala code, while the compiler handles all things code-related and ignores the text.

With this division of labor in mind, I think that in the future the literate programming tools that are currently in the Catala compiler could be moved to and reimplemented in catala-devtools-fr. Specifically, I mean the two features that right now produce HTML and LaTeX from the Catala source code files.

Yep, this needs some pondering, as we'll probably need extensions to strict CommonMark (at least tables?).

Note that it would be possible for the Catala compiler to leverage e.g. Cmarkit to handle CommonMark (I trust the author of this library to be strict about the spec). This way the integration effort wouldn't be too high. Their mention of available extensions may be relevant here:

a non-strict parsing mode can be activated to add: strikethrough, LaTeX math, footnotes, task items and tables.

Wow, for every problem there's a Bünzli library :) Yes, we should use Cmarkit, at least for parsing Markdown headers and code blocks. For the rest, I prefer relying on Pandoc, as we do today, to generate LaTeX and HTML, because it supports more features.