This repository provides several tools for working on the error messages of a Menhir-generated parser.
The main tool is lrgrep. It takes:
- a compiled Menhir grammar (a .cmly file, produced by passing the --cmly flag to Menhir)
- a list of rules (usually a .mlyl file).
If the list of rules is well-formed, it produces an OCaml module that can match the rules against the state of a parser at runtime.
By carefully crafting the rules, one can provide fine-grained messages to explain syntax errors.
The repository is structured as follows:
- the main tool, lrgrep, can be found in src/main.ml
- support implements the compact table representation shared by the generator and the generated analysers via the lrgrep.runtime library
- in ocaml, we try to apply this methodology to the OCaml grammar:
  - parser_raw.mly and lexer_raw.mll define an OCaml 4.13 compatible grammar with syntax error reporting removed
  - parse_errors.mlyl defines the error rules for this grammar
  - the frontend binary is an alternative parser that can be used with ocamlc/ocamlopt 4.13 (using the -pp <path-to-frontend.exe> flag)
  - the interpreter binary is a tool that prints detailed information on the parsing process, useful to craft good error rules
- lib implements various algorithms used by other tools
For now, the main focus is on the ocaml sub-directory, and ocaml/parse_errors.mlyl specifically. My current workflow is as follows:
- start from an example: a piece of OCaml code with a syntax error for which the message is quite bad
- by reading the grammar and the output of the interpreter, get an idea of what the parsing situation looks like around the error point
- craft an error rule, and debug it by passing -pp frontend to ocamlc
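The last step of this loop can be wrapped in a small shell helper. This is only a sketch: try_example is a hypothetical name, and the frontend path is the one used by the commands further down in this document, assuming the helper is run from the root of the repository.

```shell
# Hypothetical helper: rebuild the tools, then compile one example file
# with the lrgrep frontend as preprocessor.
# Assumes the current directory is the root of the lrgrep checkout.
try_example() {
  # Stop here if the build fails, so a stale frontend is never used.
  make || return 1
  # -c: compile only; -pp: run the given command as preprocessor.
  ocamlc -c -pp _build/default/ocaml/frontend.bc "$1"
}
```

Calling try_example test_ko_01.ml after each change to ocaml/parse_errors.mlyl rebuilds the frontend and shows the message it now produces for that file.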
All the work is done using OCaml 4.13. Make sure you are using the right switch:
$ ocamlc -version
4.13.1
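If you want a script to complain early about the wrong switch, this check can be automated. A minimal sketch, where is_ocaml_413 is a hypothetical helper name and any 4.13.x release is assumed to be acceptable:

```shell
# Hypothetical helper: succeed only if the version string is a 4.13.x release.
is_ocaml_413() {
  case "$1" in
    4.13.*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Check the compiler found in PATH, if there is one.
if command -v ocamlc >/dev/null 2>&1; then
  if is_ocaml_413 "$(ocamlc -version)"; then
    echo "OCaml 4.13 switch detected"
  else
    echo "warning: expected OCaml 4.13.x, found $(ocamlc -version)" >&2
  fi
fi
```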
Clone the repository and install dependencies:
$ git clone https://github.com/let-def/lrgrep.git
$ cd lrgrep
$ opam install menhir fix cmon
At this point, make should succeed (contact me if not) and produce the three binaries: lrgrep.exe, frontend.bc and interpreter.exe.
It is usually better to test with the bytecode frontend as it leads to shorter iteration cycles.
Try the new frontend with some simple examples:
$ ocamlc -c -pp _build/default/ocaml/frontend.bc test_ok.ml
This first example compiled successfully.
$ ocamlc -c -pp _build/default/ocaml/frontend.bc test_ko_01.ml
ocamlc -pp _build/default/ocaml/frontend.bc test_ko_01.ml
File "test_ko_01.ml", line 4, characters 0-3:
4 | let z = 7
^^^
Error: Spurious semi-colon at 2:9
File "test_ko_01.ml", line 1:
Error: Error while running external preprocessor
Command line: _build/default/ocaml/frontend.bc 'test_ko_01.ml' > /tmp/ocamlppbbc3f9
In this one however, there is a syntax error. Luckily, this case is covered by a rule: while the error happens on line 4, it is likely caused by the semi-colon at the end of line 2.
By using the OCAMLPARAM environment variable, we can instruct all invocations of the OCaml compilers in the current shell to use our frontend.
$ ./setup_shell.sh
export 'OCAMLPARAM=pp=$PWD/lrgrep/_build/default/ocaml/frontend.bc,_'
# the setup_shell.sh command prints a suitable OCAMLPARAM value
$ eval `./setup_shell.sh`
$ ocamlc test_ko_01.ml
...
Error: Spurious semi-colon at 2:9
...
# In the updated environment, the new frontend is picked up automatically
Now you are ready to iterate on ocaml/parse_errors.mlyl to produce new rules.
Note: unset OCAMLPARAM to switch back to the normal frontend.
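Switching between the two frontends can also be done by hand. The sketch below is not part of the repository: use_lrgrep_frontend and use_default_frontend are hypothetical names, and the default path assumes the current directory is the root of the lrgrep checkout.

```shell
# Hypothetical helper: make the OCaml compilers started from this shell
# use the lrgrep frontend. An explicit path to the frontend can be given
# as first argument; otherwise the build output under $PWD is used.
use_lrgrep_frontend() {
  # "pp=<command>" sets the preprocessor. The trailing "_" marks the position
  # of the command-line arguments, so a -pp given explicitly on the command
  # line should still take precedence over this setting.
  export OCAMLPARAM="pp=${1:-$PWD/_build/default/ocaml/frontend.bc},_"
}

# Hypothetical helper: go back to the normal frontend.
use_default_frontend() {
  unset OCAMLPARAM
}
```

Both functions only affect the current shell and the processes started from it.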
Once you have made sure your setup is working (make is (re-)building the frontend and ocamlc is using it), you can proceed to DEVISING-RULES.md to get started with the error DSL and the associated workflow.
I am trying to document the code. Each of the src, lib, ocaml, and support directories contains a README.md that briefly explains the purpose of that directory.
External dependencies that are worth knowing:
- MenhirSdk is a part of the Menhir parser generator that allows external tools to post-process compiled grammars
- Cmon is a pretty-printer for recursive values
- Fix is a library for computing fixed points; it also provides a convenient representation of finite sets
- LRijkstra is taken from Menhir and implements the algorithm described in "Faster Reachability Analysis for LR(1) Parsers", though we apply it for a slightly different purpose than the one described in the article
I have noted some urgent and less urgent things to improve in FUTURE.md.