Indentation-based languages
mathrb opened this issue
Hello
I know pure PEG cannot parse an indentation-based language.
But, as with pegjs, Arpeggio might have a way to do it.
Do you think it's feasible? If so, could you give me some hints?
Kind regards
Hi
Yes, I think it is feasible, but I never got a chance to put some time into it. There is some discussion on the textX issue.
There are two tasks/issues to be decided/solved:
- How to track the indentation level during recursive descent.
- How to specify indentation language rules in the grammar.
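For the first point, the main subtlety is that a PEG parser backtracks, so any indentation state must be rolled back together with the input position when an alternative fails. Below is a minimal sketch in plain Python, assuming the state is a stack of indentation widths; `State`, `try_indent` and `ordered_choice` are hypothetical names for illustration, not Arpeggio's API.

```python
class State:
    """Hypothetical parser state (not Arpeggio's API): input position plus
    a stack holding the indentation width of each enclosing block."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.indents = [0]

    def mark(self):
        # Snapshot the position and the indentation stack together.
        return self.pos, tuple(self.indents)

    def reset(self, snapshot):
        # Restore both after a failed alternative.
        self.pos, indents = snapshot
        self.indents = list(indents)

    def leading_spaces(self):
        end = self.pos
        while end < len(self.text) and self.text[end] == " ":
            end += 1
        return end - self.pos

    def try_indent(self):
        """Consume leading spaces and push a new level if the line is deeper."""
        width = self.leading_spaces()
        if width <= self.indents[-1]:
            return False
        self.indents.append(width)
        self.pos += width
        return True


def ordered_choice(state, *alternatives):
    """PEG ordered choice: on failure, roll back position AND indentation stack."""
    for alternative in alternatives:
        snapshot = state.mark()
        if alternative(state):
            return True
        state.reset(snapshot)
    return False


def indented_pass(state):
    # Pushes an indentation level, then fails unless the line reads "pass".
    if not state.try_indent():
        return False
    if state.text.startswith("pass", state.pos):
        state.pos += len("pass")
        return True
    return False


def rest_of_input(state):
    state.pos = len(state.text)
    return True


s = State("    return")
ordered_choice(s, indented_pass, rest_of_input)
print(s.indents)  # [0]: the level pushed by the failed alternative was rolled back
```

If parse results are memoized by position (packrat parsing), the same concern applies to the cache: a result computed at one indentation level is not necessarily valid at another, so the cache key would also need to include the indentation state.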
For the second point, what comes to mind is either to introduce new Match subclasses, Indent and Dedent, and insert them at the places in the grammar where an increase/decrease of the indentation level is expected, or to introduce a parsing expression, Indented, which wraps other expressions that are expected to be at a higher indentation level. Everything should be backward compatible regarding whitespace skipping. Also, what is considered an indentation increase (tabs/spaces, how many?) should be configurable.
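To make the Indented idea concrete, here is a small self-contained sketch in plain Python of a line-based, PEG-style parser where the required indentation level is part of the parser state and a configurable `indent_unit` defines one indentation step. All names (`Parser`, `Line`, `Indented`, `Optional`, `OneOrMore`, `statement`, `block`) are hypothetical and only mimic the idea; they are not Arpeggio's classes, and the sketch assumes the input contains no blank lines.

```python
import re


class Parser:
    """Hypothetical parser state; not Arpeggio's API.

    `indent_unit` configures what one indentation step is (e.g. four spaces
    or a tab) and `level` is the indentation depth currently required.
    """

    def __init__(self, text, indent_unit="    "):
        self.lines = text.splitlines()
        self.indent_unit = indent_unit
        self.level = 0


# An expression is a function (parser, index) -> (value, next_index) or None.

def Line(pattern):
    """Match one whole line, indented exactly `level` units, against a regex."""
    regex = re.compile(pattern)

    def parse(p, i):
        if i >= len(p.lines):
            return None
        prefix = p.indent_unit * p.level
        line = p.lines[i]
        if not line.startswith(prefix):
            return None                  # shallower than required
        body = line[len(prefix):]
        if body[:1].isspace():
            return None                  # deeper than required
        m = regex.fullmatch(body)
        return (m.group(0), i + 1) if m else None
    return parse


def Indented(expr):
    """Parse `expr` one indentation level deeper than the current one."""
    def parse(p, i):
        p.level += 1
        try:
            return expr(p, i)
        finally:
            p.level -= 1                 # restored on success and on failure
    return parse


def Optional(expr):
    def parse(p, i):
        r = expr(p, i)
        return r if r is not None else (None, i)
    return parse


def OneOrMore(expr):
    def parse(p, i):
        values = []
        while (r := expr(p, i)) is not None:
            value, i = r
            values.append(value)
        return (values, i) if values else None
    return parse


# Grammar: a statement is a name, optionally followed by an indented block.
def statement(p, i):
    r = Line(r"\w+")(p, i)
    if r is None:
        return None
    name, i = r
    children, i = Optional(Indented(block))(p, i)
    return (name, children or []), i


def block(p, i):
    return OneOrMore(statement)(p, i)


source = "a\n  b\n  c\n    d\ne"
p = Parser(source, indent_unit="  ")
tree, end = block(p, 0)
print(tree)  # [('a', [('b', []), ('c', [('d', [])])]), ('e', [])]
```

Because `Indented` restores the level in a `finally` clause, a failed nested block leaves the state untouched, which keeps it safe under backtracking. The Indent/Dedent alternative would instead place explicit markers in a sequence (roughly `Sequence(header, Indent, body, Dedent)`), which is more flexible but requires every Indent to be paired with a Dedent in the grammar.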
Thanks for your answer @igordejanovic
The grammar I have depends heavily on tracking indentation, so it might be complicated to implement.
Thanks
FYI, the pyparsing project recently introduced an IndentedBlock class to handle this (previously they used a helper method). Code and docs (same MIT license) are at pyparsing/pyparsing@2dd2e2b.
Is there a plan to implement such a feature?
I don't have the resources to work on this, but I would be glad to help with discussing the design and reviewing the implementation if anyone has the time and will to work on it.