Handling arbitrary lookaheads in PLY
RollerMatic opened this issue
I am trying to parse a configuration file with PLY and have run into problems, which I will try to explain in detail below.
For context, the syntax of the configuration looks like this:
<start of block>
# block one
#more comments
DEFAULT x=* "" <-- request stmt
y := 1 <-- reply stmt
# block two
DEFAULT a !* 0
a := b,
c = d
# random comment
$INCLUDE <some string> <-- include stmt
<end of block>
The start and end of a config block are implicit. Optional comments followed by a username (DEFAULT here) mark the start of a block, and the last include stmt marks its end. A config like this will contain multiple config blocks.
The goal is to accommodate comments that may appear before any major component (username, request_stmts, reply_stmts, include_stmts).
The grammar I have been using to parse this config is:
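To make the implicit block boundaries concrete, here is a minimal line-based sketch in plain Python (it does not use PLY). It assumes that reply statements are indented, that any other non-comment line other than `$INCLUDE` starts a new block, and that `some_file` stands in for `<some string>`. The helper name `split_blocks` is made up for illustration. Comments are buffered until the next real line shows which block they belong to.

```python
def split_blocks(text):
    """Group config lines into blocks.

    Comments are held in `pending` until the next non-comment line
    decides whether they open a new block or stay in the current one.
    """
    blocks = []
    pending = []  # comments whose owning block is not known yet
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.lstrip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            pending.append(line)
            continue
        # Assumption: an unindented line that is not $INCLUDE is a username line.
        is_header = not line[0].isspace() and not stripped.startswith("$INCLUDE")
        if is_header or not blocks:
            blocks.append([])
        blocks[-1].extend(pending)
        blocks[-1].append(line)
        pending = []
    if pending and blocks:
        blocks[-1].extend(pending)  # trailing comments stay with the last block
    return blocks


SAMPLE = '''\
# block one
#more comments
DEFAULT x=* ""
    y := 1
# block two
DEFAULT a !* 0
    a := b,
    c = d
# random comment
$INCLUDE some_file
'''

blocks = split_blocks(SAMPLE)
```

With this sample, "# block two" ends up at the top of the second block, while "# random comment" stays inside it just before the `$INCLUDE` line. The decision for each comment is only possible once the line after it has been seen.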
statements : statements statement
statements : statement
statement: comments username request_stmts reply_stmts include_stmts
| comments username request_stmts reply_stmts
| comments username assign_stmt
| username request_stmts reply_stmts include_stmts
| username request_stmts reply_stmts
| username assign_stmt
request_stmts : request_stmts request_stmt
request_stmts : request_stmt
request_stmt : assign_stmt
reply_stmts : reply_stmts reply_stmt
reply_stmts : reply_stmt
reply_stmt : INDENT assign_stmt
assign_stmt : term COLON_EQUALS term
| term NOT_STAR term
| term EQUALS term
| term EQUAL_TILDE term
| term EQUALS_STAR term
| term DOUBLE_EQUALS term
| term NOT_EQUALS term
| term PLUS_EQUALS term
include_stmts : include_stmts include_stmt
include_stmts : include_stmt
include_stmt : comment INCLUDES term
| INCLUDES term
comments : comments comment
comments : comment
comment : COMMENT
username : START_OF_LINE
term : STRING
| DOUBLE_QUOTED_STRING
Building a parser from this grammar gives me shift/reduce conflicts, which I do expect, because I am unable to specify that comments before a username signify a new block while comments before anything else do not.
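The conflict can be illustrated without PLY by comparing two token streams. PLY builds LALR(1) tables, so the parser sees only one token of lookahead. The sketch below uses the terminal names from the grammar; the two streams themselves are hypothetical examples of what the lexer might emit after a reply statement. They agree up to and including COMMENT, and only the token after COMMENT tells the two cases apart.

```python
from itertools import takewhile

# Reply statement, then a comment that belongs to an include statement.
COMMENT_THEN_INCLUDE = [
    "INDENT", "STRING", "COLON_EQUALS", "STRING",
    "COMMENT", "INCLUDES", "STRING",
]

# Reply statement, then a comment that opens the next block.
COMMENT_THEN_NEW_BLOCK = [
    "INDENT", "STRING", "COLON_EQUALS", "STRING",
    "COMMENT", "START_OF_LINE", "STRING", "EQUALS", "STRING",
]


def common_prefix(a, b):
    """Return the longest list of leading tokens shared by a and b."""
    return [x for x, _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b))]


prefix = common_prefix(COMMENT_THEN_INCLUDE, COMMENT_THEN_NEW_BLOCK)
# The first position where the two streams differ.
first_difference = (
    COMMENT_THEN_INCLUDE[len(prefix)],
    COMMENT_THEN_NEW_BLOCK[len(prefix)],
)
```

With COMMENT as the lookahead after reply_stmts, the parser has to either shift it (the start of an include_stmt) or reduce the current statement (the block has ended). The deciding token, INCLUDES or START_OF_LINE, is one position further on, and with several consecutive comments it can be arbitrarily far away.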
I have three questions here:
- Is there a fundamental problem in how I am looking at this parsing task?
- Is there a way to correct something in my current approach to achieve what I want?
- Is there a better way to achieve what I want?