dabeaz / ply

Python Lex-Yacc

Home Page:http://www.dabeaz.com/ply/index.html

Handling arbitrary lookaheads in PLY

RollerMatic opened this issue

I am trying to parse a configuration file with PLY and have run into parsing problems, which I will try to explain in detail below.
For context, the configuration syntax looks like this:

<start of block>
# block one
#more comments
DEFAULT x=* "" <-- request stmt
   y := 1 <-- reply stmt

# block two
DEFAULT a !* 0
   a  := b,
   c  = d

# random comment
$INCLUDE <some string> <-- include stmt
<end of block>
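As a sketch of how the sample above might be tokenized, here is a stdlib-only, line-oriented tokenizer. It assumes the token names from the grammar map onto the syntax in the way shown. The regexes, the `tokenize` function, the `SAMPLE` text and the include target `some_file` are hypothetical. The `<--` annotations and block markers are left out because they are not part of the real syntax, and commas between reply items are dropped because the grammar has no token for them.

```python
import re

# Two-character operators are tried before "=" so that "=*" is not split.
TOKEN_RE = re.compile(r'''
    (?P<DOUBLE_QUOTED_STRING>"[^"]*")
  | (?P<COLON_EQUALS>:=)
  | (?P<NOT_STAR>!\*)
  | (?P<EQUAL_TILDE>=~)
  | (?P<EQUALS_STAR>=\*)
  | (?P<DOUBLE_EQUALS>==)
  | (?P<NOT_EQUALS>!=)
  | (?P<PLUS_EQUALS>\+=)
  | (?P<EQUALS>=)
  | (?P<COMMA>,)
  | (?P<STRING>[^\s=!:+,"]+)
''', re.VERBOSE)


def tokenize(text):
    """Return (type, value, lineno) tuples for the config format (a sketch)."""
    tokens = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no structure
        if stripped.startswith("#"):
            tokens.append(("COMMENT", stripped, lineno))
            continue
        if stripped.startswith("$INCLUDE"):
            tokens.append(("INCLUDES", "$INCLUDE", lineno))
            target = stripped[len("$INCLUDE"):].strip()
            if target:
                tokens.append(("STRING", target, lineno))
            continue
        if line[0].isspace():
            # Indented line: a reply item.
            tokens.append(("INDENT", "", lineno))
            body = stripped
        else:
            # Non-indented line: the first word is the username.
            parts = stripped.split(None, 1)
            tokens.append(("START_OF_LINE", parts[0], lineno))
            body = parts[1] if len(parts) > 1 else ""
        pos = 0
        while pos < len(body):
            if body[pos].isspace():
                pos += 1
                continue
            m = TOKEN_RE.match(body, pos)
            if m is None:
                raise SyntaxError(f"line {lineno}: unexpected {body[pos]!r}")
            if m.lastgroup != "COMMA":  # assumption: commas between reply items are dropped
                tokens.append((m.lastgroup, m.group(), lineno))
            pos = m.end()
    return tokens


SAMPLE = """\
# block one
DEFAULT x=* ""
   y := 1

# block two
DEFAULT a !* 0
   a := b,
   c = d

# random comment
$INCLUDE some_file
"""

for tok in tokenize(SAMPLE):
    print(tok)
```

Indentation and `$INCLUDE` are recognized per line, so each reply line produces its own INDENT token. This matches `reply_stmt : INDENT assign_stmt`.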

The start and end of a config block are implicit: optional comments followed by a username (DEFAULT here) mark the start of a block, and the last include statement marks its end. A config file like this contains multiple such blocks.
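The block-boundary rule can also be written procedurally, which shows where the lookahead problem comes from. A comment line cannot be classified until the next non-comment line has been seen, however many comments come in between. Below is a minimal sketch working on plain-text lines. `split_blocks` is a hypothetical helper and is not part of PLY.

```python
def split_blocks(lines):
    """Group config lines into blocks (hypothetical helper, not part of PLY).

    A block starts at a non-indented line that is neither a comment nor an
    $INCLUDE (the username line). Comments are buffered until the next
    non-comment line shows which block they belong to.
    """
    blocks = []
    current = None          # lines of the block being built
    pending_comments = []   # comments whose owner is not yet known
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are not significant
        if stripped.startswith("#"):
            pending_comments.append(stripped)  # defer the decision
            continue
        starts_block = not line[0].isspace() and not stripped.startswith("$INCLUDE")
        if starts_block:
            # Buffered comments precede a username, so they open a new block.
            current = pending_comments + [stripped]
            blocks.append(current)
        elif current is None:
            raise ValueError(f"statement before any username line: {stripped!r}")
        else:
            # Buffered comments precede a reply or include: same block.
            current.extend(pending_comments)
            current.append(stripped)
        pending_comments = []
    if current is not None:
        current.extend(pending_comments)  # trailing comments stay with the last block
    return blocks


SAMPLE = """\
# block one
DEFAULT x=* ""
   y := 1

# block two
DEFAULT a !* 0
   a := b,
   c = d

# random comment
$INCLUDE some_file
"""

blocks = split_blocks(SAMPLE.splitlines())
for block in blocks:
    print(block)
```

An LALR parser has only one token of lookahead and cannot buffer like this inside the grammar. When it sees COMMENT after a reply statement, it cannot tell whether to shift it (as the start of `include_stmt : comment INCLUDES term`) or to reduce the current `statement` first. That is consistent with the reported shift/reduce conflicts.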
The goal is to allow comments to appear before any major component (username, request_stmts, reply_stmts, include_stmts).
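One common way around this is to take comments out of the grammar entirely. PLY can discard a token in the lexer, either with a token function that returns no value or with a rule using the `t_ignore_` prefix, such as `t_ignore_COMMENT = r'\#.*'`. If the comments need to be kept, they can instead be attached to the next significant token before the parser sees them. Below is a stdlib-only sketch of such a filter. The `Token` class and `attach_comments` function are hypothetical and not part of PLY. PLY's yacc pulls tokens through the lexer's `token()` method, so a wrapper around the lexer could presumably apply the same filtering; that wiring is an assumption and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical stand-in for a lexer token."""
    type: str
    value: str
    comments: list[str] = field(default_factory=list)  # comments preceding this token


def attach_comments(tokens):
    """Remove COMMENT tokens, attaching their text to the next non-comment token.

    Returns the filtered tokens plus any comments left over at end of input.
    """
    filtered = []
    pending = []
    for tok in tokens:
        if tok.type == "COMMENT":
            pending.append(tok.value)   # owner not known yet; buffer it
        else:
            tok.comments = pending      # hand the buffered comments to this token
            pending = []
            filtered.append(tok)
    return filtered, pending


# Shortened token stream for the second sample block (reply lines omitted).
stream = [
    Token("COMMENT", "# block two"),
    Token("START_OF_LINE", "DEFAULT"),
    Token("STRING", "a"),
    Token("NOT_STAR", "!*"),
    Token("STRING", "0"),
    Token("COMMENT", "# random comment"),
    Token("INCLUDES", "$INCLUDE"),
    Token("STRING", "some_file"),
]
tokens, trailing = attach_comments(stream)
print([t.type for t in tokens])
```

With a filter like this in place, the grammar can drop every `comments` alternative and the `comment INCLUDES term` form of `include_stmt`. START_OF_LINE is then the only marker of a new block, so the parser no longer has to look past an arbitrary run of comments.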
This is the grammar I have been using to parse the config:

statements : statements statement
statements : statement
statement : comments username request_stmts reply_stmts include_stmts
          | comments username request_stmts reply_stmts
          | comments username assign_stmt
          | username request_stmts reply_stmts include_stmts
          | username request_stmts reply_stmts
          | username assign_stmt
request_stmts : request_stmts request_stmt
request_stmts : request_stmt
request_stmt : assign_stmt
reply_stmts : reply_stmts reply_stmt
reply_stmts : reply_stmt
reply_stmt : INDENT assign_stmt
assign_stmt : term COLON_EQUALS term
            | term NOT_STAR term
            | term EQUALS term
            | term EQUAL_TILDE term
            | term EQUALS_STAR term
            | term DOUBLE_EQUALS term
            | term NOT_EQUALS term
            | term PLUS_EQUALS term
include_stmts : include_stmts include_stmt
include_stmts : include_stmt
include_stmt : comment INCLUDES term
             | INCLUDES term
comments : comments comment
comments : comment
comment : COMMENT
username : START_OF_LINE
term : STRING
     | DOUBLE_QUOTED_STRING

Building a parser from this grammar gives me shift/reduce conflicts. I expected that, because I have no way to specify that comments before a username start a new block while comments before anything else do not.

I have three questions here:

  1. Is there a fundamental problem with how I am approaching this parsing?
  2. Is there something I can correct in my current approach to achieve what I want?
  3. Is there a better way to achieve what I want?
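Once comments are removed from the token stream, one token of lookahead is enough to drive the whole format. START_OF_LINE begins a block, INDENT begins a reply item and INCLUDES begins an include. To illustrate this, here is a hand-written recursive-descent parser over a hypothetical list of `(type, value)` pairs. It is a swapped-in technique, not PLY's LALR parser. It implements a simplified and slightly more permissive version of the grammar: it accepts several request items without a reply, and `$INCLUDE` lines without replies. The `Parser` class and the dict layout of each block are assumptions for illustration.

```python
OPERATORS = {"COLON_EQUALS", "NOT_STAR", "EQUALS", "EQUAL_TILDE",
             "EQUALS_STAR", "DOUBLE_EQUALS", "NOT_EQUALS", "PLUS_EQUALS"}
TERMS = {"STRING", "DOUBLE_QUOTED_STRING"}


class Parser:
    """Recursive-descent sketch; expects COMMENT tokens already removed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead; None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, types):
        kind = self.peek()
        if kind not in types:
            raise SyntaxError(f"expected one of {sorted(types)}, got {kind}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def parse_config(self):
        blocks = []
        while self.peek() is not None:
            blocks.append(self.parse_block())
        return blocks

    def parse_block(self):
        username = self.expect({"START_OF_LINE"})
        requests = [self.parse_assign()]
        while self.peek() in TERMS:       # more request items on the username line
            requests.append(self.parse_assign())
        replies = []
        while self.peek() == "INDENT":    # each indented line is a reply item
            self.pos += 1
            replies.append(self.parse_assign())
        includes = []
        while self.peek() == "INCLUDES":  # zero or more $INCLUDE lines end the block
            self.pos += 1
            includes.append(self.expect(TERMS))
        return {"username": username, "request": requests,
                "reply": replies, "include": includes}

    def parse_assign(self):
        left = self.expect(TERMS)
        op = self.expect(OPERATORS)
        right = self.expect(TERMS)
        return (left, op, right)


tokens = [
    ("START_OF_LINE", "DEFAULT"), ("STRING", "x"), ("EQUALS_STAR", "=*"),
    ("DOUBLE_QUOTED_STRING", '""'), ("INDENT", ""), ("STRING", "y"),
    ("COLON_EQUALS", ":="), ("STRING", "1"),
    ("START_OF_LINE", "DEFAULT"), ("STRING", "a"), ("NOT_STAR", "!*"),
    ("STRING", "0"), ("INDENT", ""), ("STRING", "a"), ("COLON_EQUALS", ":="),
    ("STRING", "b"), ("INDENT", ""), ("STRING", "c"), ("EQUALS", "="),
    ("STRING", "d"), ("INCLUDES", "$INCLUDE"), ("STRING", "some_file"),
]
blocks = Parser(tokens).parse_config()
for block in blocks:
    print(block)
```

Every decision point in `parse_block` looks at a single token, which is why a comment-free stream avoids the arbitrary lookahead that the `comments` alternatives require.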