pyparsing / pyparsing

Python library for creating PEG parsers

Misleading debug text when encountering `\r`

rowlesmr opened this issue · comments

Having a \r in the string being parsed garbles the debug output: when the line is printed, the carriage return sends the cursor back to the start of the line, so the text after the \r overwrites what was already there.

The parsing is correct, just the explanatory text is wrong.

\r is important as a standalone character, as I need to be able to accept it as a line terminator.

from pyparsing import (
    Opt,
    ParserElement, 
    Regex
)

if __name__ == "__main__":

    ParserElement.set_default_whitespace_chars(" \t")
    debug = True

    line_term = (("\r" + Opt("\n")) | "\n").set_debug(flag=debug).set_name("line_term")
    comment = (Regex("#.*(?=(\r\n?)|\n)") + line_term).set_debug(flag=debug).set_name("comment")
    string = (Regex("[a-z0-9]+") + Opt(line_term)).set_debug(flag=debug).set_name("string")
    value = (string | comment).set_debug(flag=debug).set_name("value")
    file = (value[...] + line_term[...]).set_debug(flag=debug)

    s="""#multi word comment \nval val2 \r val3\nval4  \t\n\n\r"""
    print(f"{file.parse_string(s, parse_all=True)=}")

results in (in part):

#... more stuff before
Match line_term at loc 20(1,21)
  #multi word comment 
                      ^
Matched line_term -> ['\n']
Matched comment -> ['#multi word comment ', '\n']
Matched value -> ['#multi word comment ', '\n']
Match value at loc 21(2,1)
 val3
  ^
Match string at loc 21(2,1)
 val3
  ^
Match line_term at loc 24(2,4)
 val3
     ^
Match line_term failed, ParseException raised: Expected '\r', found 'val2'  (at char 25), (line:2, col:5)
Matched string -> ['val']
Matched value -> ['val']
Match value at loc 24(2,4)
 val3
     ^
Match string at loc 25(2,5)
 val3
      ^
Match line_term at loc 29(2,9)
 val3
          ^
Matched line_term -> ['\r']
Matched string -> ['val2', '\r']
Matched value -> ['val2', '\r']
#... more stuff after
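
To make the mismatch above easier to see, here is a minimal sketch of how a location marker could be printed without the terminal acting on the \r. This is not pyparsing's implementation; `show_position` is a hypothetical helper, and treating only \n as a line break is an assumption chosen to match the line and column numbers in the output above.

```python
def show_position(text: str, loc: int) -> str:
    """Return the line containing `loc` plus a caret line, safe to print."""
    # Treat only "\n" as a line break here, so a bare "\r" stays inside the line
    # (an assumption chosen to mirror the line/col values in the output above).
    start = text.rfind("\n", 0, loc) + 1
    end = text.find("\n", loc)
    if end == -1:
        end = len(text)
    col = loc - start
    # Swap each "\r" for one visible character (U+240D) so the terminal cannot
    # jump back to column 0 and the caret still lines up.
    visible = text[start:end].replace("\r", "\u240d")
    return visible + "\n" + " " * col + "^"


s = "#multi word comment \nval val2 \r val3\nval4  \t\n\n\r"
marked = show_position(s, 29)
print(marked)
```

With the carriage return replaced one-for-one, the whole line `val val2 ␍ val3` stays visible and the caret sits under the position that loc 29 actually refers to.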

Interesting issue. Could you also please supply a small sample string I can use for s? Probably paste the repr of the string so that the control characters show up properly.

One string:

s="""#multi word comment \nval val2 \r val3\nval4 \t\n\n\r"""
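
For reference, the repr requested above can be produced with the built-in `repr()`, which escapes the control characters instead of sending them to the terminal. A small sketch using the string as posted:

```python
s = """#multi word comment \nval val2 \r val3\nval4 \t\n\n\r"""

# repr() renders \n, \r and \t as escape sequences, so the carriage return
# cannot overwrite anything when the result is printed.
print(repr(s))
# → '#multi word comment \nval val2 \r val3\nval4 \t\n\n\r'
```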