goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

Bug: Newlines make the debug output difficult to read.

TomHodson opened this issue

When debugging a parse of a multiline string, the newlines in the input break up the debug output. Here's the most minimal non-trivial grammar I could come up with:

import pe
from pe.actions import Capture

multiline_parser = pe.compile(
    r'''
    File    <- Spacing Integer (Spacing Integer)* Spacing EOF
    Integer  <- "-"? ("0" / [1-9] [0-9]*)
    Spacing  <- [\t\n\f\r ]*
    EOF      <- !.
    ''',
    actions={
        'Integer': Capture(int),
    },
    flags=pe.DEBUG,
)


test = """
1
2
"""

multiline_parser.match(test).groups()

The first few lines of the output look like this:

## Grammar ##
File    <- Spacing Integer (Spacing Integer)* Spacing EOF
Integer <- "-"? ("0" / [1-9] [0-9]*)  -> Capture(<class 'int'>)
Spacing <- [\t\n\f\r ]*
EOF     <- !.

1
2
        |                         Spacing Integer (Spacing Integer)* Spacing EOF

1
2
        |                           Spacing

1
2
        |                             [\t\n\f\r ]*

1
2
        |                               [\t\n\f\r ]
1
2
         |                               [\t\n\f\r ]

With the change suggested in #30, the output would become:

## Grammar ##
File    <- Spacing Integer (Spacing Integer)* Spacing EOF
Integer <- "-"? ("0" / [1-9] [0-9]*)  -> Capture(<class 'int'>)
Spacing <- [\t\n\f\r ]*
EOF     <- !.
\n1\n2\n     |                         Spacing Integer (Spacing Integer)* Spacing EOF
\n1\n2\n     |                           Spacing
\n1\n2\n     |                             [\t\n\f\r ]*
\n1\n2\n     |                               [\t\n\f\r ]
1\n2\n       |                               [\t\n\f\r ]
1\n2\n       |                           Integer
1\n2\n       |                             "-"? ("0" / [1-9] [0-9]*)  -> Capture(<class 'int'>)
1\n2\n       |                               "-"? ("0" / [1-9] [0-9]*)
1\n2\n       |                                 "-"?
1\n2\n       |                                   "-"
1\n2\n       |                                 "0" / [1-9] [0-9]*
1\n2\n       |                                   "0"
1\n2\n       |                                   [1-9] [0-9]*
1\n2\n       |                                     [1-9]
\n2\n        |                                     [0-9]*
\n2\n        |                                       [0-9]
\n2\n        |                           (Spacing Integer)*

Thanks! I've now recreated the issue. Here's a smaller example:

import pe
p = pe.compile("[0-9] [ \t\n]* [0-9]", flags=pe.DEBUG)
p.match("1\n2")

(Due to some confusion between compile-time and parse-time flags, just calling pe.match(..., flags=pe.DEBUG) does not work; this may be a separate issue.)

The output of the above is:

## Grammar ##
Start <- [0-9] [ \t\n]* [0-9]

1
2          |    [0-9] [ \t\n]* [0-9]
1
2          |      [0-9]

2           |      [ \t\n]*

2           |        [ \t\n]
2            |        [ \t\n]
2            |      [0-9]
<Match object; span=(0, 3), match='1 2'>

I see that the problem is not with the printout of the grammar rules but with the input context shown on the left. I agree that this makes the output hard to read.
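For illustration, here is a minimal sketch of how the left-hand context could be kept on one line by escaping whitespace control characters before padding. The helper name `format_context`, the `width` default, and the escape table are all hypothetical; this is not pe's actual debug code, just one way the idea might look.

```python
# Hypothetical helper, not part of pe: render the remaining input as a
# fixed-width, single-line context column for debug output.
_ESCAPES = {
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    "\\": "\\\\",
}


def format_context(text: str, pos: int, width: int = 12) -> str:
    """Return the input from `pos`, escaped and padded to `width` columns."""
    pieces = []
    used = 0
    # Escaping only lengthens the text, so `width` raw characters suffice.
    for ch in text[pos:pos + width]:
        piece = _ESCAPES.get(ch, ch)
        # Stop before an escape sequence would be cut in half.
        if used + len(piece) > width:
            break
        pieces.append(piece)
        used += len(piece)
    return "".join(pieces).ljust(width)


if __name__ == "__main__":
    sample = "\n1\n2\n"
    for pos in (0, 1, 2):
        print(format_context(sample, pos) + " | <expression being tried>")
```

Because every newline is rendered as a two-character escape, each debug line stays on a single row and the `|` separator lines up regardless of what whitespace the input contains.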