goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

"Sidecar" objects for accumulative parsing

goodmami opened this issue

This is a proposed feature: an object is created when parsing begins, and actions can use it during parsing.

Imagine parsing something like a large TOML file, where the top-level object is not created until the full file has been read in. You won't know if there's a table or key collision until the very end. Instead, if we could create an object that assembles the document as it parses, we could check as we go. E.g.:

from pe.actions import Sidecar
...

class TOMLSidecar:
    def __init__(self):
        self.doc = {}
    def register_table(self, name: str) -> None:
        if name in self.doc:  # simplified for convenience
            raise TOMLDecodeError(f"table already defined: {name}")
        self.doc[name] = {}
    ...

TOML_GRAMMAR = """
...
Table <- "[" WS Key WS "]"
...
"""

toml_parser = pe.compile(
    TOML_GRAMMAR,
    actions={
        "Table": Sidecar.call("register_table"),
        ...
    },
    sidecar_factory=TOMLSidecar,
)

At parse time, Sidecar.call("register_table") would do something like this:

getattr(self.sidecar, "register_table")(*args, **kwargs)
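
To make the dispatch concrete, here is a minimal standalone sketch that does not use pe at all. SidecarCall and run_parse are hypothetical names invented for this illustration; they only mimic the idea of one fresh sidecar per parse and a by-name method lookup when an action fires.

```python
from typing import Any, Callable, Iterable


class TOMLDecodeError(ValueError):
    """Raised when the document being assembled is invalid."""


class TOMLSidecar:
    def __init__(self) -> None:
        self.doc: dict = {}

    def register_table(self, name: str) -> None:
        if name in self.doc:
            raise TOMLDecodeError(f"table already defined: {name}")
        self.doc[name] = {}


class SidecarCall:
    """Hypothetical deferred call: stores a method name, resolves it later."""

    def __init__(self, method_name: str) -> None:
        self.method_name = method_name

    def __call__(self, sidecar: Any, *args: Any, **kwargs: Any) -> Any:
        # Look the method up on whichever sidecar belongs to the current parse.
        return getattr(sidecar, self.method_name)(*args, **kwargs)


def run_parse(
    sidecar_factory: Callable[[], Any],
    action: SidecarCall,
    table_names: Iterable[str],
) -> Any:
    """Stand-in for a parse: build one sidecar, fire the action per match."""
    sidecar = sidecar_factory()
    for name in table_names:
        action(sidecar, name)
    return sidecar


register = SidecarCall("register_table")
result = run_parse(TOMLSidecar, register, ["owner", "database"])
print(result.doc)  # {'owner': {}, 'database': {}}
```

Because the factory is called once per parse, state never leaks between parses, and a duplicate table name raises as soon as the second match fires rather than at the end of the input.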

This would probably subsume the use case of #10.

Some notes:

  1. The mutability of the sidecar object means that it might be changed by parsing paths that ultimately fail.
  2. It's not an Action object even though it is used in actions={...}, so maybe it doesn't belong in pe.actions.
  3. The "sidecar" name is not certain. It's not the same as the Sidecar Pattern for applications. Alternatives:
    • "Parse-time object" is a bit long
    • "Proxy object"?
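
The first note can be illustrated with a small sketch. One possible mitigation, shown here purely as an assumption and not as anything pe does, is to snapshot the sidecar state before trying an alternative and restore it if that alternative fails. DocSidecar, ParseFailure, and ordered_choice are hypothetical names for this example.

```python
import copy
from typing import Any, Callable, Sequence


class ParseFailure(Exception):
    """Signals that one parsing alternative did not match."""


class DocSidecar:
    def __init__(self) -> None:
        self.doc: dict = {}

    def register_table(self, name: str) -> None:
        self.doc[name] = {}


def ordered_choice(
    sidecar: DocSidecar,
    alternatives: Sequence[Callable[[DocSidecar], Any]],
) -> Any:
    """Try alternatives in order, undoing sidecar changes from failed ones."""
    for alternative in alternatives:
        snapshot = copy.deepcopy(sidecar.doc)  # state before this attempt
        try:
            return alternative(sidecar)
        except ParseFailure:
            sidecar.doc = snapshot  # discard mutations from the failed path
    raise ParseFailure("no alternative matched")


def failing_path(sidecar: DocSidecar) -> str:
    sidecar.register_table("ghost")  # mutation happens before the failure
    raise ParseFailure("later input did not match")


def succeeding_path(sidecar: DocSidecar) -> str:
    sidecar.register_table("owner")
    return "ok"


sidecar = DocSidecar()
outcome = ordered_choice(sidecar, [failing_path, succeeding_path])
print(outcome, sidecar.doc)  # ok {'owner': {}}
```

Without the restore step, "ghost" would remain in the document even though the path that registered it was abandoned. Deep-copying on every choice is expensive for large documents, so a real design might prefer an undo log or deferring sidecar calls until a path is known to succeed.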