Cannot change default comment regexp
RobertBaruch opened this issue
See also #249. Summary: Either #249 was not actually fixed, or the documentation on how to specify comment regexps (docs/syntax.rst) is incorrect. Note also that the workaround in #249 still fixes this issue.
Tested using pip install tatsu (v 5.8.3) and Python 3.10.6.
Test grammar (comments.peg):
file::File = lines:{line}+ $ ;
line::Line = comment:comment | comment2:comment2 | blank:blank ;
comment::Comment = content:COMMENT ;
comment2::Comment2 = content:COMMENT2 ;
blank::Blank = content:NEWLINE ;
NEWLINE = '\n' ;
COMMENT = /#[^\n]*\n/ ;
COMMENT2 = /%[^\n]*\n/ ;
Main:
import json

import tatsu
from tatsu.model import ModelBuilderSemantics


def main():
    with open('comments.peg') as f:
        txt = f.read()
    parser = tatsu.compile(txt, semantics=ModelBuilderSemantics(), comments_re=None, eol_comments_re=None)
    with open('test.peg') as f:
        txt = f.read()
    model = parser.parse(txt, whitespace='', comments_re=None, eol_comments_re=None)
    print(json.dumps(model.asjson(), indent=4))


if __name__ == "__main__":
    main()
Test file (test.peg):
# comment here

% different comment

# another comment
Resulting output:
{
    "__class__": "File",
    "lines": [
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment2": {
                "__class__": "Comment2",
                "content": "% different comment\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        }
    ]
}
Expected output:
{
    "__class__": "File",
    "lines": [
        {
            "__class__": "Line",
            "comment": { <<<<<------------
                "__class__": "Comment",
                "content": "# comment here\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment2": {
                "__class__": "Comment2",
                "content": "% different comment\n"
            }
        },
        {
            "__class__": "Line",
            "blank": {
                "__class__": "Blank",
                "content": "\n"
            }
        },
        {
            "__class__": "Line",
            "comment": { <<<<<------------
                "__class__": "Comment",
                "content": "# another comment\n"
            }
        }
    ]
}
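The difference between the two outputs is consistent with `#`-prefixed text still being skipped as an end-of-line comment before the grammar rules are tried. The sketch below does not use TatSu at all; it is a plain `re` simulation, and both the helper `classify_lines` and the skip pattern `#[^\n]*` are hypothetical stand-ins (the pattern is not claimed to be TatSu's actual default). It also assumes the test file has blank lines between the three comment lines, as the expected output implies.

```python
import re

# Regexes copied from the grammar in the report, tried in rule order.
RULES = [
    ("comment", re.compile(r"#[^\n]*\n")),
    ("comment2", re.compile(r"%[^\n]*\n")),
    ("blank", re.compile(r"\n")),
]

# Assumed contents of test.peg (blank lines inferred from the expected output).
SAMPLE = "# comment here\n\n% different comment\n\n# another comment\n"


def classify_lines(text, skip_re=None):
    """Return the rule names matched in order.

    If skip_re is given, text matching it is silently dropped before each
    rule attempt, imitating a parser that still skips comments.
    """
    skip = re.compile(skip_re) if skip_re else None
    pos, matched = 0, []
    while pos < len(text):
        if skip:
            m = skip.match(text, pos)
            if m:
                pos = m.end()
                if pos >= len(text):
                    break
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m:
                matched.append(name)
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return matched


print(classify_lines(SAMPLE))                      # no comment skipping
print(classify_lines(SAMPLE, skip_re=r"#[^\n]*"))  # hypothetical skip pattern
```

With no skipping, the sequence of rules matches the expected output; with the hypothetical skip pattern, each `#` line collapses to a lone newline and is reported as `blank`, which is the shape of the resulting output above.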
In the original grammar you posted, you're handling comment parsing yourself rather than using TatSu's facilities. It's likely that the regular expressions used in the grammar are not correct.
This is the kind of query that should be posted on StackOverflow under the tatsu tag.
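One way to check the grammar's regular expressions in isolation, independent of TatSu, is to run them through Python's `re` module against the lines of the test file. This is only a sketch: the patterns are copied from the `COMMENT` and `COMMENT2` rules in the report, and the sample lines assume each comment in the test file ends with a newline.

```python
import re

# Patterns copied from the COMMENT and COMMENT2 rules in the reported grammar.
COMMENT = re.compile(r"#[^\n]*\n")
COMMENT2 = re.compile(r"%[^\n]*\n")

# Each sample line paired with the pattern that is supposed to match it.
samples = {
    "# comment here\n": COMMENT,
    "% different comment\n": COMMENT2,
    "# another comment\n": COMMENT,
}

# True means the pattern matches the whole line, newline included.
results = {line: bool(rx.fullmatch(line)) for line, rx in samples.items()}

for line, ok in results.items():
    print(repr(line), "matches" if ok else "does not match")
```

If every line is reported as matching, the patterns themselves behave as intended under Python's regex engine, and the cause would have to be looked for elsewhere, such as in how comments are configured.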