pyparsing / pyparsing

Python library for creating PEG parsers

`QuotedString`, `multiline=True` and a newline in the quote sequence

rowlesmr opened this issue

I think I've found a bug in `QuotedString` when `multiline=True` and the quote sequence itself contains a newline: the newline in the quote sequence is ignored.

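from pyparsing import ParserElement, QuotedString

# Disable whitespace skipping so the "\n" in the quote sequence is significant.
ParserElement.set_default_whitespace_chars("")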
newlinequote = QuotedString("\n;", multiline=True)
newlinequote.search_string("lsjdf \n;Hi \n mum!\n; sldjf")  # Receive [['Hi \nmum!\n']]. Expect [['Hi \n mum!']]
newlinequote.search_string("lsjdf \n;Hi \n m;um!\n; sldjf")  # Receive [['Hi \n m']]. Expect [['Hi \n m;um!']]
newlinequote.search_string("lsj;df \n;Hi \n m;um!\n; sldjf")  # Receive [['df \n'], ['um!\n']]. Expect [['Hi \n m;um!']]

Is this the desired behaviour?
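To make the expected results above concrete, here is a minimal sketch using only the standard-library `re` module, not pyparsing. It assumes the intended meaning is "start at the first `\n;`, take everything up to the next `\n;`". The names `TEXT_FIELD` and `find_text_fields` are made up for illustration.

```python
import re

# Assumption: both the opening and the closing quote are the two-character
# sequence "\n;". A ";" that does not directly follow a newline is content.
# re.DOTALL lets "." match newlines; ".*?" stops at the first closing "\n;".
TEXT_FIELD = re.compile(r"\n;(.*?)\n;", re.DOTALL)


def find_text_fields(text):
    """Return the content of every "\n;"-delimited field found in text."""
    return TEXT_FIELD.findall(text)


for sample in (
    "lsjdf \n;Hi \n mum!\n; sldjf",
    "lsjdf \n;Hi \n m;um!\n; sldjf",
    "lsj;df \n;Hi \n m;um!\n; sldjf",
):
    print(repr(find_text_fields(sample)))
```

With these inputs the sketch yields `['Hi \n mum!']` for the first sample and `['Hi \n m;um!']` for the other two, which is the content I expected from `QuotedString`.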

This is interesting, using a newline as part of the quotes. Needs some further study.

The grammar I'm trying to implement is https://www.iucr.org/__data/assets/text_file/0009/112131/CIF2-ENBF.txt

My question applies specifically to `text-field`:

(* text block *) 
text-field = text-delim, text-content, text-delim ;
text-delim = line-term, ';' ;
text-content = { allchars } - ( { allchars }, text-delim, { allchars } ) ;

I would like to use QuotedString, as it returns the content; I had a devil of a time making that work in another parser I tried.

Edit: I think this is how to implement it directly with the existing grammar constructs you have. `allchars` comes from here.

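from pyparsing import Combine, Literal, Opt, ZeroOrMore

# allchars is not defined in this snippet; it comes from the rest of the grammar.
line_term = (Literal("\r") + Opt("\n")) | Literal("\n")
text_delim = Combine(line_term + Literal(";"))
text_content = ~(ZeroOrMore(allchars) + text_delim + ZeroOrMore(allchars)) + ZeroOrMore(allchars)
text_field = text_delim + text_content + text_delim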
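An alternative that avoids `allchars` entirely is sketched below. It assumes pyparsing 3.x and swaps the `text-content` rule for `SkipTo`, which collects everything up to the next `text_delim`. This is a workaround sketch, not a fix for `QuotedString`, and the use of `SkipTo` and `Suppress` here is my own substitution rather than anything taken from the CIF2 grammar.

```python
from pyparsing import Combine, Literal, Opt, ParserElement, SkipTo, Suppress

# Disable whitespace skipping globally so "\n" can be matched as part of the
# delimiter. This has to run before the expressions below are created.
ParserElement.set_default_whitespace_chars("")

line_term = (Literal("\r") + Opt(Literal("\n"))) | Literal("\n")
text_delim = Combine(line_term + Literal(";"))

# SkipTo stands in for text-content: it returns the text before the next
# text_delim without consuming the delimiter itself.
# Suppress drops both delimiters from the results, leaving only the content.
text_field = Suppress(text_delim) + SkipTo(text_delim) + Suppress(text_delim)

for sample in (
    "lsjdf \n;Hi \n mum!\n; sldjf",
    "lsjdf \n;Hi \n m;um!\n; sldjf",
    "lsj;df \n;Hi \n m;um!\n; sldjf",
):
    print(text_field.search_string(sample).as_list())
```

Note that `set_default_whitespace_chars("")` changes the default for every expression created afterwards, so in a larger grammar it may be preferable to call `leave_whitespace()` on just these expressions instead.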