`QuotedString`, `multiline=True` and a newline in the quote sequence
rowlesmr opened this issue
I think I've found a bug in `QuotedString` where `multiline=True` and there is a newline in the quote sequence. The newline is ignored.
from pyparsing import ParserElement, QuotedString

# Disable whitespace skipping so the newline in the quote sequence is significant
ParserElement.set_default_whitespace_chars("")
newlinequote = QuotedString("\n;", multiline=True)
newlinequote.search_string("lsjdf \n;Hi \n mum!\n; sldjf") # Receive [['Hi \nmum!\n']]. Expect [['Hi \n mum!']]
newlinequote.search_string("lsjdf \n;Hi \n m;um!\n; sldjf") # Receive [['Hi \n m']]. Expect [['Hi \n m;um!']]
newlinequote.search_string("lsj;df \n;Hi \n m;um!\n; sldjf") # Receive [['df \n'], ['um!\n']]. Expect [['Hi \n m;um!']]
Is this the desired behaviour?
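For reference, the expected results above can be pinned down with a plain regular expression. This is only a sketch using the standard-library `re` module rather than pyparsing, and the helper name `find_newline_quoted` is made up for illustration. It treats `"\n;"` as both the opening and the closing quote sequence.

```python
import re

# Opening and closing delimiters are both "\n;", mirroring QuotedString("\n;", multiline=True).
# DOTALL lets "." match newlines; the lazy ".*?" stops at the first closing "\n;".
NEWLINE_QUOTE = re.compile(r"\n;(.*?)\n;", re.DOTALL)


def find_newline_quoted(text):
    """Return the content of every "\\n;" ... "\\n;" span in text (hypothetical helper)."""
    return NEWLINE_QUOTE.findall(text)


samples = [
    "lsjdf \n;Hi \n mum!\n; sldjf",
    "lsjdf \n;Hi \n m;um!\n; sldjf",
    "lsj;df \n;Hi \n m;um!\n; sldjf",
]
for sample in samples:
    print(repr(find_newline_quoted(sample)))
```

A `;` that is not preceded by a newline is treated as ordinary content here, which is the behaviour the three "Expect" comments describe.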
This is interesting, using a newline as part of the quotes. Needs some further study.
https://www.iucr.org/__data/assets/text_file/0009/112131/CIF2-ENBF.txt is the grammar I'm trying to implement. My question specifically applies to `text-field`.
(* text block *)
text-field = text-delim, text-content, text-delim ;
text-delim = line-term, ';' ;
text-content = { allchars } - ( { allchars }, text-delim, { allchars } ) ;
I would like to use `QuotedString`, as it returns the content; I had a devil of a time making that work in another parser I tried.
Edit: I think this is how to implement it directly with the existing grammar constructs you have. `allchars` comes from here
from pyparsing import Combine, Literal, Opt, ZeroOrMore

# allchars is defined elsewhere (see the link above)
line_term = (Literal("\r") + Opt("\n")) | Literal("\n")
text_delim = Combine(line_term + Literal(";"))
text_content = ~(ZeroOrMore(allchars) + text_delim + ZeroOrMore(allchars)) + ZeroOrMore(allchars)
text_field = text_delim + text_content + text_delim
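One possible way to express "any characters up to the next `text-delim`" is a lazy regular expression, which sidesteps the greedy `ZeroOrMore(allchars)`. The sketch below uses only the standard-library `re` module; the names `TEXT_DELIM`, `TEXT_FIELD` and `text_field_contents` are hypothetical. It assumes that `allchars` covers every character, which may be broader than the CIF2 grammar allows.

```python
import re

# text-delim = line-term, ';'   where line-term is "\r\n", "\r" or "\n"
TEXT_DELIM = r"(?:\r\n?|\n);"

# text-field = text-delim, text-content, text-delim
# The lazy ".*?" ends the content at the first following text-delim, so the
# content itself can never contain one. DOTALL lets "." match line terminators.
TEXT_FIELD = re.compile(TEXT_DELIM + r"(?P<content>.*?)" + TEXT_DELIM, re.DOTALL)


def text_field_contents(text):
    """Return the text-content of each text-field found in text (hypothetical helper)."""
    return [match.group("content") for match in TEXT_FIELD.finditer(text)]


example = "_tag\r\n;line one\r\nline; two\r\n;\n_next"
print(repr(text_field_contents(example)))
```

The same pattern could presumably be wrapped in pyparsing's `Regex` class so that it composes with the rest of a pyparsing grammar, though that is untested here.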