shnewto / bnf

Parse BNF grammar definitions

Nonterminal identifiers disallow whitespace in their name

shnewto opened this issue

At the moment, parsing a nonterminal silently stops at the first whitespace, i.e. my name in <my name> will parse as a nonterminal identified as my. This is certainly not ideal. I do wonder whether it's reasonable to disallow whitespace by raising an error in cases like the one above, or whether we should just handle and allow it. We probably need to review the standard for BNF, both for this case and in general, to ensure compliance.

I don't think BNF allows whitespace in nonterminals, but then again I am not sure there is a formal specification. The Wikipedia article does not allow whitespace and suggests a hyphen as the alternative, <my-name>. I could accept either approach if supporting whitespace does not massively complicate parsing.

I'll do some more digging and see if there is any consensus that addresses it (preliminary digging indicates maybe not?). Adding logic to throw an error on whitespace in a nonterminal identifier would probably be as much work as just allowing it, and at least one of the two does need to happen.
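For illustration, the "raise an error" option could look roughly like the sketch below. It is a standalone example, not code from the bnf crate: the IdentError type and the strict_nonterminal function are hypothetical names, and it assumes the input starts at the opening <.

```rust
/// Hypothetical error type for this sketch; not part of the bnf crate.
#[derive(Debug, PartialEq)]
enum IdentError {
    /// The input did not start with `<`.
    MissingOpen,
    /// No closing `>` was found.
    MissingClose,
    /// The identifier contained whitespace; carries the offending text.
    Whitespace(String),
}

/// Parse `<name>` at the start of `input` and return `name`,
/// rejecting any whitespace between the angle brackets.
fn strict_nonterminal(input: &str) -> Result<&str, IdentError> {
    let rest = input.strip_prefix('<').ok_or(IdentError::MissingOpen)?;
    let end = rest.find('>').ok_or(IdentError::MissingClose)?;
    let name = &rest[..end];
    if name.chars().any(char::is_whitespace) {
        // Fail loudly instead of silently truncating to "my".
        return Err(IdentError::Whitespace(name.to_string()));
    }
    Ok(name)
}
```

With this sketch, strict_nonterminal("<my-name>") yields the identifier my-name, while strict_nonterminal("<my name>") returns an error that carries the rejected text rather than silently producing my.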

@CrockAgile A change that was made to accomplish #14 "kind of" addressed this one too, and I would like your opinion on it. Before, the real bug was that <my name> resulted in a nonterminal identified as my. After #14, everything between the < and the corresponding > is wrapped into the nonterminal identifier. That also means that < somename > results in the identifier " somename " (with the surrounding spaces), which seems weird. My question is: do you think we should leave it there and not make any guesses, or should we trim \n\r\t from either side of the nonterminal identifier? It's pretty trivial to trim, I'm just interested in another opinion.
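The two behaviours being weighed here can be sketched as follows. This is a minimal standalone example, not the crate's actual parser: raw_nonterminal and trimmed_nonterminal are hypothetical names, the closing > is assumed to be the first one found, and str::trim removes all Unicode whitespace (including plain spaces), which is a superset of \n\r\t.

```rust
/// Return everything between the leading `<` and the first `>`, untouched.
/// Returns `None` if the input does not start with `<` or has no `>`.
fn raw_nonterminal(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('<')?;
    let end = rest.find('>')?;
    Some(&rest[..end])
}

/// Same as `raw_nonterminal`, but with leading and trailing whitespace removed.
/// Whitespace inside the identifier is left alone.
fn trimmed_nonterminal(input: &str) -> Option<&str> {
    raw_nonterminal(input).map(str::trim)
}
```

Under these assumptions, raw_nonterminal("< somename >") keeps the surrounding spaces in the identifier, while trimmed_nonterminal("< somename >") gives somename. Both return my name for <my name>, since only the edges are trimmed.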

@Snewt trimming doesn't seem necessary to me. It does make a weird term like you said, but as long as they behave the same as the rest, I don't feel any urge to trim them. Would change my mind if adding extra whitespace became a common accident for users. But I like the simplicity of "whatever is between the < and > is what you get".