shnewto / bnf

Parse BNF grammar definitions

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Empty String Rules (Still) Fail to Match

nskobelevs opened this issue Β· comments

Describe the bug
Empty string rules still fail to match in some situations. This is the same as #108 although after it's fix it doesn't happen in all situations but still happens.

To Reproduce

  1. Load grammar from tests/fixtures/bnf.bnf

  2. Try to parse the first line of the text:
    "<syntax> ::= <rule> | <rule> <syntax>\n"
    This match will fail.

  3. Try to parse the same line with whitespace before the newline
    "<syntax> ::= <rule> | <rule> <syntax> \n"
    This will parse properly

The first example should pass given <line-end> may begin with <opt-whitespace> which has an empty string rule

<line-end>       ::= <opt-whitespace> <EOL>
<opt-whitespace> ::= "" | " " <opt-whitespace>
<EOL>            ::= "
"

Unfortunately I haven't been able to find a smaller example. Weirdly it does parse "<syntax> ::= <rule>" correctly but not the more complicated example above.

If there support for special characters like \n or a built in definition for <EOL> it could be good to have a test that a grammer made from tests/fixtures/bnf.bnf can parse the file itself.

Desktop (please complete the following information):

  • MacOS Ventura 13.0.1

ah thanks for raising this issue! I really appreciate the details for reproducing!

I'll take a look πŸ˜„

just to share some progress, I started looking into this today, and was able to reduce the bug test case to:

#[test]
fn parse_nested_empty() {
    let grammar: Grammar = "
    <start> ::= <a> <empty>
    <a> ::= 'a' <empty>
    <empty> ::= ''"
        .parse()
        .unwrap();

    let input = "a";

    let parses = parse(&grammar, input);
    assert_eq!(parses.count(), 1);
}

The bug seems to be caused by incorrectly de-duplicating Earley states. Because "empty" rules do not advance the input cursor, different predictions get treated as "duplicates".

I have a fix working locally, but it basically "solves" the problem by duplicating all parse attempts πŸ˜…

Next chance I get, I will brainstorm for a more realistic solution

Any progress on a fix?

I may be hitting the same issue – here's another bare-bones example you can check against:

#[test]
fn test() {
    let grammar: Grammar = "
    <balanced> ::= <left> <balanced> <right>
                 | ''
    
    <left> ::= <opt-ws> '(' <opt-ws>
    <right> ::= <opt-ws> ')' <opt-ws>
    
    <opt-ws> ::= '' | <ws>
    <ws> ::= ' ' | ' ' <ws>
    "
    .parse()
    .unwrap();

    let input = "()";

    assert!(
        grammar.parse_input(input).next().is_some(),
        "can't parse: {input}"
    );
}

Greatly appreciate everybody's work so far!

Would love to use bnf as I don't see an established Rust library that supports Earley parsing.

However, this issue is a blocker for my use case πŸ™

@CrockAgile in #92 you reference this tutorial, which has a section on empty rules: https://loup-vaillant.fr/tutorials/earley-parsing/empty-rules – maybe there's something helpful there, where it discusses Aycock and Horspool's (2002) paper?

If you get any insights from fixing this issue, it might be worth including the findings in your previous write-up (thanks again for that): https://gist.github.com/CrockAgile/d8cb79607b6c6c49c06a97bd652dc219

Either way, we are in good company – as mentioned by MARPA's author:

Earley's original 1970 algorithm actually had a bug in its handling of nullables. There was an easy fix, but it made the algorithm slower. Since efficiency was already seen as the reason to prefer other parsers, this was a big deal.

Thank you for another example! I do have a working fix, based on the exact solution you linked. So glad we are on the same page.

But while implementing it, a surprising amount of refactoring was needed (allowing for Earley matches on "nullable productions", not just complete "States", if you are curious πŸ˜… )

That refactor introduced a significant performance regression 😱

But I fixed that last night! So

TLDR: I should have it up for review tomorrow 🀞

this fix should now be available under v0.4.3

but this was my first time publishing this crate, so let me know if something is broken πŸ˜…

Looks good here - thank you for addressing this!