orangeduck / mpc

A Parser Combinator library for C

* inside of + or * inside of * causes infinite loop

mgood7123 opened this issue

maths.c

#include "../mpc.h"
#include "../mpc.c"

int main(int argc, char **argv) {
  
  mpc_parser_t *plus  = mpc_new("plus");
  mpc_parser_t *star  = mpc_new("star");
  
  /* The quantifiers on the two rules below were edited between the runs that follow. */
  mpca_lang(MPCA_LANG_PREDICTIVE,
    "plus : <star>+;"
    "star : 'a'*;",
    plus, star, NULL);
  
  mpc_print(plus);
  mpc_print(star);
  
  if (argc > 1) {
    
    mpc_result_t r;
    if (mpc_parse_contents(argv[1], plus, &r)) {
      mpc_ast_print(r.output);
      mpc_ast_delete(r.output);
    } else {
      mpc_err_print(r.error);
      mpc_err_delete(r.error);
    }
    
  } else {

    mpc_result_t r;
    if (mpc_parse_pipe("<stdin>", stdin, plus, &r)) {
      mpc_ast_print(r.output);
      mpc_ast_delete(r.output);
    } else {
      mpc_err_print(r.error);
      mpc_err_delete(r.error);
    }
  
  }

  mpc_cleanup(2, plus, star);
  
  return 0;
  
}

$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)
(<S> ('a' whitespace))*
> 
  star|> 
    char:1:1 'a'
    char:1:2 'a'
    char:1:3 'a'
    char:1:4 'a'

real    0m0.009s
user    0m0.002s
sys     0m0.005s
$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)+
(<S> ('a' whitespace))*

real    0m5.007s
user    0m2.978s
sys     0m1.460s
$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)*
(<S> ('a' whitespace))*

real    0m5.008s
user    0m3.091s
sys     0m1.353s
$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)
(<S> ('a' whitespace))+
> 
  star|> 
    char:1:1 'a'
    char:1:2 'a'
    char:1:3 'a'
    char:1:4 'a'

real    0m0.010s
user    0m0.005s
sys     0m0.001s
$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)+
(<S> ('a' whitespace))+
> 
  star|> 
    char:1:1 'a'
    char:1:2 'a'
    char:1:3 'a'
    char:1:4 'a'

real    0m0.006s
user    0m0.004s
sys     0m0.001s
$ gcc maths.c -o maths && time echo aaaa | timeout 5 ./maths
(<S> <star>)*
(<S> ('a' whitespace))+
> 
  star|> 
    char:1:1 'a'
    char:1:2 'a'
    char:1:3 'a'
    char:1:4 'a'

real    0m0.006s
user    0m0.003s
sys     0m0.002s

What is the expected behavior in this case? It should be obvious that if you have a rule which matches one or more of something, and you then ask for zero or more of that rule, it is the same as asking for zero or more of that thing. Similar reasoning applies the other way around.

You need to be very careful with * because it can easily match the empty string. And unless you add some failure case it can match an infinite number of empty strings.
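
To make that concrete, here is a toy model of a greedy repetition loop. It is only a sketch: `many0_a`, `many1_a`, `repeat_greedy` and the `cap` parameter are hypothetical names invented for illustration and are not part of mpc, and mpc's real implementation may differ. The cap plays the role of the `timeout 5` in the report.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these helpers exist in mpc. */
typedef int (*toy_parser)(const char *input, size_t *pos);

/* 'a'* : consume zero or more 'a'; always succeeds, even on no input. */
int many0_a(const char *input, size_t *pos) {
  while (input[*pos] == 'a') { (*pos)++; }
  return 1;
}

/* 'a'+ : consume one or more 'a'; fails if it could not consume any. */
int many1_a(const char *input, size_t *pos) {
  size_t start = *pos;
  while (input[*pos] == 'a') { (*pos)++; }
  return *pos > start;
}

/*
 * Naive outer repetition: keep applying `inner` until it fails.
 * Returns the number of successful applications, or -1 if `cap`
 * applications were reached without `inner` ever failing.
 */
long repeat_greedy(toy_parser inner, const char *input, long cap) {
  size_t pos = 0;
  long count = 0;
  assert(input != NULL);
  while (inner(input, &pos)) {
    count++;
    if (count >= cap) { return -1; } /* stand-in for the timeout */
  }
  return count;
}
```

With the input "aaaa", `repeat_greedy(many1_a, "aaaa", 1000)` stops after one application because the second attempt finds no 'a' and fails, while `repeat_greedy(many0_a, "aaaa", 1000)` hits the cap because the inner parser keeps succeeding on the empty string.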

Also, can you please add some description to your issues when you open them, and double check yourself that they are definitely a bug before you open them? You should at least add a description of what you encountered, what you expected to happen, and why you think the behavior is wrong. Every time you open an issue I have to read through a bunch of random crap just to try and work out exactly what you are getting at - you can at least make some kind of nominal effort on your side if you expect me to investigate these issues.

With one or more and zero or more, and with zero or more and zero or more, I expect the following behaviour.

for

(<S> <star>)*
(<S> ('a' whitespace))*

I expect it to act as if it was grouped into a single zero or more statement, like

(<S> <star>)*
(<S> ('a' whitespace))

or

(<S> <star>)*
(<S> ('a' whitespace))+

and vice versa with one or more and zero or more

(<S> <star>)+
(<S> ('a' whitespace))*

should be optimized to

(<S> <star>)+
(<S> ('a' whitespace))

or

(<S> <star>)+
(<S> ('a' whitespace))+

to avoid the infinite loop caused by * counting as a valid (empty) match inside + or *

@orangeduck I know it's a bit late, but I've been fooling around with the different scenarios in which it times out and when it doesn't and I'd like to know if my understanding of the implementation is correct.

For plus: <star>+ and star: 'a'*, the 'one or more' applied to star would require one or more matches of star, since plus is effectively a pointer to it.

plus: <star>+ and star: 'a'+ works because when the program runs, it starts with plus, matches with star, and then from that point keeps pinging star until there are no 'a's left in the prompt and then returns.

plus: <star>* and star: 'a'* doesn't work because 'zero or more' will match under any condition. It doesn't have an exit condition other than 'no longer matching the prompt', and the program has no way of knowing how many times plus should refer to star.

Am I on the right track?

Thanks @AmyShackles - I think the problem here is not actually the plus or star but the fact that the grammar is missing an "end of input" match. The danger with star is that it can match the empty string - so if you combine it with another combinator (such as plus) you can easily match the empty string infinitely without ever failing. There always needs to be a failure condition when you use star otherwise it will run forever because it will greedily match the empty string.

While some regex engines do clever things to avoid infinite loops, this isn't so simple in mpc, because star can go after any parser rule, and those rules can parse all sorts of things, not just strings.

Does that make things any clearer?
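
For reference, one common form of the guard that regex engines use is a progress check: stop repeating as soon as the inner match succeeds without consuming any input. The sketch below shows that idea on a toy parser that only tracks a position in a string. All names (`toy_parser`, `many0_a`, `repeat_with_progress_check`) are hypothetical and not part of mpc, and mpc does not do this as far as this thread says. Whether the same check would be safe for arbitrary mpc parsers is exactly the doubt raised above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these helpers exist in mpc. */
typedef int (*toy_parser)(const char *input, size_t *pos);

/* 'a'* : consume zero or more 'a'; always succeeds, even on no input. */
int many0_a(const char *input, size_t *pos) {
  while (input[*pos] == 'a') { (*pos)++; }
  return 1;
}

/*
 * Repetition with a progress check: apply `inner` until it fails or
 * until it succeeds without moving `pos`. A single empty match is
 * accepted if it is the very first one; later empty matches end the
 * loop without being counted. Returns the number of matches counted.
 */
long repeat_with_progress_check(toy_parser inner, const char *input) {
  size_t pos = 0;
  long count = 0;
  assert(input != NULL);
  for (;;) {
    size_t before = pos;
    if (!inner(input, &pos)) { break; }  /* ordinary failure exit */
    if (pos == before) {                 /* succeeded on the empty string */
      if (count == 0) { count = 1; }
      break;
    }
    count++;
  }
  return count;
}
```

In this toy, `repeat_with_progress_check(many0_a, "aaaa")` returns after counting one match instead of looping, because the second application succeeds at the same position and trips the check.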

@orangeduck That makes perfect sense! So the reason it handles okay with plus: <star>* combined with star: 'a'+ is that it's parsing character by character and so when there are no longer 'a's to process, the fact that there are 1 or more 'a's is no longer true, so it terminates?

Yes, exactly. Either way, I can try to add some test cases for these rules in the future, but I believe the behavior is correct, at least for now.