timtadh / lexmachine

Lex machinery for Go.

Suffixes not minimized

deckarep opened this issue

Hello, thanks for your wonderful library. Your blog posts and library have been great learning resources.

I do have a question which I'm unsure is expected behavior or not.

When a DFA is generated, I noticed there is logic to minimize the DFA for those patterns that are repeated:

Given the following:

"Bob"
"Bobby"
"Bobola"

This does in fact get minimized, because the prefix "Bob" is shared by all three variants.

My question, however, is about suffixes.

Given the following:

"tia"
"mia"

No minimization occurs here, even though it seems like it should at least be possible.

Here is the output for the compiled expression and DFA.

(*frontend.AltMatch)(0xc00000a5c0)((AltMatch (Match (Concat (Concat (Character t), (Character i), (Character a)), (EOS))), (Match (Concat (Concat (Character m), (Character i), (Character a)), (EOS)))))
start: 7
accepting:
    0 {3}
    1 {6}
0
1 "i"->2
2 "a"->3
3
4 "i"->5
5 "a"->6
6
7 "m"->4, "t"->1

I'm curious whether such a DFA can still be minimized.

Hi @deckarep ,

I don't think that you can do the minimization that you suggest using the current algorithm (Algorithm 3.39: Minimizing the Number of States in a DFA, "Dragon Book", Compilers: Principles, Techniques and Tools. 2nd Edition. Aho, et al.).

The algorithm works by grouping non-distinguishable states. My guess is that states 1 and 4 are distinguishable because they are reached on different transition characters (namely m vs. t).

There could also be a bug but I am not sure.

Ok, yes, I wasn't sure if it was a bug or a general limitation, as I'm entirely new to this domain. Thanks for getting back to me. I'll do some more research, but feel free to close this, as it was more of a curious question and not an outright issue for me.

Ok sounds good.