onetrueawk / awk

One true awk

Syntax question -- expression should be an error but seems not to be

raygard opened this issue:

(I left this comment under issue #147 but raise it here in case it gets no notice there.)

Why is 'NF==2 && $0=$2' not a syntax error? == has higher precedence than &&, which has higher precedence than =. This should parse as '(NF==2 && $0) = $2', should it not? The left side of that assignment is not an lvalue. What am I missing?
This works in the awk implementations I have on hand, but not in the one I am trying to write. If this is valid, it's a problem for my parser...

In C (similar precedence rules), this works as expected:

if (2 == 2 && (n = 37)) printf("%d\n", n);

but (with gcc) this gets "error: lvalue required as left operand of assignment":

if (2 == 2 && n = 37) printf("%d\n", n);

arnoldrobbins commented:

The answer, as unsatisfying as it may be, is that Awk has always worked this way. I didn't look at this in great detail, but in the grammar for The One True Awk, it seems to be handled in the productions for pattern and ppattern. In the gawk grammar, it's handled in the production for exp using a %prec directive.

You might want to look at the mawk grammar which is likely to be the cleanest of all. This is also likely fodder for a POSIX interpretation request. @benhoyt asked me about this privately a little while back and I gave him the same answer.

Closing the issue. Thanks.

benhoyt commented:

@raygard Yes, I just ran across this in GoAWK and fixed it here: benhoyt/goawk#170 ... GoAWK uses a hand-written recursive descent parser, so this is a bit of a hack, but oh well, at least it's consistent with the other Awks now.

I saw you said you're writing your own Awk version: in what language? Is the source code available? I'm curious now. :-)

raygard commented:

Hi @benhoyt, it's in C and will be freely available when (if?) it gets to a point I'm not ashamed for you to see it. It's got a long TODO list and probably still lotsa bugs. I spent waay too long yesterday chasing down a memory leak due to a (as usual) stupid blunder.
I thought my parser was in pretty good shape until this thing happened. Mine is also hand-written recursive descent and I'm not sure how best to fix this. I am trying to be POSIX-spec compliant, but this case and the others in your own pull request and related issue are just plain not POSIX-compliant. Even busybox awk parses this the way OTA/gawk/mawk do, so they're bug-compatible and I'm not. I guess I need to be, to call my program an awk.
It's not as ambitious as your impressive GoAWK. I'm not sure how to know if I'm parsing the same as the other implementations. I will study bwk's yacc grammar more carefully, though I'm a newbie with yacc/bison.
p.s. @arnoldrobbins, thank you for your reply. You warned me awk "has a lot of dark corners." I've found a few already, but I had hoped the POSIX grammar matched the reality better. I hope the rest of the POSIX spec is closer.
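
For a hand-written recursive descent parser, one possible way to accept 'NF==2 && $0=$2' is to let an operand that is an lvalue swallow a following '=' no matter which operator came before it. The C sketch below is only an illustration of that idea under stated assumptions: it is not GoAWK's fix and not the onetrueawk or gawk grammar, every name in it (show_grouping, parse_operand, and so on) is hypothetical, and it understands only names, numbers, '$', '==', '&&', '=' and parentheses. Instead of building a tree, it writes out the fully parenthesized grouping so the parse can be inspected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy parser; all names are hypothetical and the fixed 256-byte
   buffers silently truncate long input. */
static const char *src;                 /* cursor into the input text */

static void skip_ws(void) { while (*src == ' ') src++; }

/* True if the next token is a lone '=' (assignment), not '=='. */
static int at_assign(void) { skip_ws(); return src[0] == '=' && src[1] != '='; }

static void parse_binary(int level, char *out, size_t n);

/* primary: NUMBER | NAME | '$' primary | '(' expr ')'
   Sets *lvalue when the result is a variable or a field reference. */
static void parse_primary(char *out, size_t n, int *lvalue) {
    char inner[256];
    skip_ws();
    *lvalue = 0;
    if (*src == '(') {
        src++;
        parse_binary(0, inner, sizeof inner);
        skip_ws();
        if (*src == ')') src++;
        snprintf(out, n, "%s", inner);
    } else if (*src == '$') {
        int ignored;
        src++;
        parse_primary(inner, sizeof inner, &ignored);
        snprintf(out, n, "$%s", inner);
        *lvalue = 1;
    } else {
        size_t len = 0;
        *lvalue = isalpha((unsigned char)*src) || *src == '_';
        while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;
        snprintf(out, n, "%.*s", (int)len, src);
        src += len;
    }
}

/* operand: primary, plus the awk-compatible twist. An lvalue directly
   followed by '=' takes the whole assignment as its value, even when
   it is the right operand of '==' or '&&'. */
static void parse_operand(char *out, size_t n) {
    char lhs[256], rhs[256];
    int lvalue;
    parse_primary(lhs, sizeof lhs, &lvalue);
    if (lvalue && at_assign()) {
        src++;                               /* consume '=' */
        parse_binary(0, rhs, sizeof rhs);    /* right side is a full expression */
        snprintf(out, n, "(%s = %s)", lhs, rhs);
    } else {
        snprintf(out, n, "%s", lhs);
    }
}

/* Left-associative binary operators: level 0 is '&&', level 1 is '=='. */
static void parse_binary(int level, char *out, size_t n) {
    static const char *ops[] = { "&&", "==" };
    char lhs[256], rhs[256];
    if (level == 2) { parse_operand(out, n); return; }
    parse_binary(level + 1, lhs, sizeof lhs);
    for (skip_ws(); strncmp(src, ops[level], 2) == 0; skip_ws()) {
        src += 2;
        parse_binary(level + 1, rhs, sizeof rhs);
        snprintf(out, n, "(%s %s %s)", lhs, ops[level], rhs);
        snprintf(lhs, sizeof lhs, "%s", out);
    }
    snprintf(out, n, "%s", lhs);
}

/* Parse text and write its fully parenthesized grouping into out. */
const char *show_grouping(const char *text, char *out, size_t n) {
    assert(text != NULL && out != NULL && n > 0);
    src = text;
    parse_binary(0, out, n);
    return out;
}
```

With a 256-byte buffer buf, show_grouping("NF==2 && $0=$2", buf, sizeof buf) yields ((NF == 2) && ($0 = $2)), while a plain assignment such as "x = 1 && 2" still groups as (x = (1 && 2)). Whether this matches the other implementations in every corner case is an open question; running the result against the onetrueawk and gawk test suites would be the way to find out.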

benhoyt commented:

@raygard Feel free to reach out via my email address on https://benhoyt.com/ if you want, and we can compare notes. I have a large test suite of my own tests in GoAWK, plus I run it against most of the onetrueawk and the Gawk test suites (at least the ones from Gawk that aren't Gawk-specific). Running it against the onetrueawk and Gawk suites really helped me find a lot of bugs/quirks.