benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

Home Page: https://benhoyt.com/writings/goawk/

Parsing "expr | getline > 0" fails without extra parens

benhoyt opened this issue.

When parsing a getline expression like expr | getline > 0, GoAWK fails to parse it where other AWKs succeed. For example:

$ goawk 'BEGIN {while ("ls README*" | getline > 0) print}'
<cmdline>:1:38: expected ) instead of >
BEGIN {while ("ls README*" | getline > 0) print}
                                     ^

# Gawk (and others) succeed:
$ gawk 'BEGIN {while ("ls README*" | getline > 0) print}'
README.md

# GoAWK needs extra parentheses
$ goawk 'BEGIN {while (("ls README*" | getline) > 0) print}'
README.md

Thanks @raygard for the bug report.

I started looking into how to solve this problem. I ran tests with other AWK implementations, and it seems that any binary operator can follow "getline" in this context. For example, gawk parses all of the following programs without errors:

gawk 'BEGIN {while ("ls README*" | getline > 0) print}'
gawk 'BEGIN {while ("ls README*" | getline < 0) print}'
gawk 'BEGIN {while ("ls README*" | getline + 1) print}'
gawk 'BEGIN {while ("ls README*" | getline * 100) print}'

The only approach I can think of is to modify the getLine() function along these lines:

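func (p *parser) getLine() ast.Expr {
    expr := p._assign(p.cond)
    if p.tok == PIPE {
        p.next()
        p.expect(GETLINE)
        target := p.optionalLValue()
        // res must have the interface type ast.Expr (not *ast.GetlineExpr)
        // so it can be replaced by the *ast.BinaryExpr that wraps it below.
        var res ast.Expr = &ast.GetlineExpr{expr, target, nil}
        // Treat "cmd | getline > x" as "(cmd | getline) > x".
        // Note: p.expr() parses everything that follows as the right
        // operand, so "getline > 0 && x" becomes "getline > (0 && x)".
        if p.tok == GREATER {
            p.next()
            res = &ast.BinaryExpr{res, GREATER, p.expr()}
        }
        return res
    }
    return expr
}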

However, instead of doing this only for GREATER, it would need to be done for every operator we want to allow after "getline" (perhaps by defining a map that contains all of them, or by using the matches function and listing them inline). There may be a better approach, but I can't think of one right now.
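To make the map option concrete, here is a self-contained toy parser in Go. It is not GoAWK's code: the whitespace-separated token format, the `binaryPrec` table, and the `parser` type with its methods are all hypothetical. Rather than special-casing GREATER inside the getline rule, it parses `cmd | getline` as an ordinary left operand and then runs a precedence-climbing loop, so any operator present in the map may follow it, and the right operand extends only as far as precedence allows.

```go
package main

import (
	"fmt"
	"strings"
)

// binaryPrec lists the binary operators this toy grammar accepts (including
// after "cmd | getline"), mapped to their precedence; higher binds tighter.
// It is a hypothetical subset of AWK's operators, not GoAWK's real table.
var binaryPrec = map[string]int{
	"<": 1, ">": 1,
	"+": 2,
	"*": 3,
}

// parser is a toy recursive-descent parser over whitespace-separated tokens.
type parser struct {
	toks []string
	pos  int
}

// tok returns the current token, or "" at end of input.
func (p *parser) tok() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() { p.pos++ }

// primary parses a string or number operand, then an optional "| getline".
func (p *parser) primary() string {
	operand := p.tok()
	if operand == "" {
		panic("unexpected end of input")
	}
	p.next()
	if p.tok() == "|" {
		p.next()
		if p.tok() != "getline" {
			panic("expected getline after |")
		}
		p.next()
		return "(" + operand + " | getline)"
	}
	return operand
}

// binary does precedence climbing. Because the getline expression is just a
// left operand here, any operator found in binaryPrec may follow it.
func (p *parser) binary(minPrec int) string {
	left := p.primary()
	for {
		op := p.tok()
		prec, ok := binaryPrec[op]
		if !ok || prec < minPrec {
			return left
		}
		p.next()
		right := p.binary(prec + 1)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// parse returns a fully parenthesized rendering of src.
func parse(src string) string {
	p := &parser{toks: strings.Fields(src)}
	return p.binary(1)
}

func main() {
	for _, src := range []string{
		`"ls" | getline > 0`,
		`"ls" | getline < 0`,
		`"ls" | getline + 1`,
		`"ls" | getline * 100`,
		`"ls" | getline + 1 > 0`,
	} {
		fmt.Println(parse(src))
	}
}
```

The last input prints `((("ls" | getline) + 1) > 0)`, showing that the operator after getline binds according to its precedence instead of swallowing the rest of the expression. Because the toy lexer uses strings.Fields, command strings must not contain spaces.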

@fioriandrea, thanks for this. I'd be happy to accept a PR if you're game. Let's start by keeping it simple with matches(), and if that proves unwieldy we can consider a map or something similar.
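For comparison, the matches() option lists the allowed operators inline at the call site. Below is a minimal self-contained sketch: the Token constants, the `parser` type and `isOpAfterGetline` are hypothetical, and `matches` is modeled on the variadic helper of that name in GoAWK's parser, though its exact signature here is an assumption.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for GoAWK's lexer token type.
type Token int

const (
	EOF Token = iota
	ADD
	MUL
	LESS
	GREATER
	NUMBER
)

// parser holds only the current token, which is all this sketch needs.
type parser struct {
	tok Token
}

// matches reports whether the current token is one of operators,
// without consuming it.
func (p *parser) matches(operators ...Token) bool {
	for _, op := range operators {
		if p.tok == op {
			return true
		}
	}
	return false
}

// isOpAfterGetline lists the operators allowed after "cmd | getline"
// inline, instead of keeping them in a separate map (hypothetical subset).
func (p *parser) isOpAfterGetline() bool {
	return p.matches(ADD, MUL, LESS, GREATER)
}

func main() {
	for _, tok := range []Token{GREATER, ADD, NUMBER, EOF} {
		p := &parser{tok: tok}
		fmt.Println(p.isOpAfterGetline())
	}
}
```

This prints `true`, `true`, `false`, `false`. Keeping the list at the call site reads well while the set is small; a map starts to pay off if the same operator table is needed in several places or has to carry extra data such as precedence.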