benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

Home Page: https://benhoyt.com/writings/goawk/

Consider adding -Wexec option of mawk and gawk

paulapatience opened this issue

mawk and gawk provide the -Wexec option (also written -W exec; gawk additionally accepts -E and --exec), which makes it possible to write Awk scripts that behave like standard command-line utilities by processing ARGV themselves. The problem with #!/usr/bin/env -S awk -f is that any hyphen-prefixed options given on the command line may be consumed by the underlying Awk interpreter rather than by the script (gawk passes along only the options it doesn't understand). The only way around this is to wrap the Awk call in a shell script that contains exec env awk -f /path/to/script.awk -- "$@". This is what prmindent does in its Makefile. -Wexec FILE is effectively -f FILE --.
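
As a rough illustration of that equivalence, the sketch below builds a throwaway wrapper around awk -f FILE -- and checks that hyphen-prefixed arguments reach the script's ARGV untouched. The file names args.awk and args are made up for the demo, and it assumes a POSIX awk and mktemp are in PATH; it is not how prmindent's Makefile does it.

```shell
#!/bin/sh
# Sketch only: emulate `-Wexec FILE` with a wrapper that runs `awk -f FILE --`.
# args.awk and args are hypothetical names chosen for this demo.
tmp=$(mktemp -d)

# A BEGIN-only Awk program that echoes its arguments (no input is read).
cat >"$tmp/args.awk" <<'EOF'
BEGIN {
  for (i = 1; i < ARGC; i++)
    printf "%s%s", ARGV[i], (i < ARGC - 1 ? " " : "\n")
}
EOF

# The wrapper: `--` ends the interpreter's own option parsing, so everything
# after it is handed to the script through ARGV.
cat >"$tmp/args" <<EOF
#!/bin/sh
exec awk -f "$tmp/args.awk" -- "\$@"
EOF
chmod +x "$tmp/args"

wrapper_out=$("$tmp/args" -l -w file.txt)
echo "$wrapper_out"   # prints: -l -w file.txt

rm -rf "$tmp"
```

The cost of this approach is the extra file: the command users run is the shell wrapper, and the Awk script has to live at a path the wrapper knows about.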

Why -Wexec rather than just --exec? From what I can gather from the gawk manual page, the POSIX-compliant way of adding implementation-specific options is by prefixing them with -W. Whether goawk should provide its other implementation-specific options under -W (and possibly only under -W, as mawk does) is something to be considered separately. In order to maximize compatibility with mawk, this option should be provided at least as -Wexec and -W exec.

The hashbang line of prmindent is #!/usr/bin/env -S awk -Wexec by default. This change would allow goawk to be a drop-in replacement for the system awk at least in this case.

I'm open to adding this. However, I don't quite understand the usefulness. Why is it a problem that hyphen-prefixed options are processed by AWK rather than the script? Even the Gawk manual's Executable Scripts section shows a hash bang of #!/bin/awk -f.

And I'm not sure I see what's wrong with the -- approach? Can you help me understand why this is really needed?

If I do add it, I'd prefer just one way to do it. Copying Gawk's -E seems best to me (as GoAWK uses all short options right now).

Hopefully the following example is clear. Put the following contents into a file named awc-test.sh:

#!/usr/bin/env bash

read -r -d '' awc <<EOF
BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] == "--") {
      delete ARGV[i++]
      break
    }
    else if (ARGV[i] !~ /^-./) break
    else if (ARGV[i] == "-c") c = 1
    else if (ARGV[i] == "-w") w = 1
    else if (ARGV[i] == "-l") l = 1
    else printf "awc: unknown option: %s\n", ARGV[i] >"/dev/stderr"
    delete ARGV[i]
  }
  if (!c && !w && !l) c = w = l = 1
}
{ cs += length(); ws += NF; ls++ }
END { printf "%s%s%s\n", l?ls" ":"", w?ws" ":"", c?cs" ":"" }
EOF

printf "#!/usr/bin/env -S mawk -f\n%s\n" "$awc" >mawc-f
printf "#!/usr/bin/env -S gawk -f\n%s\n" "$awc" >gawc-f
printf "#!/usr/bin/env -S awk -Wexec\n%s\n" "$awc" >awc-e
chmod +x mawc-f gawc-f awc-e

run() { echo '$' "$@"; $@; }

run ./mawc-f awc-test.sh
run ./mawc-f -l awc-test.sh
run ./mawc-f -w awc-test.sh
run ./mawc-f -c awc-test.sh
run ./mawc-f -- awc-test.sh
run ./mawc-f -- -l awc-test.sh
run ./mawc-f -- -w awc-test.sh
run ./mawc-f -- -c awc-test.sh

run ./gawc-f awc-test.sh
run ./gawc-f -l awc-test.sh
run ./gawc-f -w awc-test.sh
run ./gawc-f -c awc-test.sh
run ./gawc-f -- awc-test.sh
run ./gawc-f -- -l awc-test.sh
run ./gawc-f -- -w awc-test.sh
run ./gawc-f -- -c awc-test.sh

run ./awc-e awc-test.sh
run ./awc-e -l awc-test.sh
run ./awc-e -w awc-test.sh
run ./awc-e -c awc-test.sh
run ./awc-e -- awc-test.sh
run ./awc-e -- -l awc-test.sh
run ./awc-e -- -w awc-test.sh
run ./awc-e -- -c awc-test.sh

and run it (mawk and gawk must be in PATH, and awk must refer to one of the two). The result is the following:

$ ./mawc-f awc-test.sh
55 239 1387 
$ ./mawc-f -l awc-test.sh
mawk: not an option: -l
$ ./mawc-f -w awc-test.sh
mawk: not an option: -w
$ ./mawc-f -c awc-test.sh
mawk: not an option: -c
$ ./mawc-f -- awc-test.sh
55 239 1387 
$ ./mawc-f -- -l awc-test.sh
55 
$ ./mawc-f -- -w awc-test.sh
239 
$ ./mawc-f -- -c awc-test.sh
1387 
$ ./gawc-f awc-test.sh
55 239 1387 
$ ./gawc-f -l awc-test.sh
gawk: fatal: cannot open shared library `awc-test.sh' for reading: No such file or directory
$ ./gawc-f -w awc-test.sh
239 
$ ./gawc-f -c awc-test.sh
55 239 1387 
$ ./gawc-f -- awc-test.sh
55 239 1387 
$ ./gawc-f -- -l awc-test.sh
55 
$ ./gawc-f -- -w awc-test.sh
239 
$ ./gawc-f -- -c awc-test.sh
1387 
$ ./awc-e awc-test.sh
55 239 1387 
$ ./awc-e -l awc-test.sh
55 
$ ./awc-e -w awc-test.sh
239 
$ ./awc-e -c awc-test.sh
1387 
$ ./awc-e -- awc-test.sh
55 239 1387 
$ ./awc-e -- -l awc-test.sh
awk: ./awc-e:15: fatal: cannot open file `-l' for reading: No such file or directory
$ ./awc-e -- -w awc-test.sh
awk: ./awc-e:15: fatal: cannot open file `-w' for reading: No such file or directory
$ ./awc-e -- -c awc-test.sh
awk: ./awc-e:15: fatal: cannot open file `-c' for reading: No such file or directory

Without -Wexec, it is impossible to mimic the command-line interface of wc, or to write a script that behaves like a conventional command-line utility. -- is required to stop the Awk interpreter from parsing options, but then supposing you wanted to stop processing options in the script itself, you'd need a second --. Using awk -f in the hashbang line leaks the fact that the script is written in Awk. If you wanted to rewrite the script in another language (e.g., Go), the command-line interface would have to change (or remain unnatural).
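
The "second --" point can be seen in isolation with a cut-down option loop. This is only a sketch: opts.awk is a hypothetical file name, the loop is a simplified variant of the one in awc-test.sh, and it assumes a POSIX awk and mktemp are in PATH.

```shell
#!/bin/sh
# Sketch only: with `awk -f`, the first `--` is consumed by the interpreter,
# so the script needs a second `--` to end its own option list.
tmp=$(mktemp -d)

cat >"$tmp/opts.awk" <<'EOF'
BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] == "--") { i++; break }   # script-level end of options
    else if (ARGV[i] !~ /^-./) break      # first operand
    opts = opts ARGV[i]                   # collect an option
  }
  printf "options=[%s] first operand=[%s]\n", opts, ARGV[i]
}
EOF

# One `--` for awk, then the script's own `--` before an operand named -w.
one_dash=$(awk -f "$tmp/opts.awk" -- -l -- -w)
# Two `--` in a row: awk eats the first, the script eats the second.
two_dash=$(awk -f "$tmp/opts.awk" -- -- -l)

echo "$one_dash"   # prints: options=[-l] first operand=[-w]
echo "$two_dash"   # prints: options=[] first operand=[-l]

rm -rf "$tmp"
```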

As for providing -E only, your reasoning makes sense. It would be trivial for a Makefile to rewrite an Awk script's hashbang line to use -E rather than -Wexec. At least that wouldn't require a wrapper shell script.

Note that providing options to the Awk interpreter is always possible even with -E by explicitly invoking awk -f. So nothing is lost by using -E in the hashbang line.
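
For instance, a script whose hashbang uses -E can still be given interpreter options by running it through awk -f explicitly, since the hashbang line is just a comment to Awk. In this sketch the file name show and the variable debug are invented for the demo, and it assumes a POSIX awk and mktemp are in PATH.

```shell
#!/bin/sh
# Sketch only: pass an interpreter option (-v) to a script whose hashbang
# uses -E, by invoking awk -f on it directly.
tmp=$(mktemp -d)

cat >"$tmp/show" <<'EOF'
#!/usr/bin/env -S gawk -E
BEGIN { printf "debug=%s first=%s\n", debug, ARGV[1] }
EOF

# -v is handled by the interpreter; everything after `--` goes to the script.
explicit_out=$(awk -v debug=1 -f "$tmp/show" -- -l)
echo "$explicit_out"   # prints: debug=1 first=-l

rm -rf "$tmp"
```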

Thanks for the additional justification and examples. I understand this now and think it makes sense. I've added this in #140 and will merge and tag a release in the next few days -- let me know what you think!

Thanks @paulapatience -- just tagged and released v1.20.0 with this feature.