benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go, with CSV support

Home Page: https://benhoyt.com/writings/goawk/

Incompatibility of regular expression \b

ko1nksm opened this issue

In other awk implementations, \b is a backspace, but goawk (golang) seems to treat it as a word boundary.

$ printf "AB\bC" | goawk '{gsub(/\b/, "@"); printf $0}' | hexdump -C
00000000  40 41 42 40 08 40 43 40                           |@AB@.@C@|

$ printf "AB\bC" | gawk '{gsub(/\b/, "@"); printf $0}' | hexdump -C
00000000  41 42 40 43                                       |AB@C|

$ printf "AB\bC" | mawk '{gsub(/\b/, "@"); printf $0}' | hexdump -C
00000000  41 42 40 43                                       |AB@C|

The current workaround is to specify the backspace character in octal notation. (So this is not a very important issue for me.)

$ printf "AB\bC" | goawk '{gsub(/\10/, "@"); printf $0}' | hexdump -C
00000000  41 42 40 43                                       |AB@C|

Thanks for the report. Yes, Go's regexp package deviates from AWK's handling here, and treats \b as a word boundary rather than a backspace. GoAWK uses Go's regexp package rather than implementing its own regex engine (which I'm not going to do anytime soon :-), so we're stuck with what it provides.

I think the "word boundary" behaviour is much more useful in any case -- I fairly regularly use \b as a word boundary, but I've never needed to regex match on backspace.
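
For reference, the difference can be reproduced with Go's regexp package directly, outside GoAWK. This is a minimal sketch, not GoAWK's actual code: replaceAll is a hypothetical helper introduced only for illustration, and the \x08 hex escape is used here in place of the octal form from the workaround above.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAll is a hypothetical helper (not part of GoAWK): it compiles
// pattern with Go's regexp package and replaces every match in input.
func replaceAll(pattern, input, repl string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(input, repl)
}

func main() {
	// In a Go string literal, \b is the backspace byte 0x08.
	input := "AB\bC"

	// In Go's regexp syntax, \b is a zero-width ASCII word boundary,
	// so "@" is inserted at every boundary and the backspace is kept.
	fmt.Printf("%q\n", replaceAll(`\b`, input, "@")) // "@AB@\b@C@"

	// \x08 matches the backspace character itself, which is what
	// gawk and mawk do for /\b/.
	fmt.Printf("%q\n", replaceAll(`\x08`, input, "@")) // "AB@C"
}
```

The first result corresponds to the goawk hexdump in the report (40 41 42 40 08 40 43 40), and the second to the gawk and mawk output (41 42 40 43).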