[Feature Request] Support case when end range pattern is not distinct from start pattern
balki opened this issue
Example log file
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 ERROR Got Exception in module foo
1. Traceback (most recent call last):
1. File "/tmp/teste.py", line 9, in <module>
1. run_my_stuff()
1. NameError: name 'run_my_stufff' is not defined
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:17:33 ERROR Got Exception in module foo
2. Traceback (most recent call last):
2. File "/tmp/teste.py", line 9, in <module>
2. run_my_stuff()
2. NameError: name 'run_my_stufff' is not defined
ts:Jan 11 12:16:33 INFO Blah Blah Blah
ts:Jan 11 12:16:33 INFO Blah Blah Blah
I am trying to extract the error line Got Exception in module foo along with the traceback that follows it.
First attempt:
❯ goawk '/Got Excep/,/^ts:/' data.txt
ts:Jan 11 12:16:33 ERROR Got Exception in module foo
ts:Jan 11 12:17:33 ERROR Got Exception in module foo
This does not work because the end range expression /^ts:/ also matches the error line, so the range begins and ends on that single line. There is no easy way to match the last line of the exception or the next log line. I finally found a working solution, but it is no longer a one-liner and is not straightforward to understand.
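The collapse of the range can be reproduced in isolation. The sketch below uses a made-up three-line sample rather than the log above, and it calls plain awk on the assumption that any POSIX awk, goawk included, treats range patterns the same way.

```shell
# Hypothetical three-line sample for illustration (not the log file above).
sample='ts:1 INFO ok
ts:2 ERROR Got Exception
  traceback line'

# awk tests the end pattern on the same record that matched the start
# pattern. The error line also matches /^ts:/, so the range opens and
# closes on that one line and the traceback line is never selected.
range_out=$(printf '%s\n' "$sample" | awk '/Got Excep/,/^ts:/')
printf '%s\n' "$range_out"
```

With this sample, only the ERROR line is printed.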
Solution:
❯ goawk '
1 { endcond = 0 }
/Got Excep/ , endcond {
    if (/^ts:/ && !/Got Excep/)
        endcond = 1
    else
        print $0
}
' data.txt
ts:Jan 11 12:16:33 ERROR Got Exception in module foo
1. Traceback (most recent call last):
1. File "/tmp/teste.py", line 9, in <module>
1. run_my_stuff()
1. NameError: name 'run_my_stufff' is not defined
ts:Jan 11 12:17:33 ERROR Got Exception in module foo
2. Traceback (most recent call last):
2. File "/tmp/teste.py", line 9, in <module>
2. run_my_stuff()
2. NameError: name 'run_my_stufff' is not defined
Can we have a command-line flag or special syntax so that the end pattern is not checked on the first line of the range?
e.g.
❯ goawk --no-end-check '/Got Excep/,/^ts:/' data.txt
Or use a double comma (,,) to enable this behavior. This is currently a syntax error, so it should be backwards compatible.
❯ goawk '/Got Excep/,,/^ts:/' data.txt
<cmdline>:1:13: expected expression instead of ,
/Got Excep/,,/^ts:/
Yes, this is slightly tricky, isn't it? I'd rather not introduce new syntax and range pattern types above and beyond POSIX here, so I'd suggest not using a range pattern for this, but two patterns with a flag. Similar to your endcond solution but a bit simpler (and one line :-).
You have to be careful with the order of the patterns, putting the /^ts:/ { e=0 } pattern-action first, so that /Got Excep/ { e=1 } sets e to 1 for that first line before the e { print } pattern is evaluated, and the "Got Exception" line is printed:
$ goawk '/^ts:/ { e=0 } /Got Excep/ { e=1 } e { print }' data.txt
ts:Jan 11 12:16:33 ERROR Got Exception in module foo
1. Traceback (most recent call last):
1. File "/tmp/teste.py", line 9, in <module>
1. run_my_stuff()
1. NameError: name 'run_my_stufff' is not defined
ts:Jan 11 12:17:33 ERROR Got Exception in module foo
2. Traceback (most recent call last):
2. File "/tmp/teste.py", line 9, in <module>
2. run_my_stuff()
2. NameError: name 'run_my_stufff' is not defined
You can even shorten it slightly more by dropping the { print } on the last pattern, as printing the line is the default action:
$ goawk '/^ts:/ { e=0 } /Got Excep/ { e=1 } e' data.txt
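To make the ordering point concrete, here is a small sketch that runs the one-liner with the first two rules in both orders. The four-line sample is invented for illustration, and plain awk stands in for goawk on the assumption that both evaluate pattern-action rules in program order for each line.

```shell
# Hypothetical sample for illustration (not the log file above).
sample='ts:1 INFO before
ts:2 ERROR Got Exception
  traceback line
ts:3 INFO after'

# Reset first, then set: on the error line e ends up as 1, so that line
# and the traceback after it are printed until the next ts: line resets e.
good=$(printf '%s\n' "$sample" | awk '/^ts:/ { e=0 } /Got Excep/ { e=1 } e')

# Set first, then reset: the error line also matches /^ts:/, so e is back
# to 0 before the final pattern is evaluated and nothing is printed at all.
bad=$(printf '%s\n' "$sample" | awk '/Got Excep/ { e=1 } /^ts:/ { e=0 } e')

printf 'reset-then-set:\n%s\n\nset-then-reset:\n%s\n' "$good" "$bad"
```

With this sample the first order prints the ERROR line and its traceback line, while the swapped order prints nothing.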
The Gawk manual also has a couple of examples for range patterns that might be useful (though they don't quite fit what you're doing here).
Hope that helps!
Thanks! It is not obvious at first glance, but it is concise and clear.