theryangeary / choose

A human-friendly and fast alternative to cut and (sometimes) awk

[feature req] choose print text between or after matches

daniejstriata opened this issue

In grep I can print what it finds between values using Perl-regexp syntax (grep -oP). Could choose be extended to also find text between matches, with easy-to-remember syntax that is simpler to use than Perl regexps?

choose -m string1 string2 would print the text found after string1, stopping at string2.

choose -M string1 string2 would print the same text but include the matched string1 and string2. Adding 0 1 2 ... could then select specific characters inside the matched result.

choose -a string1 would look for string1 and print all the text on that line after it.

choose -A string1 would look for string1 and print all the text on that line, including the matched string1. Adding 0 1 2 ... could then select specific characters inside the matched result.
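To make the proposal concrete, here is a small Python sketch of how the four suggested options might behave on a single line. The flags do not exist in choose, and the function names (after, after_inclusive, between, between_inclusive) are hypothetical. It assumes string1 and string2 are matched literally rather than as regexes, and that -m/-M stop at the first string2 after string1.

```python
"""Hypothetical model of the proposed -m / -M / -a / -A options.

These flags do not exist in choose; the function names are invented
to illustrate the behaviour described in the feature request.
"""
from typing import Optional


def after(line: str, start: str) -> Optional[str]:
    """Proposed -a: text after the first `start`, to the end of the line."""
    i = line.find(start)
    return None if i == -1 else line[i + len(start):]


def after_inclusive(line: str, start: str) -> Optional[str]:
    """Proposed -A: like -a, but keep `start` itself."""
    i = line.find(start)
    return None if i == -1 else line[i:]


def between(line: str, start: str, end: str) -> Optional[str]:
    """Proposed -m: text after `start`, stopping at the next `end`."""
    rest = after(line, start)
    if rest is None:
        return None
    j = rest.find(end)
    return None if j == -1 else rest[:j]


def between_inclusive(line: str, start: str, end: str) -> Optional[str]:
    """Proposed -M: like -m, but keep `start` and `end` in the result."""
    inner = between(line, start, end)
    return None if inner is None else start + inner + end


if __name__ == "__main__":
    line = "lorem ipsum string1 dolor sit string2 amet"
    print(repr(between(line, "string1", "string2")))            # ' dolor sit '
    print(repr(between_inclusive(line, "string1", "string2")))  # 'string1 dolor sit string2'
    print(repr(after(line, "string1")))                          # ' dolor sit string2 amet'
    print(repr(after_inclusive(line, "string1")))                # 'string1 dolor sit string2 amet'
    # "Adding 0 1 2 ..." could then pick characters, e.g. the first one:
    print(between_inclusive(line, "string1", "string2")[0])      # s
```

Treating the markers as literal strings is what would make this simpler than Perl regexps; the trade-off is that things like variable amounts of whitespace could not be expressed.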

Here are some grep examples:

Grep from match to end

Example 1

Text to match

$ gdu --version
Version:         v5.20.0
Built time:      Sat Oct 22 10:48:31 PM CEST 2022
Built user:      dundee

gdu --version | grep -oP '(?<=Version:\t\s).*'
Output:
v5.20.0

Example 2

Text to match

$ openssl x509 -noout -enddate -in /etc/ssl/certs/COMODO_Certification_Authority.pem
notAfter=Dec 31 23:59:59 2029 GMT

Keep only value

openssl x509 -noout -enddate -in /etc/ssl/certs/COMODO_Certification_Authority.pem | grep -oP '(?<=notAfter=).*'
Dec 31 23:59:59 2029 GMT
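For readers less familiar with Perl-regexp lookbehinds, this Python sketch reproduces the two grep -oP commands above with the standard re module. The helper name only_matching is hypothetical, and the gdu sample line is assumed to contain a tab followed by a space after "Version:", which is what the grep pattern expects.

```python
from __future__ import annotations

import re

# grep -oP '(?<=notAfter=).*' : the lookbehind requires "notAfter=" right
# before the match but leaves it out of the printed text.
NOT_AFTER = re.compile(r"(?<=notAfter=).*")

# grep -oP '(?<=Version:\t\s).*' : Python's re only accepts fixed-width
# lookbehinds; "Version:\t\s" is always 10 characters, so this compiles.
VERSION = re.compile(r"(?<=Version:\t\s).*")


def only_matching(pattern: re.Pattern[str], text: str) -> list[str]:
    """Rough analogue of grep -o: every non-empty match on every line."""
    return [
        m.group(0)
        for line in text.splitlines()
        for m in pattern.finditer(line)
        if m.group(0)  # grep -o does not print empty matches
    ]


if __name__ == "__main__":
    print(only_matching(NOT_AFTER, "notAfter=Dec 31 23:59:59 2029 GMT\n"))
    # ['Dec 31 23:59:59 2029 GMT']
    # Assumed sample: a tab and a space after "Version:", as the grep pattern expects.
    print(only_matching(VERSION, "Version:\t v5.20.0\nBuilt user:\t dundee\n"))
    # ['v5.20.0']
```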

Grep between matches

Text to match

docker inspect 9512b532dcaf1 | grep tls
                "/etc/dockers/conf/web/tls:/etc/ssl/nginx:ro",
                "Source": "/etc/dockers/conf/web/tls",
docker inspect 9512b532dcaf1 | grep -oP '(?<="Source": ").*(tls)'
/etc/dockers/conf/web/tls

vs

docker inspect 9512b532dcaf1 | grep -oP '(?<="Source": ").*(?=tls)'
/etc/dockers/conf/web/
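The difference between the two docker commands is whether tls is consumed or only asserted. A minimal Python sketch using the re module (the grep_o helper is hypothetical) makes the two results visible side by side:

```python
from __future__ import annotations

import re

# The two lines printed by `docker inspect ... | grep tls` above.
DOCKER_LINES = [
    '                "/etc/dockers/conf/web/tls:/etc/ssl/nginx:ro",',
    '                "Source": "/etc/dockers/conf/web/tls",',
]

# (tls) is an ordinary group, so "tls" is consumed and appears in the output.
CONSUMED = re.compile(r'(?<="Source": ").*(tls)')
# (?=tls) is a lookahead: "tls" must follow, but it is not part of the match.
LOOKAHEAD = re.compile(r'(?<="Source": ").*(?=tls)')


def grep_o(pattern: re.Pattern[str], lines: list[str]) -> list[str]:
    """Collect every non-empty match, the way grep -o prints them."""
    return [m.group(0) for line in lines for m in pattern.finditer(line) if m.group(0)]


print(grep_o(CONSUMED, DOCKER_LINES))   # ['/etc/dockers/conf/web/tls']
print(grep_o(LOOKAHEAD, DOCKER_LINES))  # ['/etc/dockers/conf/web/']
```

Because the lookahead stops just before tls, the slash in front of it stays in the result. Note also that the greedy .* runs to the last tls on the line, and the first docker line produces no match because it lacks "Source": ".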

Hey @daniejstriata, this is an interesting idea, but I think you can already do this. Take a look at these examples, based on yours:

$ echo 'Version:         v5.20.0' | choose -f 'Version:\s+' 0
v5.20.0

You could also replace the 0 with the open-ended range : here. The same solution applies to the openssl example, e.g. choose -f 'notAfter=' 0.

$ echo '                "Source": "/etc/dockers/conf/web/tls",' | choose -f '("Source": "|tls)' 1
/etc/dockers/conf/web/

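Using regex field separators with an or condition (|) lets you effectively set a beginning and an end. Assuming each delimiter appears only once in a line and some text precedes the start delimiter, the content between the start and end will be index 1. Empty fields are skipped, so if the line begins with the start delimiter, the content is index 0 instead.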

So, returning to the string1/string2 examples at the top of your comment: the line lorem ipsum string1 dolor sit string2 amet can be split with choose -f '(string1|string2)' 1 to select the text between the separators, or with choose -f 'string1' 1 to select the text after the separator. I believe this addresses the lowercase -m and -a proposals.
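The split-and-select approach can be emulated in Python to check which index holds the wanted text. This is a hypothetical model, not choose's implementation: it assumes choose splits on every match of the -f regex and skips empty fields (which the Version example implies, since index 0 there is the value), and the function names split_fields and choose_field are invented.

```python
import re
from typing import List, Optional


def split_fields(line: str, separator: str) -> List[str]:
    """Split `line` on every match of the regex `separator`, dropping empty fields.

    re.finditer is used instead of re.split because re.split would also
    return the text captured by a group such as ("Source": "|tls).
    """
    fields, pos = [], 0
    for m in re.finditer(separator, line):
        if m.start() == m.end():
            continue  # ignore zero-width separator matches in this sketch
        fields.append(line[pos:m.start()])
        pos = m.end()
    fields.append(line[pos:])
    return [f for f in fields if f]  # assumed: empty fields are skipped


def choose_field(line: str, separator: str, index: int) -> Optional[str]:
    """Hypothetical stand-in for `choose -f SEPARATOR INDEX` (0-indexed)."""
    fields = split_fields(line, separator)
    return fields[index] if 0 <= index < len(fields) else None


if __name__ == "__main__":
    print(choose_field("Version:         v5.20.0", r"Version:\s+", 0))  # v5.20.0
    src = '                "Source": "/etc/dockers/conf/web/tls",'
    print(choose_field(src, r'("Source": "|tls)', 1))  # /etc/dockers/conf/web/
    lorem = "lorem ipsum string1 dolor sit string2 amet"
    print(repr(choose_field(lorem, "(string1|string2)", 1)))  # ' dolor sit '
    print(repr(choose_field(lorem, "string1", 1)))  # ' dolor sit string2 amet'
```

Under the same assumption, a line that begins with the start delimiter has no leading field, so the text between the delimiters would be index 0 rather than 1.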

Admittedly, this does not cover your suggested case of including the start and end strings, but I see no reason not to use grep -o "string1.*string2" or grep -o "string1.*" for that case (-o prints only the matched text rather than the whole line). Then you can use choose -c to select characters within that. This should address the uppercase -M and -A proposals.
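A short Python sketch of the grep-then-select route, on a hypothetical line where string2 appears twice. It shows that the greedy .* in string1.*string2 runs to the last string2, whereas "stopping at string2" in the request suggests the first; the lazy .*? quantifier (also accepted by grep -oP) gives that behaviour. The final slice only stands in for choose -c; choose's own range semantics are not modeled here.

```python
import re

# Hypothetical line with string2 appearing twice.
line = "x string1 dolor string2 sit string2 amet"

# grep -o "string1.*string2": greedy .* runs to the LAST string2 on the line.
greedy = re.search(r"string1.*string2", line).group(0)

# Lazy .*? (Perl syntax, also accepted by grep -oP) stops at the FIRST string2.
lazy = re.search(r"string1.*?string2", line).group(0)

# grep -o "string1.*": from string1 to the end of the line, string1 included.
tail = re.search(r"string1.*", line).group(0)

# Character-wise selection on the result, in the spirit of choose -c.
# Python slices exclude the end index; choose's own range rules may differ.
first_word = lazy[0:7]

print(greedy)      # string1 dolor string2 sit string2
print(lazy)        # string1 dolor string2
print(tail)        # string1 dolor string2 sit string2 amet
print(first_word)  # string1
```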

Based on these alternatives, I don't think any further change is needed to support your use case. However, I'd be interested to hear if you feel differently and have more examples.