chmln / sd

Intuitive find & replace CLI (sed alternative)

Operations on capture groups

alecandido opened this issue

I'd like more flexibility in how the captured text can be used.
My specific use case is about letter case: I'd like to lowercase the matched text.

This is possible with sed:
https://stackoverflow.com/a/1814396/8653979
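For comparison, the same transformation in a general-purpose language: a minimal Python sketch that passes a callable as the replacement to `re.sub`, so each match can be transformed arbitrarily. The helper name `lowercase_matches` and the sample pattern are illustrative assumptions.

```python
import re


def lowercase_matches(pattern: str, text: str) -> str:
    """Replace every match of `pattern` with its lowercased text."""
    # re.sub accepts a callable as the replacement; it receives each Match object
    return re.sub(pattern, lambda m: m.group(0).lower(), text)


print(lowercase_matches(r"[A-Z]+_[A-Z]+", "set MAX_VALUE and MIN_VALUE"))
```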

I was looking through the issues specifically for something like this.

But my case is a bit different. When I think "operation", I think of any shell command that takes input and produces output, whereas sed appears to support only a fixed set of operations.
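The "any shell command" idea can be prototyped outside sd. A minimal Python sketch, assuming a POSIX system with `tr` on the PATH: each match is written to the command's stdin and replaced by whatever the command prints. `pipe_matches` is a hypothetical helper, not an sd feature, and it spawns one process per match, so it only suits a modest number of matches.

```python
import re
import subprocess


def pipe_matches(pattern: str, text: str, command: list[str]) -> str:
    """Replace each match with the stdout of `command`, fed the match on stdin."""

    def run(m):
        result = subprocess.run(
            command,
            input=m.group(0),
            capture_output=True,
            text=True,
            check=True,  # raise if the command fails instead of substituting garbage
        )
        # Many line-oriented tools append a newline; strip it so it doesn't leak into the text
        return result.stdout.rstrip("\n")

    return re.sub(pattern, run, text)


# Upper-case every word starting with "ab" by piping it through `tr`
print(pipe_matches(r"\bab\w*", "abc xyz abd", ["tr", "a-z", "A-Z"]))
```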

In my project, though, I'm probably going to implement this with ripgrep and the current sd functionality.

How you would do it would also be interesting, but I expect it to involve matching, piping, transforming, and finally replacing. While that is always good to know, my original intention was to request support for basic operations, so that they would be possible with a single, simple sd invocation.

P.S.: when the operation becomes as complex as "matching, piping, transforming, and finally replacing", I prefer to write a full-fledged program: a Rust one if efficiency is required (the regex crate is the programmatic alternative to sd), or even a simple Python script (at least then I have full, structured access to the match object and its location, without parsing strings).
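As an illustration of that structured access, a minimal Python sketch that walks the matches with `re.finditer`, reads named groups and start/end offsets directly from each match object, and rebuilds the string with a transformation applied. The `key=value` data, the pattern, and the `rewrite` helper are hypothetical.

```python
import re

# Hypothetical example data: comma-separated key=value pairs
PATTERN = re.compile(r"(?P<key>\w+)=(?P<value>\d+)")


def rewrite(text: str) -> str:
    pieces = []
    last_end = 0
    for m in PATTERN.finditer(text):
        # Structured access: named groups plus the match's start/end offsets
        print(f"{m['key']!r} at {m.start()}-{m.end()}, value {m['value']}")
        pieces.append(text[last_end:m.start()])  # untouched text before this match
        # Transform: upper-case the key and zero-pad the value to width 4
        pieces.append(f"{m['key'].upper()}={int(m['value']):04d}")
        last_end = m.end()
    pieces.append(text[last_end:])  # trailing text after the last match
    return "".join(pieces)


print(rewrite("alpha=1, beta=22, gamma=333"))
```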

I think that's fair, but I do like operating in the shell to reduce dependencies :)

Personally, I quite like how much you can do with jq. The syntax is terse, if a bit hard to remember, not unlike sed or other "original" tools really. But you can do quite a lot quickly once you can google it or get used to it.

I reckon some commands in the replacement expression might work well with sd's approach. Since we already have the $1 syntax for capture groups, I'll use ${command} for now to represent these commands:

sd '^.*,[[:space:]]*,.*$' '${delete}' - delete any lines from a CSV file that have empty fields (two commas separated only by whitespace)
sd 'SIG[[:alpha:]]+ received' '${keep}' - delete any lines that don't contain a nasty error
sd "(hello|hi|g'day|buenos dias)" '${1.upcase}!' - make a greeting a bit more shouty

I'm sure there are more examples, but I'm too busy to think of them as I type.

Where sed uses a separate command represented by a letter (e.g. /.../d), I am wondering whether representing some simple operations in the second argument (escaped with the dollar sign, which should help compatibility since it is already a special character) would open up more uses for sd without cluttering it too much. Line deletion is a real need I have come across: because sd cannot remove lines, I have had to use another shell tool to cut out the lines left completely empty after using sd for the original replacement. I like sd's cleanliness (regexes are bad enough as they are without extra syntax on top), but I would also like to be able to do more things with it.

@stellarpower thanks for the syntax proposal; I like the idea of putting the command inside delimiters. I would always keep the group identifier and use a different separator, like ${0:upper}, ${1:lower}, or ${dollars:rev} (to reverse the order of characters; I would have suggested mirror for clarity, but rev is already a shell command that does exactly this).
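To make the proposed `${group:op}` syntax concrete, here is a hypothetical Python sketch of how such a replacement template could be expanded. sd does not implement this; the operation names `upper`, `lower`, and `rev` follow the comment above, while the function names and the error raised for unknown operations are assumptions.

```python
import re

# Hypothetical operations for the proposed ${group:op} syntax (not an sd feature)
OPERATIONS = {
    "upper": str.upper,
    "lower": str.lower,
    "rev": lambda s: s[::-1],
}

# Matches ${group} or ${group:op}; group is a number or a group name
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::(\w+))?\}")


def expand(template: str, m) -> str:
    """Expand every placeholder in `template` using the regex match `m`."""

    def substitute(p):
        group, op = p.group(1), p.group(2)
        # Numeric identifiers are positional groups, anything else is a named group
        value = m.group(int(group) if group.isdigit() else group) or ""
        if op is None:
            return value
        if op not in OPERATIONS:
            raise ValueError(f"unknown operation: {op}")
        return OPERATIONS[op](value)

    return PLACEHOLDER.sub(substitute, template)


def replace(pattern: str, template: str, text: str) -> str:
    # A callable replacement keeps re.sub from interpreting $ or \ in the template
    return re.sub(pattern, lambda m: expand(template, m), text)


print(replace(r"(hello|hi)", "${1:upper}!", "hi there"))
```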

However, you don't need a dedicated command to delete or keep lines: just enable multi-line mode with the m flag (pass it with sd -f m), match the lines you want to remove, including their trailing newline, and replace them with ''.

E.g.:

```sh
# delete lines that contain an empty field
sd --flags m '^.*,[[:space:]]*,.*$\n' '' your-file.txt
# keep only lines containing the error
# (does not work with sd: the `regex` crate does not support look-around)
sd --flags m '^(?!.*SIG[[:alpha:]]+ received).*$\n' '' your-file.txt
```

For more information about look-around, have a look at this SO answer and the suggested replacement: http://www.formauri.es/personal/pgimeno/misc/non-match-regex/?word=foo
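For contrast, Python's `re` module does support look-around, so both the delete and the keep variants work there. A minimal sketch, assuming the input ends with a newline (as with the sd commands, a final line without one is not matched). `[ \t]` and `[A-Za-z]` stand in for the POSIX classes, which Python's `re` does not understand; the function names are hypothetical.

```python
import re


def delete_empty_field_lines(text: str) -> str:
    # MULTILINE makes ^ and $ match at line boundaries; consuming the trailing \n
    # removes the whole line instead of leaving an empty one behind.
    # [ \t]* (not \s*) keeps the match from running across a newline.
    return re.sub(r"^.*,[ \t]*,.*$\n", "", text, flags=re.MULTILINE)


def keep_signal_lines(text: str) -> str:
    # Negative look-ahead: match (and delete) every line NOT containing the error.
    # This is exactly the construct the Rust regex crate rejects.
    return re.sub(r"^(?!.*SIG[A-Za-z]+ received).*$\n", "", text, flags=re.MULTILINE)


csv = "name,age,city\nalice,30,paris\nbob,,rome\ncarol,41,oslo\n"
log = "ok\nSIGSEGV received\nstill ok\nSIGTERM received\n"
print(delete_empty_field_lines(csv), end="")
print(keep_signal_lines(log), end="")
```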

In practice, though, if you want to operate line-wise, just use ripgrep:

```sh
# delete lines that contain an empty field
rg -v ',[[:space:]]*,' your-file.txt > new-file.txt
# keep only lines containing the error
rg 'SIG[[:alpha:]]+ received' your-file.txt > new-file.txt
```

It is built on the same regex crate as sd, and it was created precisely for this task (not file editing, but line-wise selection).
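When rg isn't available, the same line-wise selection is easy in a script. A minimal Python sketch mirroring the behavior of plain `rg` (keep matching lines) and `rg -v` (drop them); `filter_lines` is a hypothetical helper, and unlike rg it reads the whole input into memory.

```python
import re


def filter_lines(pattern: str, text: str, invert: bool = False) -> str:
    """Keep lines matching `pattern` (like `rg`), or drop them with invert=True (like `rg -v`)."""
    regex = re.compile(pattern)
    # keepends=True preserves each line's original terminator in the output
    lines = text.splitlines(keepends=True)
    # A line survives when "matches" differs from "invert"
    return "".join(line for line in lines if bool(regex.search(line)) != invert)


csv = "a,b\nc,,d\ne, ,f\n"
print(filter_lines(r",[ \t]*,", csv, invert=True), end="")
```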