Support `failure` hooks and scripts

Question

gabyx opened this issue 3 months ago

Describe the bug

A `[hooks.apply.post]` hook should also run when `chezmoi apply` fails.

To reproduce

Define a hook such as `[hooks.apply.post]` and make `chezmoi apply` fail, for example with a failing `run_before_bla.sh` script.
According to the documentation, the post hook should run, but it does not.

It would also be good to have `run_failure_bla.sh` scripts that are executed whenever apply fails.

Expected behavior

The hooks should run.

Austin Ziegler · Answer 1 · Sun May 19 2024 09:40:27 GMT+0800 (China Standard Time)

I don't think this is a bug, but an enhancement. The documentation is unclear on the matter, but the way chezmoi generally works is that subsequent steps run only when the previous step succeeds, and there’s a very small amount of clean-up (on the state file side) that happens post-failure.

I’m not sure what sort of hook would run post-failure, but I think that interpreting `hooks.*.post` as if it meant `hooks.*.on-success` is reasonable, as hooks that run on failure would probably need to behave differently. I think that `hooks.*.failure` hooks and `run_failure_*` scripts are a possible feature, because those would nominally be built for failure cases.

What would you be using this feature for?

Gabriel Nützi · Answer 2 · Sun May 19 2024 22:38:20 GMT+0800 (China Standard Time)

I would use the feature for the following use case (maybe I am overcomplicating things):

I use this configuration:

age:
  identity: ~/.config/chezmoi/key
  recipient: age...

I have a `run_before_decrypt-private-key` script that decrypts my passphrase-protected private key `~/.config/chezmoi/key.age` to `~/.config/chezmoi/key`. (I get the passphrase from the login keyring on my NixOS machine, which is gnome-keyring, via `secret-tool`.)
I have a `run_after_delete-decrypted-private-key` script that deletes `~/.config/chezmoi/key` again.

This way I can simply run `chezmoi apply` while logged in, and chezmoi will happily apply everything and also be able to decrypt all files.
When chezmoi fails, I would like to be sure that `~/.config/chezmoi/key` is deleted, because I don't want this key on disk. I could use a `run_onfailure...` script or something along those lines.

A second solution could also be nice (but it currently does not work non-interactively):
instead of `key`, I could directly use `key.age` (the passphrase-protected private key):

age:
  identity: ~/.config/chezmoi/key.age
  recipient: age...

This already works, and `chezmoi apply` prompts for the passphrase. However, making it non-interactive is not possible, because age has no option to supply the passphrase (for example, through an environment variable), so there is nothing chezmoi can do about it. For my purposes, I am in fact using a patched version of age that reads `AGE_PASSPHRASE` from the environment (this is not in main, and many issues already request it, e.g. FiloSottile/age#275).

So this second solution does not work, and my current approach is perfectly fine. But having something like a `run_onfailure` script would give me the chance to delete the key file properly. Is my description understandable?

What do you think?

Gabriel Nützi · Answer 3 · Sun May 19 2024 22:50:17 GMT+0800 (China Standard Time)

Thinking about it again, I think what chezmoi is missing is something like:

age:
  identity:
    cmd: "get-private-key-from-keyring"
    args: ...

age already works this way: `echo AGE-SECRET-KEY-(...) | age -i - -d file.age`
This would be safer, as the private key never needs to be stored on disk and can be obtained by any means.

Austin Ziegler · Answer 4 · Mon May 20 2024 03:18:13 GMT+0800 (China Standard Time)

I think that you’ve described a useful clean-up case for such a feature to be considered.

As you suggest, there is a workaround for this specific issue, so the urgency is perhaps reduced.

Tom Payne · Answer 5 · Mon May 20 2024 16:12:57 GMT+0800 (China Standard Time)

A difficulty here is that chezmoi can terminate in many different ways and cannot guarantee to be able to run a cleanup script. For example, the user might hit Ctrl-C while chezmoi is running, which will (by default) terminate chezmoi. As well as this, the commands that chezmoi runs might fail (leading to a graceful error exit), chezmoi might panic (leading to a less graceful exit), or chezmoi might receive a signal (some of which could be caught, but not all).

Overall, chezmoi simply cannot implement "run after failure" hooks reliably, and implementing them unreliably is a lot of work.

The reliable way to do this is to use a wrapper script around chezmoi that ensures that the cleanup is done, no matter how chezmoi exits, something like:

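#!/bin/bash
set -euo pipefail

key="$HOME/.config/chezmoi/key"
# Delete the decrypted key however this script exits, including when chezmoi fails.
trap 'rm -f -- "$key"' EXIT
# Decrypt the passphrase-protected key (age prompts for the passphrase).
age -d -o "$key" "$key.age"
chezmoi "$@"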
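The wrapper relies on bash running an `EXIT` trap whether the wrapped command succeeds or fails. A minimal bash sketch of that property, using a hypothetical `run_with_cleanup` helper (not part of chezmoi) that deletes a given file when the wrapped command finishes and passes the command's exit status through:

```shell
# run_with_cleanup FILE CMD [ARGS...]
# Hypothetical helper (not part of chezmoi): runs CMD in a subshell whose
# EXIT trap deletes FILE, so FILE is removed whether CMD succeeds or fails.
# The subshell's exit status, and therefore the function's, is CMD's status.
run_with_cleanup() {
  local file="$1"
  shift
  (
    trap 'rm -f -- "$file"' EXIT
    "$@"
  )
}

# Example use in a wrapper script:
#   run_with_cleanup "$HOME/.config/chezmoi/key" chezmoi apply
```

Running the command in a subshell keeps the trap local to that one invocation, so the caller's own traps are left untouched.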

Gabriel Nützi · Answer 6 · Mon May 20 2024 16:34:46 GMT+0800 (China Standard Time)

@twpayne: I totally agree; cleanup and error handling are hard.

Wrapping chezmoi always works, of course. But I think chezmoi could have a `[hooks.apply.post]` hook that is always executed (not on panic, of course, because that is an internal chezmoi error).
"Always" means: signals such as Ctrl-C, as well as the point just before exiting, can be handled in Go. I've done that in Githooks, and it works so far:
https://github.com/gabyx/Githooks/blob/main/githooks/apps/cli/cli.go#L14

Basically you have global cleanup handler functions which need to be run on exit and also on signals.
Not sure whether that helps, but in that sense, wouldn't it be possible to have the hooks always run, even on a non-zero exit status and on signals?
Of course, the hook can only run once the TOML config has been parsed, so there are moments when Ctrl-C will not run the hook because no cleanup function has been installed yet. You can't fix that; it's just how it is.
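The global-cleanup-handler pattern described above is Go code in Githooks; as a hedged illustration only, here is a bash analogue with hypothetical names (`register_cleanup`, `run_cleanups`, `install_cleanup_traps`), not chezmoi's or Githooks' actual code. Handlers are registered as they become known, and all of them run once on any exit or on `INT`/`TERM`. As noted above, a handler registered after the configuration is read is not run for a signal that arrives earlier.

```shell
# Hypothetical global cleanup registry (a bash analogue of the Go pattern,
# not chezmoi's or Githooks' actual code).
cleanup_handlers=()

# register_cleanup CMD [ARGS...]: queue a command to run when the shell exits.
register_cleanup() {
  cleanup_handlers+=("$(printf '%q ' "$@")")
}

# Run the queued handlers newest first, ignoring individual failures.
run_cleanups() {
  local i
  for (( i = ${#cleanup_handlers[@]} - 1; i >= 0; i-- )); do
    eval "${cleanup_handlers[i]}" || true
  done
  cleanup_handlers=()
}

# Install the traps: run the handlers on any exit, and turn INT/TERM into an
# ordinary exit (128 + signal number) so that the EXIT trap fires.
install_cleanup_traps() {
  trap run_cleanups EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
}

# Example: the cleanup is registered only once the configuration is known,
# so a signal arriving before this point runs nothing.
#   install_cleanup_traps
#   register_cleanup rm -f -- "$HOME/.config/chezmoi/key"
```

Signals such as `SIGKILL` cannot be trapped at all, so even this pattern cannot cover every way a process can terminate.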

=) Thanks for considering this. Maybe there is something here; I'm not sure.

Tom Payne · Answer 7 · Mon May 20 2024 17:04:52 GMT+0800 (China Standard Time)

Basically you have global cleanup handler functions which need to be run on exit and also on signals.

However, this global cleanup handler needs to know which hook to run, which is only available further down the stack.

I'll accept a high quality PR that implements the desired functionality. In the meantime, I'll close this as "not planned".