FR: Generalized hook support

matts1 opened this issue 2 months ago · comments

Is your feature request related to a problem? Please describe.
I was looking at #2845 and realized that it didn't meet my needs because it didn't support pre-upload hooks. I investigated hooks further, and the closest I found was #405, which is specifically trying to solve pre-commit hooks rather than generalized hooks.

Describe the solution you'd like
I was thinking of treating hooks like a pub-sub system. jj would publish events, which would consist of some metadata. Your hooks would simply be subscriptions to some subset of the events, matching specific metadata. For example:

message CommitDescriptionHook {
    string commit_id = 1;
    string description = 2;
}

message HookEvent {
    string workspace = 1;
    oneof hook_type {
        CommitDescriptionHook commit_description = 2;
        PreUploadHook pre_upload = 3;
    }
    ...
}

We would then, similar to my #3575, run all hook binaries that have subscribed to that event in the global config.toml. Similar to #3575, these hook binaries would be passed a file descriptor with a connection to the jj grpc server. For example, my global config.toml might look like:

[hooks]
subscribe_to_every_event = { binary = "/path/to/foo" }
pre_upload = { binary = "/path/to/bar", type = "pre_upload" }
specific_workspace = { binary = "/path/to/baz", workspace = "/path/to/workspace" }
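To make the matching rule concrete, here is a minimal Python sketch of how subscriptions like the three above could be filtered against an event. The `Subscription` and `HookEvent` classes and the rule "an unset field matches everything" are assumptions for illustration, not something jj implements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HookEvent:
    workspace: str
    hook_type: str  # e.g. "pre_upload" or "commit_description"


@dataclass(frozen=True)
class Subscription:
    binary: str
    type: Optional[str] = None       # None subscribes to every hook type
    workspace: Optional[str] = None  # None subscribes to every workspace


def matches(sub: Subscription, event: HookEvent) -> bool:
    """A subscription matches when every field it sets equals the event's."""
    if sub.type is not None and sub.type != event.hook_type:
        return False
    if sub.workspace is not None and sub.workspace != event.workspace:
        return False
    return True


def filter_hooks(subs: list[Subscription], event: HookEvent) -> list[Subscription]:
    return [sub for sub in subs if matches(sub, event)]


# Mirrors the three [hooks] entries in the example config.
SUBSCRIPTIONS = [
    Subscription(binary="/path/to/foo"),
    Subscription(binary="/path/to/bar", type="pre_upload"),
    Subscription(binary="/path/to/baz", workspace="/path/to/workspace"),
]
```

With this rule, a `pre_upload` event from an unrelated workspace would reach `/path/to/foo` and `/path/to/bar`, but not `/path/to/baz`.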

We can also implement per-repo hooks in the same way that git has them, by allowing additional subscribers from a hooks/jj.toml or similar, though I'd hold off on this in the short term due to security concerns.

For example, a pre-upload hook that forces your commit to have an associated bug might look like:

conn = grpc.connect(file_descriptor(os.environ["JJ_SERVER"]))
main = importlib.import("/path/to/foo_ext.py", "main")
event = conn.get_event()

if "BUG=" not in event.commit_description.description:
    bug = input()
    description = f"{event.commit_description.description}\nBUG={bug}"
    conn.set_description(commit_id=event.commit_description.commit_id, description=description)

Though similar to my suggestion in #3575, we could provide preludes for common languages, and simply make the hook:

def main(conn, event):
    if "BUG=" not in event.commit_description.description:
        bug = input()
        description = f"{event.commit_description.description}\nBUG={bug}"
        conn.set_description(commit_id=event.commit_description.commit_id, description=description)

We would thus provide a method in jj along the lines of:

fn publish_event(event: &HookEvent) -> anyhow::Result<()> {
    let hooks: Vec<&Hook> = filter_hooks(event);
    // Just a regular gRPC server, but get_event() will always return the hook event.
    let server: GrpcServer = hook_grpc_server(event);
    for hook in hooks {
        // Hooks run one at a time; the first non-zero exit aborts the command.
        let return_code = hook.run(&server);
        if return_code != 0 {
            bail!("Hook failed");
        }
    }
    Ok(())
}
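For anyone who wants to experiment with the run-and-bail behavior without the gRPC part, here is a rough Python sketch. It assumes hooks are plain commands and that the server is advertised through a `JJ_SERVER` environment variable holding an address string (the proposal above passes a file descriptor instead); `HookFailed` is a made-up exception name.

```python
import os
import subprocess
import sys


class HookFailed(Exception):
    """Raised when a subscribed hook exits with a non-zero status."""


def publish_event(hook_commands, server_address):
    """Run each subscribed hook in order; stop at the first non-zero exit."""
    # Assumption: hooks find the server through the JJ_SERVER variable.
    env = dict(os.environ, JJ_SERVER=server_address)
    for command in hook_commands:
        result = subprocess.run(command, env=env)
        if result.returncode != 0:
            raise HookFailed(
                f"Hook failed: {command!r} exited with {result.returncode}"
            )


# Two stand-in hooks: one succeeds if JJ_SERVER is set, one always fails.
passing = [
    sys.executable,
    "-c",
    "import os, sys; sys.exit(0 if os.environ.get('JJ_SERVER') else 1)",
]
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

Calling `publish_event([passing, failing], "placeholder-address")` runs the first hook to completion and then raises `HookFailed` for the second, which is the point at which the jj command itself would fail.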

Describe alternatives you've considered
Didn't really consider much else, this is the first idea that came to mind. I thought it was a nice starting point. Other ideas are welcome.

Philip Metzger · Answer 1 · Sat Apr 27 2024 00:04:41 GMT+0800 (China Standard Time)

cc @arxanas for his opinion on hooks, as we discussed them quite heavily for jj run (#405/#1869).

I'm very much on board with providing client integrations and having a pub-sub-like event system in the daemon, but I don't want to have it in the CLI. Having external languages and binaries would fix some recurring ideas in relation to the template language #3262 and this recent Discord discussion.

Matt · Answer 2 · Mon Apr 29 2024 07:32:08 GMT+0800 (China Standard Time)

Could you elaborate on what you mean by "don't want to have it in the CLI"?

My line of thought, as I said above, was that:

The hooks would be external binaries
The config file would specify the hooks
The pub/sub aspect of them would be done within jj-lib

This would mean that when you run a CLI command such as jj describe, it would run a pre-describe hook. I can't see any way of doing this cleanly other than that when you run the jj command you get hooks, so I'm rather confused by your statement of "don't want to have it in the CLI", both in terms of what you mean and why you don't want it.

What was your vision of how hooks would work?

Philip Metzger · Answer 3 · Tue Apr 30 2024 00:12:01 GMT+0800 (China Standard Time)

My line of thought, as I said above, was that:

The hooks would be external binaries

The config file would specify the hooks

The pub/sub aspect of them would be done within jj-lib

That's all fine, if you can take the performance penalty and latency from the external binary.

Could you elaborate on what you mean by "don't want to have it in the CLI"?

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

What was your vision of how hooks would work?

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

Matt · Answer 4 · Tue Apr 30 2024 06:41:46 GMT+0800 (China Standard Time)

That's all fine, if you can take the performance penalty and latency from the external binary.

In order to be sufficiently generic, any hook is necessarily going to have to be an external binary. I'm not sure I get what you're saying. Otherwise, rather than a generic hook, I'd have a fixed set of hooks that come packaged with jj that I could use.

I'd rather just offload the whole hook system to the daemon and it is responsible for everything, the CLI only emits events which you can react upon and nothing more.

Let's use a more concrete example. Suppose I wanted to make a post-describe hook that validated that the commit had a bug associated with it in the commit description. What would the workflow look like? My imagination was:

1. jj describe
2. Type a commit message without a bug
3. Start up an API server
4. Hook runs, connecting to that API server
5. Commit message validated via the hook, fails validation
6. jj describe fails with an exit status

I think necessarily any time a hook runs, the CLI is going to have to wait for that hook to finish so it can check whether the hook succeeded or failed. That being said, step 3 could be skipped if we just have a permanent API server running in the daemon.

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens, both from a user perspective and a code perspective, with your idea?

Martin von Zweigbergk · Answer 5 · Wed May 01 2024 01:30:43 GMT+0800 (China Standard Time)

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway. One example of that is when you rewrite commits on the server (e.g. via some GitHub UI).

To better understand the use cases, what does your pre-upload Gerrit hook do?

Matt · Answer 6 · Wed May 01 2024 08:00:50 GMT+0800 (China Standard Time)

One reason I've avoided hooks so far is that we sometimes want solutions that can validate the commits after the operation is done anyway

I think that both pre-* and post-* hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

To better understand the use cases, what does your pre-upload Gerrit hook do?

TLDR: It first finds the binary for the pre-upload hook, then it runs that binary, passing in a list of git commit shas.

ChromeOS is a multi-repo setup. You can think of it like git submodules except we don't use submodules - we use a tool called repo instead. We have one repo called repohooks which contains all of the hooks.

When we run repo upload, it first runs ${CHROMEOS_ROOT}/src/repohooks/pre-upload.py. For example, I usually work in the repository ~/chromiumos/src/bazel, so I have a script that finds the root directory based on the existence of a .repo directory (similar to how jj does it for .jj), and runs pre-upload.py <commit1> <commit2> before doing a git push.
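The root-finding part of that script is simple enough to sketch. This is not the actual script, just a minimal Python version of "walk up until a .repo directory exists"; the function name `find_checkout_root` is made up.

```python
from pathlib import Path
from typing import Optional


def find_checkout_root(start: Path, marker: str = ".repo") -> Optional[Path]:
    """Return the nearest ancestor of `start` (or `start` itself) that
    contains a directory named `marker`, or None if there is none."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None
```

From ~/chromiumos/src/bazel this would return ~/chromiumos, and the hook to run is then `root / "src" / "repohooks" / "pre-upload.py"`.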

Pre-upload.py does various things such as:

Validate that our formatter doesn't need to do anything
Check that there are correct license headers at the top of every file
Check that your commit description has a footer containing TEST=<something>
Check that your BUG=<something> is in the correct format

Note that it does this on every commit in the stack, not just the top commit.
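As a rough illustration of the description checks, applied to every commit in the stack, here is a Python sketch. The exact footer formats are assumptions made up for the example (the real repohooks rules are not spelled out in this thread), as are the function names.

```python
from __future__ import annotations

import re

# Assumed formats, for illustration only.
TEST_FOOTER = re.compile(r"^TEST=\S.*$", re.MULTILINE)
BUG_FOOTER = re.compile(r"^BUG=(None|b:\d+)$", re.MULTILINE)


def check_description(description: str) -> list[str]:
    """Return the list of problems found in one commit description."""
    errors = []
    if not TEST_FOOTER.search(description):
        errors.append("missing TEST=<something> footer")
    if not BUG_FOOTER.search(description):
        errors.append("missing or malformed BUG=<something> footer")
    return errors


def check_stack(descriptions: dict[str, str]) -> dict[str, list[str]]:
    """Check every commit in the stack, not just the top one.

    `descriptions` maps a commit id to its description; the result maps
    each failing commit id to its list of problems.
    """
    failures = {}
    for commit_id, description in descriptions.items():
        errors = check_description(description)
        if errors:
            failures[commit_id] = errors
    return failures
```

These checks only read the descriptions, so they need no write access to the repository.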

Unfortunately, just running repo upload isn't an option, because it works on the current branch, and doesn't work with detached head.

Martin von Zweigbergk · Answer 7 · Wed May 01 2024 08:17:31 GMT+0800 (China Standard Time)

I think that both pre-* and post-* hooks are reasonable, and complementary. I don't think, however, that we can say that the existence of some way to do post-validation means that we shouldn't do pre-validation.

If we are going to have hooks, then I think upload is a good case where they would be useful, but it might be too late to do it after uploading (you might have already accidentally uploaded your password file then).

I think the hooks you've mentioned above only require readonly access to the data. That can mostly be done after committing the transaction instead. That's not true if we also want to support hooks that can rewrite content, descriptions, branches, etc. It might be good to compile a list of use cases for that too.

Matt · Answer 8 · Wed May 01 2024 09:19:52 GMT+0800 (China Standard Time)

Upload hooks

Pre-upload hooks

Not too much to say. Pre-upload hooks should be able to mutate the commits in the stack (eg. running formatters).
I imagine that they would get access to the jj API, and could use that to run something roughly equivalent to jj run 'immutable_heads()..commit_to_be_pushed' clang-format "$(jj files @)", or do whatever else they want.

Post-upload hooks

I can't really think of a use case for this. Maybe you could have a hook to trigger some kind of GitHub Action, for example, but it seems like that should be set up on the server side? I'd love to hear other people's opinions on whether this would be useful.

Describe hooks

I think these are relatively un-contentious.

A pre-describe hook would pre-fill the commit message
- A hook failure would be a warning, and would leave the message unchanged
A post-describe hook would both validate the commit message and perform potential fixups on it
- Failure of this would cause the editor to rerun

fn describe(commit: Commit) -> Commit {
  let mut msg = match pre_describe_hook(&commit) {
    Ok(m) => m,
    Err(e) => {
      // If the initial description was invalid, we still have to be able to edit that description, so it doesn't really make sense to have these fail. So any pre-describe hook failure is considered an internal error to the hook.
      warning(e);
      // Leave the message unchanged.
      commit.msg.clone()
    }
  };
  loop {
    msg = run_editor(&msg);
    match post_describe_hook(Commit { msg: msg.clone(), ..commit.clone() }) {
      Ok(edited) => return edited,
      // Rerun the editor
      Err(e) => print(e),
    }
  }
}
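The same control flow can be tried out in Python with the hooks and the editor passed in as plain callables. Everything here (`HookError`, the parameter names, working on description strings instead of commits) is an assumption for the sake of a runnable sketch.

```python
class HookError(Exception):
    """Raised by a hook to signal failure."""


def describe(description, pre_describe_hook, post_describe_hook, run_editor,
             warn=print):
    """Pre-fill the message, then loop on the editor until validation passes."""
    try:
        message = pre_describe_hook(description)
    except HookError as error:
        # A pre-describe failure is only a warning; keep the old message.
        warn(f"warning: pre-describe hook failed: {error}")
        message = description
    while True:
        message = run_editor(message)
        try:
            # The post-describe hook may also return a fixed-up message.
            return post_describe_hook(message)
        except HookError as error:
            # Report the problem and reopen the editor on the rejected text.
            warn(f"error: {error}")
```

Note that the rejected text is fed back into the editor, so the user fixes their message rather than starting over.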

Commit hooks

I've saved this for last because these are by far the hardest.

Pre-commit hooks
- Can edit the tree (eg. running formatters)
- Can validate the commit (eg. running linters)
- Can block a commit

Pre-commit hooks are hard & confusing because every time you run any jj command, it technically does a commit. It seems to me that pre-commit hooks should run when the user has the intention to say "I'm happy with this piece of work, let's commit", but that's extremely difficult to detect, because:

jj commit and jj amend clearly show an intention to do so
- But amend is just an alias for squash, which is usually not used for the commit intention (probably?)
jj new may be used to commit, or it may be used as a "checkout" command
- We could special-case when you new on top of the current commit, but it seems hard trying to explain all the nuances of the pre-commits to a user
I personally mostly use jj split when I intend to "commit" something (since it works like hg commit --interactive), but I also use it to just run a regular split.
- Similarly to new, we could special-case this for when we're splitting @, but I suspect other people use jj split for other purposes.
I use jj describe to "commit" when it's the final commit in the stack (since I use the jj edit workflow)

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation, and you can just as easily run that validation in pre-commit.

Because of all the difficulties described above in detecting when a "commit" intention occurs, I'm honestly not a big fan of pre-commit hooks. Instead, I'd personally prefer pre-upload hooks. I can see some small justifications for why pre-commit is better than pre-upload, but I think that being able to manually run my pre-upload hooks would be the way to go, since it's just so hard to detect when you actually "commit" in jj.

Other Git hooks

pre-rebase
- The documentation suggests "you can use this hook to disallow rebasing any commits that have already been pushed". However, since we have immutable commits, I don't think we really need this, and I can't think of any other use case.
post-checkout
- Documentation: "This may mean moving in large binary files that you don’t want source controlled, auto-generating documentation, or something along those lines"
- I can see some appeal to this, but IMO something like this should just be run explicitly from the user. It's also, similar to commit, quite hard to detect a "checkout". Interested to hear other opinions though
post-merge
- I don't think this makes much sense in jj. Merge isn't a special operation for us, so I don't think it makes sense to special-case it
pre-auto-gc
- This just seems weird. It just exposes internals that shouldn't be exposed?

Joy Reynolds · Answer 9 · Wed May 01 2024 09:45:43 GMT+0800 (China Standard Time)

I don't think post-commit hooks make a lot of sense. A post-commit hook can only really do validation

In the context of a GUI, would it make sense that post-commit could be a trigger to update the display?

Philip Metzger · Answer 10 · Thu May 02 2024 01:04:32 GMT+0800 (China Standard Time)

It has no place there, as it severely overlaps with run and forcing a pre-upload or presubmit hook in there makes the user experience rather unpleasant. If we can make the CLI emit just a proto message through IPC/vsock and the daemon handles the whole hook invocation, the user experience is kept smooth.

I think the workflow I'd imagined is impossible with what you're proposing. What user workflow do you imagine? Could you take me through what happens, both from a user perspective and a code perspective, with your idea?

Since transactions are cheap (at least for the Git backend, I don't know if that holds for Piper), we can just rewrite the actual objects with a new transaction instead of providing hooks, which would fail in the CLI. This would match the behavior of a hook, and since all of this happens in the daemon, it is something you can easily opt out of.

Describe hooks

I think these are relatively un-contentious.

A pre-describe hook would pre-fill the commit message
- A hook failure would be a warning, and would leave the message unchanged
A post-describe hook would both validate the commit message and perform potential fixups on it
- Failure of this would cause the editor to rerun

fn describe(commit: Commit) -> Commit {
  let mut msg = match pre_describe_hook(&commit) {
    Ok(m) => m,
    Err(e) => {
      // If the initial description was invalid, we still have to be able to edit that description, so it doesn't really make sense to have these fail. So any pre-describe hook failure is considered an internal error to the hook.
      warning(e);
      // Leave the message unchanged.
      commit.msg.clone()
    }
  };
  loop {
    msg = run_editor(&msg);
    match post_describe_hook(Commit { msg: msg.clone(), ..commit.clone() }) {
      Ok(edited) => return edited,
      // Rerun the editor
      Err(e) => print(e),
    }
  }
}

I still don't think that such a thing belongs in the CLI. I implore you to go through the jj run design doc in all its versions (the one in the repo and the Google Doc one) and the related Discord discussion, as it contains valuable conversations in relation to hooks.

I hope I made my idea clear: keeping hooks out of band of the CLI is better in the long term.

And RE: pre/postsubmit hooks, we'll first need some definition of a forge in the library for them to be useful.

Matt · Answer 11 · Thu May 02 2024 08:50:44 GMT+0800 (China Standard Time)

Note for anyone reading this thread in the future: #1869 is the tracking bug of jj run

we can just rewrite the actual objects with a new transaction instead of providing hooks

I'm not necessarily opposed to this.

I still don't think that such a thing belongs in the CLI

To me, the most important thing is the user experience, and implementation details are secondary. If we can have a great user experience without putting it in the CLI, I'm certainly not opposed to that, but it's really hard to evaluate whether jj run is a good alternative to hooks in the CLI (even after having read the document you mentioned), because you haven't yet proposed a specific user journey.

Suppose a repo foo has a rule that the first line of a commit message must not exceed 80 characters, and the user wants to validate that when they write a description, it adheres to that specification.
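For reference, the check itself is tiny. A minimal Python sketch of such a validator, written as a function that returns a process-style exit code (the names `validate_description` and `main` are made up, and how the description reaches the hook is left open):

```python
import sys

MAX_SUBJECT_LENGTH = 80


def validate_description(description: str):
    """Return an error message, or None if the first line is within the limit."""
    lines = description.splitlines()
    subject = lines[0] if lines else ""
    if len(subject) > MAX_SUBJECT_LENGTH:
        return (
            f"first line is {len(subject)} characters; "
            f"the limit is {MAX_SUBJECT_LENGTH}"
        )
    return None


def main(description: str) -> int:
    """Return 0 on success and 1 on failure, like a hook's exit status."""
    error = validate_description(description)
    if error is not None:
        print(f"error: {error}", file=sys.stderr)
        return 1
    return 0
```

The hard part is not this logic but where it runs and how its failure gets back to the user.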

The requirements I would like to impose upon a user journey are:

A user must be able to use standard jj commands
- jj describe, jj commit, jj split, or any other command that sets the description for a change must run this hook-like thing
  - Adding jj describe-with-bug as a jj alias is insufficient
  - Similarly for alias describe="jj run pre-fill && jj describe" as a shell alias
- This should allow for a consistent user experience
  - It would be a pain if for one repo you needed to run jj run hooks/validate_description and for another repo you had to run jj run hooks/post-describe

Based on reading the things you asked me to read, I'm not 100% certain, but it seems that you disagree with the user journey itself, and disagree with the fundamental need for an explicit implementation of a "hook". So my questions for you are:

Can jj run meet those requirements (I suspect not, given the current design of jj)?
- If not:
  - What user journey would you propose?
    - What would the user need to do to configure it (e.g. ~/.config/jj/config.toml)?
    - What would the user need to do to ensure it runs when they run a command?
  - Why do you want it if it can't meet those requirements?
    - Do you think the requirements are bad?
      - If so, why? What doesn't make sense about them?
    - Do you think that the requirements are good, but you're willing to make that sacrifice? If so, why?

From what I can tell from piecing together bits and pieces from what you're saying about doing this out of band, with a pub/sub message, and reading between the lines a lot, what would happen under the hood would look like this:

jj describe -> write commit message -> submit it
jj-cli publishes an event saying that a commit description has been updated
(this could happen anytime from now to the very end) jj-cli returns, having succeeded
jj daemon subscribes to that event and reads hooks/jj-config.toml (or some config file), and now knows that we have a subscriber to the post-describe hook that will run binary jj run hooks/post-describe
jj daemon runs
jj daemon runs the subscriber via something roughly equivalent to jj run hooks/post-describe (or something similar)
The post-describe binary runs, and the commit message fails validation

I hope you understand why I'm confused, because in the final step there, there is no way to report the failed validation to the end-user. I'm sure your idea works, but unfortunately I don't understand what you're trying to say here, so I'd appreciate it if you could elaborate on precisely what you want. Because you've communicated vague ideas rather than a precise set of steps, it's hard for me to tell what you're trying to say.