Knowing which files changed

Question

david-k-johnson opened this issue 2 years ago · comments

pre-commit knows which files have changed, and the user can control that set when invoking it: all files (pre-commit run --all-files), all files that changed in a git range (pre-commit run --from-ref origin/HEAD --to-ref HEAD), or, by default, the files changed since the last push, last commit, etc., depending on which hook is running. Is this a planned feature, or something that can be done today? I have seen no indication that it can be done, even in the hook itself, since no environment variables or arguments are passed through from which any of this could be deduced. Having this feature is also important if one wants to run this in CI/CD, where a series of commits may come in at once.

Maxence Lecanu · Answer 1 · Thu Mar 17 2022 21:40:13 GMT+0800 (China Standard Time)

Hello ! Thanks for looking into my project and sorry for not answering sooner :)

If I understand this correctly what you are asking for here is:

Have a feature for running mookme as if every file of the repo had changed? I.e., you would like to run every available hook for a given git hook type (pre-commit, prepare-commit-msg, etc.), right?
If so, this already exists with the --all flag of npx mookme run -t={hook type} --run-all

here is the related documentation

Choose a span of commits from which to take the list of changed files
Definitely possible, provided a git command can produce such a list in a reasonably parsable format, and it could be a nice feature request.
Indeed, by default, we look at the staged files compared to the last commit (HEAD).

Also having this feature is important if one wants to run this in CI/CD where a series of commits may come in at once.

Could you provide me with a bit more detail about this, please? I am not sure I understand this part correctly. Mookme has not been designed to run within a CI, but rather in a local environment before you commit.

As it is, I cannot guarantee that it works properly on the git server side, because I simply have not had the opportunity to test this case. However, it does support the git hooks related to the server side.

If you give me a few use cases of using Mookme either on the Git server side or in the CI/CD, it will give me a better idea of what is expected here and we can start working on it from this point :)

Leonidas Loucas · Answer 2 · Fri Mar 18 2022 13:48:29 GMT+0800 (China Standard Time)

I can chime in on part of what @david-k-johnson is asking about (using a similar tool https://pre-commit.com/).
In our CI we run a verification pass of pre-commit as if a user were running it, but with slightly altered semantics. For example:

example commands

Run a particular config from the last known green revision to the revision currently under test
pre-commit run --config .pre-commit-config.python.yaml --from-ref "${LAST_GREEN_REV}" --to-ref "${CURRENT_CI_REV}"
or

Run from a grandfathered in sha (a23128dsjasd8nasjdsh in this case), until the rev currently under test.
pre-commit run --config .pre-commit-config.python.yaml --from-ref a23128dsjasd8nasjdsh --to-ref "${TOOLS_GIT_REV}"
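
For readers wondering how a ref range turns into a file list: plain git can produce one with `git diff --name-only`. The sketch below is only an assumption about how a CI job could compute that list itself; it is not taken from pre-commit or Mookme, and the helper name `changed_files_between` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list files added or modified between two refs.
# changed_files_between is a hypothetical helper, not part of pre-commit or Mookme.
# Usage: changed_files_between FROM_REF TO_REF
changed_files_between() {
  local from_ref="$1" to_ref="$2"
  # --diff-filter=AM keeps added and modified files, so deleted paths
  # are not handed to tools that expect the files to exist.
  git --no-pager diff --name-only --diff-filter=AM "$from_ref" "$to_ref"
}
```

In a CI job this could be called as `changed_files_between "${LAST_GREEN_REV}" "${CURRENT_CI_REV}"`, reusing the variables from the commands above.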

David Johnson · Answer 3 · Fri Mar 18 2022 13:54:57 GMT+0800 (China Standard Time)

Yes, @merc1031 is correct. The one thing to add is that pre-commit's interface to its hooks is to pass the files that have changed (which the user can manipulate with --all-files or --from-ref/--to-ref; the default is the currently staged, uncommitted files). I know this would be a big change, but otherwise every user needs to implement this themselves.

Maxence Lecanu · Answer 4 · Fri Mar 18 2022 16:17:05 GMT+0800 (China Standard Time)

Regarding --all-files

Once again, unless I'm missing something in what you are asking for, this already exists with the --run-all option of mookme. It will run your hooks as if every file in the monorepo/repo had changed.

The one thing to add is that pre-commit's interface to its hooks is to pass the files that have changed

I am truly sorry, but I don't understand this part. I can read it in at least two ways.

Are we speaking of an option in the hooks configuration to select against what type of file this hook should run ? If so it exists. You can set the onlyOn option in the hook with a pattern to match.
Or is it about a CLI option to provide what files to use when selecting which hooks to run ? If so, I can come up with two additional options in the mookme run command:

--from (defaulting to HEAD) / --to (defaulting to null, in which case the currently staged files are used)
--files (optional) -> If this is set, it will ignore any --from/--to resolution and just run the hooks concerned by these files.

Would it fit your usage ?
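
The precedence proposed above (an explicit file list first, then a ref range, then the staged files) can be sketched in a few lines of bash. This is an illustration of the proposal only, not Mookme's code; `resolve_files` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the proposed precedence. resolve_files is a hypothetical helper.
# Usage: resolve_files FROM_REF TO_REF [FILE...]
# Pass "" for FROM_REF or TO_REF when the option was not given.
resolve_files() {
  local from_ref="$1" to_ref="$2"
  shift 2
  if [ "$#" -gt 0 ]; then
    # An explicit file list wins: skip any git-based resolution.
    printf '%s\n' "$@"
  elif [ -n "$to_ref" ]; then
    # Both ends known: files changed between the two refs.
    git --no-pager diff --name-only "${from_ref:-HEAD}" "$to_ref"
  else
    # No --to: staged files compared to --from (HEAD by default).
    git --no-pager diff --cached --name-only "${from_ref:-HEAD}"
  fi
}
```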

David Johnson · Answer 5 · Sat Mar 19 2022 06:47:38 GMT+0800 (China Standard Time)

The command line options that are mentioned are not useful on their own. What needs to change is how each hook is called. Currently all hooks are passed what git passes by default:

WARNING

{args} are replaced with the hook arguments when the command is executed. See the Git documentation on hooks

And you have an example of computing the changed files in each hook this way:

Here is what the python-changed-files script looks like

#!/usr/bin/env bash
git --no-pager diff --cached --name-only --diff-filter=AM --relative -- "***.py" | tr '\n' '\0' | xargs -0 "$@"

What I am proposing (and what pre-commit already does) is that, instead of receiving what git passes to hooks, each hook is passed all the files that have changed and match its onlyOn pattern.

With this change in place, adding fromRef/toRef (passing the files changed in that range), an all-files mode (passing every file), or whatever else needs to be supported becomes easier to implement, and hooks will not have to change at all.

Maxence Lecanu · Answer 6 · Sat Mar 19 2022 07:18:07 GMT+0800 (China Standard Time)

Okay, I think it is clearer now in my head thanks for clarifying.

So, regarding changing the content of the args argument, which can be used when defining the command of a hook: this is simply not possible without coming up with an alternative, because it would be a massive breaking change for other users.

For instance, using commitlint within Mookme relies on the args variable.

However, I agree that this single option is not enough for other use cases and we could do better by offering more options. I think it is important to match common enough use cases, because partials especially exist to allow for any weird stuff the end user might want to do. What we could offer is:

{changed-files} -> The list (comma-separated, relative paths from the git folder) of files changed, regardless of them being matched
{matched-files} -> The list (comma-separated, relative paths from the git folder) of files changed, and matching the pattern. If no pattern is defined for the hook, it equals changed-files.
See anything else?

These options would be used the same way {args} is used today:

{
    "steps": [{
        "name": "commit lint",
        "command": "cat {args} | ./node_modules/@commitlint/cli/cli.js"
    }]
}

{
    "steps": [{
        "name": "Show me the changed files",
        "command": "echo {changed-files}"
    }, {
        "name": "Show me the python changed files",
        "command": "echo {matched-files}",
        "onlyOn": "**/*.py"
    }]
}
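
To make the difference between the two placeholders concrete, here is a minimal bash sketch of the substitution. It is an assumption for illustration: `interpolate_step` is a hypothetical name, and the pattern test uses bash's own `[[ ... == pattern ]]` matching, which only approximates whatever glob matching Mookme applies to onlyOn.

```shell
#!/usr/bin/env bash
# Sketch: substitute {changed-files} and {matched-files} in a step command.
# interpolate_step is hypothetical; Mookme's real implementation may differ.
# Usage: interpolate_step COMMAND ONLY_ON_PATTERN FILE...
interpolate_step() {
  local cmd="$1" pattern="$2"
  shift 2
  local changed="" matched="" file
  for file in "$@"; do
    # Build comma-separated lists, as described in the proposal.
    changed="${changed:+$changed,}$file"
    # An empty pattern means "no onlyOn", so every changed file matches.
    # The right-hand side is left unquoted on purpose so it acts as a pattern.
    if [ -z "$pattern" ] || [[ "$file" == $pattern ]]; then
      matched="${matched:+$matched,}$file"
    fi
  done
  cmd="${cmd//\{changed-files\}/$changed}"
  cmd="${cmd//\{matched-files\}/$matched}"
  printf '%s\n' "$cmd"
}
```

For example, `interpolate_step 'echo {matched-files}' '**/*.py' src/a.py web/b.js` prints `echo src/a.py`.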

The --from-ref --to-ref are still on the table, but I would like to track them in a separate issue.

Would you agree on this ?

David Johnson · Answer 7 · Sat Mar 19 2022 10:51:21 GMT+0800 (China Standard Time)

Yes, that is fair. I see fromRef/toRef as only useful after this feature is implemented :) I think your suggested approach could work nicely as well. I am guessing the matched/changed files would only be that sub-project's files as well?

Maxence Lecanu · Answer 8 · Sat Mar 19 2022 17:24:40 GMT+0800 (China Standard Time)

Great ! This is a piece of work but I'll work on this next week to come up with an extensible basis for the arguments mookme offers for writing hooks

David Johnson · Answer 9 · Mon Mar 21 2022 04:52:43 GMT+0800 (China Standard Time)

One concern that will have to be addressed, is that environment size is fairly limited, so passing all the files at once in an env_var will not be possible. pre-commit goes through some effort to partition the files into chunks < max environment size, with the side effect of calling hooks multiple times. I am not sure how you will go about solving this issue. Doing a google search of max environment variable size will show maximums based on OS.
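
The kind of chunking described here can be approximated with xargs, which runs the command several times when the file list does not fit in one invocation. The sketch below is an illustration only and does not reproduce pre-commit's own partitioning logic; `run_in_chunks` is a hypothetical helper, and the explicit per-call cap is there just to make the splitting visible.

```shell
#!/usr/bin/env bash
# Sketch: run a command over a file list in batches.
# run_in_chunks is a hypothetical helper name.
# Usage: printf '%s\0' FILE... | run_in_chunks MAX_FILES_PER_CALL COMMAND [ARG...]
run_in_chunks() {
  local max_per_call="$1"
  shift
  # -0 reads NUL-separated names (safe for paths with spaces);
  # -n caps the number of files per invocation.
  # Without -n, xargs still splits on its own when the command line gets too long.
  xargs -0 -n "$max_per_call" "$@"
}
```

The side effect mentioned above is visible here: a hook command run this way may be invoked more than once.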

Maxence Lecanu · Answer 10 · Mon Mar 21 2022 05:30:26 GMT+0800 (China Standard Time)

One concern that will have to be addressed, is that environment size is fairly limited, so passing all the files at once in an env_var will not be possible

I don't think that it's an issue. These variables are interpolated directly into the command before it is run, so we don't use environment variables. After a quick search, it seems that OS/shell commands are limited to a few thousand characters, definitely enough. Below is this limit for my mac.

If this ever turns out to be an issue, a solution would be to write the list of affected files to a cache file in the global .hooks folder at the root of the repository, and to provide the path to this file (relative to the root directory, where .git is located) in the hooks variable.
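
The cache-file fallback could look roughly like the sketch below. Everything in it is an assumption for illustration: `write_file_list` and the `changed-files.txt` file name are hypothetical, and only the idea of storing the list under the .hooks folder comes from the comment above.

```shell
#!/usr/bin/env bash
# Sketch: write the file list to a cache file and hand hooks its path.
# write_file_list and changed-files.txt are hypothetical names.
# Usage: write_file_list HOOKS_DIR [FILE...]
write_file_list() {
  local hooks_dir="$1"
  shift
  local list_path="$hooks_dir/changed-files.txt"
  mkdir -p "$hooks_dir"
  if [ "$#" -gt 0 ]; then
    # One path per line, so a hook can read it with xargs or a while-read loop.
    printf '%s\n' "$@" > "$list_path"
  else
    # No files: leave an empty list rather than a single blank line.
    : > "$list_path"
  fi
  # Print the path so the caller can substitute it into the hook command.
  printf '%s\n' "$list_path"
}
```

A hook would then receive only a short path, whatever the number of files.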

Yet, the more I dig into this, the more it feels to me that I'm about to write an alternative to the partials feature, that for some reason I don't understand, you are reluctant to use.

It would help me a lot to understand why. If writing the partials themselves is your issue, mookme could ship with a default set of utility scripts, copied during the init phase, so that they can be used in hooks out of the box.

Maxence Lecanu · Answer 11 · Mon Mar 21 2022 05:32:06 GMT+0800 (China Standard Time)

Regardless of my last comment, when going into the code for this, providing changed-files and matched-files as out-of-the-box partials would definitely be my way to go, as this feature exists for this exact purpose.

David Johnson · Answer 12 · Mon Mar 21 2022 07:36:22 GMT+0800 (China Standard Time)

So currently partials are not passed the information needed to know which files have changed. For example, if I specify the --run-all flag, how does the partial know whether the user wants to run against all files, against changed files, or (the other requested feature) against a range of refs? Also, I would currently have to write a special partial for each variation of matched files that a hook wants, say one that matches .js files and another that matches .py files; I cannot currently pass that match pattern from the hook's JSON to the partial.

For example, how do I write the partial below so that it handles all files vs. currently changed files vs. from-ref/to-ref without duplicating onlyOn? I want to run this same hook against any of the above possibilities (current changeset, HEAD~10..HEAD, all files, based on user input to the mookme run ... command) with no changes to the hook.

A partial:

#!/usr/bin/env bash
git --no-pager diff --cached --name-only --diff-filter=AM --relative -- "***.py" | tr '\n' '\0' | xargs -0 "$@"

A hook

"steps": [
    {
      "name": "Run pylint but only on changed files",
      "command": "python-changed-files pylint",
      "onlyOn": "***.py"
    }
]

Maxence Lecanu · Answer 13 · Mon Mar 21 2022 16:37:56 GMT+0800 (China Standard Time)

Okay, I get it. It is clearer to me now that partials are indeed not a good way to achieve this, thanks.

It makes me think a bit of #63 as well; I might need to handle that one before adding your requested feature. So far, onlyOn is evaluated just before the step execution, so that we can set the skipped flag in the UI. However, there is no reason to do this outside of the hook resolver section of the code; the resolver could evaluate it and just pass the skipped boolean alongside the step.

By doing so, it becomes feasible to make step execution (or not) 100% file-based (provided that the step is VCS sensitive; as mentioned in #63, we are talking about what would be the FilesChanged hook trigger strategy).

We are talking about a pretty big refactor so I just ask you to bear with me a few days/weeks while I'm trying to do things the proper way here :) If you are in a hurry, any contributions are welcome. I will have a window to move on these two issues on Friday probably.

Maxence Lecanu · Answer 14 · Fri Mar 25 2022 21:32:56 GMT+0800 (China Standard Time)

I came up with an implementation of both arguments today, let me know if it is okay for you :)

One obvious missing feature from where I stand is the ability to pick a separator for the list of files (e.g., if I want the list to be output using a ",", a blank space, or anything else as a separator).

David Johnson · Answer 15 · Sat Mar 26 2022 04:03:55 GMT+0800 (China Standard Time)

I will try out the branch this weekend:). I don't think the separator is a big deal as one could just write some shell script to transform it.

David Johnson · Answer 16 · Sat Mar 26 2022 12:08:03 GMT+0800 (China Standard Time)

What is a way I can install this branch for my project? I am not sure how to do it with the monorepo setup. Do you have documentation on how to produce the 'mookme' binary? I believe I have tried the obvious things like npm run build, etc.

Maxence Lecanu · Answer 17 · Sun Mar 27 2022 02:17:34 GMT+0800 (China Standard Time)

I am horribly late for writing development guidelines and techniques, I am doing my best but am truly sorry for the inconvenience. Definitely on my next up list.

From my own experience, I tried using the npm link feature, but what is really easy to use for me is to invoke the built js file directly as if it was the Mookme CLI.

In packages/mookme -> npm run dev to have the CLI re-transpiled on every TS file change
-> This ensures the dist folder in packages/mookme always contains the latest JS index file
I invoke this JS index file directly in another test monorepo where I mocked dummy changes in some specific folders, to see what happens.

The debugger has really changed the way I develop on Mookme, so I'll include it in the command below, but you can naturally remove the DEBUG= environment variable to see Mookme in action.

If you want to explore the debugging in more depth, here is the documentation of the library we use

So, in the test repository
DEBUG=mookme:*,-mookme:writer,-mookme:ui node ../mookme/packages/mookme/dist/index.js run -t pre-commit

You can replace run -t pre-commit by any Mookme command

Maxence Lecanu · Answer 18 · Thu Mar 31 2022 00:45:07 GMT+0800 (China Standard Time)

@david-k-johnson the two variables are available as of 2.1.1 :) I'll go on with the rest of the feature request