mathieucarbou / license-maven-plugin

Manage license headers in your source files

Home Page: https://oss.carbou.me/license-maven-plugin/

Large repos are slow with `${license.git.copyrightCreationYear}`

joakime opened this issue · comments

Describe the feature
When using the ${license.git.copyrightCreationYear} feature on a large repo it can take quite a while to perform the checks (or format the headers).

Can we have a configuration option ...

<ignoreGitCopyrightCreationYear>true</ignoreGitCopyrightCreationYear>

This should default to false.

This would allow for 2 profiles ...

  • -Plicense-update to update/format the headers and perform the slow git lookups.
  • -Plicense-check to check the license headers, without the git lookup step, basically ignoring the year (or at least ensuring the year is within the range of inceptionYear and currentYear)
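The relaxed check in the second bullet could be sketched roughly as follows. This is only an illustration of the idea, not plugin code: the class `YearRangeCheck`, the method `yearInRange`, and the sample header text are all hypothetical, and the sketch assumes the first four-digit number in the header line is the copyright year.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of license-maven-plugin.
class YearRangeCheck {
    // Assumption: the first standalone four-digit number in the line is the year.
    private static final Pattern YEAR = Pattern.compile("\\b(\\d{4})\\b");

    // Returns true when the header carries a year between inceptionYear and
    // currentYear (inclusive), without consulting git at all.
    static boolean yearInRange(String headerLine, int inceptionYear, int currentYear) {
        Matcher matcher = YEAR.matcher(headerLine);
        if (!matcher.find()) {
            return false; // no year found, so the header cannot pass the check
        }
        int year = Integer.parseInt(matcher.group(1));
        return year >= inceptionYear && year <= currentYear;
    }

    public static void main(String[] args) {
        // Sample header text is made up for the demonstration.
        System.out.println(yearInRange("Copyright (c) 2015 Example Corp", 2010, 2024));
        System.out.println(yearInRange("Copyright (c) 2031 Example Corp", 2010, 2024));
    }
}
```

A check like this only needs the file contents, so it would avoid the git history walk entirely.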

Hi @joakime, yes, this is a major known downside at the moment, and it is even worse in recent versions, which now read all of the history if I'm not mistaken. We need to get caching in place in this repo so that files are hashed and the plugin doesn't even look unless a file's hash has changed. A few plugins, such as formatter-maven-plugin and impsort-maven-plugin, do something similar now. This project is more complicated, though, because it uses git in that case, or even svn, where the hits are additionally against git. So even with caching it's likely to still be slow, as too much is involved there. Your idea to skip it outright through some form of "ignore the date" option might be highly beneficial since, as I'm sure you know, 99% of the time this does nothing unless some file changed.

So I think at a minimum we introduce a cache, which is fairly straightforward, to make sure we don't even bother with files that were not touched. Given that most will not check that cache in, its usefulness varies (i.e. the target folder would store the cache, so any 'clean' would remove it). It's possible to store it too, but that gives a rather mixed user experience. It would, however, speed up incremental builds when running without clean, so it covers lots of use cases.
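The caching idea in the comment above might look something like the sketch below. It is not how the plugin (or formatter-maven-plugin or impsort-maven-plugin) actually implements caching: the class `HeaderCheckCache` and its methods are hypothetical, and the cache is kept in memory only, whereas a real version would presumably persist the hashes somewhere such as the target folder.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory cache, not part of license-maven-plugin.
class HeaderCheckCache {
    // Maps a file path to the hash of the content that last passed the check.
    private final Map<String, String> hashByPath = new HashMap<>();

    // Hex-encoded SHA-256 of the file content.
    static String sha256(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    // True when the content matches what was recorded, so the slow check can be skipped.
    boolean isUnchanged(String path, byte[] content) {
        return sha256(content).equals(hashByPath.get(path));
    }

    // Call after a file has passed the header check.
    void record(String path, byte[] content) {
        hashByPath.put(path, sha256(content));
    }

    public static void main(String[] args) {
        HeaderCheckCache cache = new HeaderCheckCache();
        byte[] original = "class A {}".getBytes(StandardCharsets.UTF_8);
        byte[] edited = "class A { int x; }".getBytes(StandardCharsets.UTF_8);

        System.out.println(cache.isUnchanged("A.java", original)); // first run: nothing recorded yet
        cache.record("A.java", original);
        System.out.println(cache.isUnchanged("A.java", original)); // same content: can be skipped
        System.out.println(cache.isUnchanged("A.java", edited));   // content changed: must be rechecked
    }
}
```

Recording the hash only after a successful check means a file that failed is looked at again on the next run.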

Anything I can do there is a bit more complicated. I don't have access rights to merge my own PRs directly, so any effort of mine takes some time to get reviewed. So anything you can additionally provide would certainly help speed up getting this applied. Even with my suggestion that we need a cache first and foremost, I think we still need some ability to skip, as this plugin has historically been notorious for miscalculations going into a new year, depending on what was done with git, which could cause it to spam an update of the entire repo with the new year unexpectedly. Having the ability to disable that would be a good use case. I have not seen this specific issue happen this year, so maybe it was fixed long ago, but as a new year hits it's also highly frequent to get spammed with missed updates anyway, until either a full forced application or files fall off the new year hit.

I was wondering, can the header template include regex?
That would be sufficient as a workaround as well.

@hazendaz : there could be a simple way to achieve this I think...

Right now, it is possible to add some placeholders in templates where the value will come from <defaultProperties> or <properties> in the plugin declaration in the pom.

In the git plugin, we could add a check: for each git property like license.git.copyrightCreationYear (but also others), if we see them already defined through the project properties, then we don't compute them.

This sort of override would allow someone to even call the plugin with CLI by passing a system property to override year, author, etc.

This fix would be very easy to do.
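A rough sketch of that override check, under the assumption that user-supplied properties are available as a simple map: if the key is already defined, the expensive git lookup is never invoked. The class `GitPropertyOverride`, the method `resolve`, and the `Supplier` standing in for the git lookup are hypothetical names for illustration, not the plugin's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of license-maven-plugin.
class GitPropertyOverride {
    static final String CREATION_YEAR_KEY = "license.git.copyrightCreationYear";

    // Returns the user-defined value when present; otherwise falls back to the
    // (potentially slow) git lookup, which is only evaluated in that case.
    static String resolve(Map<String, String> userProperties, String key, Supplier<String> gitLookup) {
        String predefined = userProperties.get(key);
        if (predefined != null) {
            return predefined;
        }
        return gitLookup.get();
    }

    public static void main(String[] args) {
        Map<String, String> userProperties = new HashMap<>();
        // Stand-in for walking the git history; the value is made up.
        Supplier<String> slowGitLookup = () -> "2012";

        System.out.println(resolve(userProperties, CREATION_YEAR_KEY, slowGitLookup));

        // Simulates a value coming from the pom or a system property.
        userProperties.put(CREATION_YEAR_KEY, "2009");
        System.out.println(resolve(userProperties, CREATION_YEAR_KEY, slowGitLookup));
    }
}
```

Passing the lookup as a `Supplier` keeps it lazy, which is the point of the proposal: a predefined property short-circuits the git work instead of merely overwriting its result.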

@joakime : for now, you could maybe do what others are doing: use a shallow clone. If you don't use the years then it might work. It's a way to limit the depth of a repo, at the expense of losing some information from older commits such as year, author, etc.

@joakime : another option is to use license.git.maxCommitsLookup to limit the number of commits to go through, in case you cannot use a shallow clone. But if you can, I really encourage you to use shallow clones in your CI: it will speed up the clone process a lot for a big repo.

commented

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

We've moved on and are no longer using the ${license.git.copyrightCreationYear} as the check on our repo was taking over 4 hours.
This issue is no longer relevant to us.