mathieucarbou / license-maven-plugin

Manage license headers in your source files

Home Page: https://oss.carbou.me/license-maven-plugin/


File creation year is inaccurate by default

dbwiddis opened this issue

I've been running rc2 for a while, using the copyright existence years implemented by #240, and it has been showing somewhat odd behavior: it changes license headers on unrelated commits. The behavior resembles what I saw when the repository was a shallow clone (addressed by #252), but I confirmed that my clone has full depth, and the date ranges are still wrong.

I finally dug into the code and saw that the default commit count (for lookback) is set to 10. This is problematic for calculating the creation year, which simply walks from the oldest commit until it runs out of commits to check. But "oldest" is bounded by the commit history depth (similar to a shallow checkout): the code filters the whole tree by commit count and then sorts the result.
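To make the failure mode concrete, here is a minimal, simplified model of it. This is not the plugin's code: the class `CreationYearDemo`, the method `oldestYearSeen`, and the sample history are all hypothetical, and the model ignores which files each commit touches. It only shows that truncating the history to the most recent N commits before looking for the oldest one yields a creation year that is too recent.

```java
import java.util.ArrayList;
import java.util.List;

public class CreationYearDemo {

    /**
     * Simplified model (an assumption, not the plugin's implementation):
     * look at only the most recent maxCommits commits, then report the
     * earliest year among them. The list is ordered newest first.
     */
    public static int oldestYearSeen(List<Integer> yearsNewestFirst, int maxCommits) {
        if (yearsNewestFirst.isEmpty()) {
            throw new IllegalArgumentException("history must not be empty");
        }
        int limit = Math.min(maxCommits, yearsNewestFirst.size());
        int oldest = Integer.MAX_VALUE;
        for (int i = 0; i < limit; i++) {
            oldest = Math.min(oldest, yearsNewestFirst.get(i));
        }
        return oldest;
    }

    /** Hypothetical history of 25 commits, newest first: 15 in 2021, 5 in 2019, 5 in 2015. */
    public static List<Integer> sampleHistory() {
        List<Integer> years = new ArrayList<>();
        for (int i = 0; i < 15; i++) years.add(2021);
        for (int i = 0; i < 5; i++) years.add(2019);
        for (int i = 0; i < 5; i++) years.add(2015);
        return years;
    }

    public static void main(String[] args) {
        List<Integer> history = sampleHistory();
        // Lookback of 10: only 2021 commits are visible.
        System.out.println(oldestYearSeen(history, 10));                // prints 2021
        // Unlimited lookback: the real first commit is found.
        System.out.println(oldestYearSeen(history, Integer.MAX_VALUE)); // prints 2015
    }
}
```

With a lookback of 10, the reported "creation year" also shifts whenever new commits push older ones out of the window, which would be consistent with headers changing on unrelated commits.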

As noted in adobe/S3Mock#241 (comment) it is possible to add a property for larger histories:

<license.git.copyrightLastYearMaxCommitsLookup>100000</license.git.copyrightLastYearMaxCommitsLookup>

Some issues with this:

  • It's poorly named, implying it applies only to the "Last Year" when in fact it also applies to the creation year
  • There's no documentation that creation year is broken (in the same way a shallow clone breaks it) with the default
  • I can't see where in my pom to put that; it doesn't seem to work in <properties> or <configuration> for the license plugin. It does work for me on the command line with -Dlicense.git.copyrightLastYearMaxCommitsLookup=100000.

I looked into potential fixes for this.

First note: this was originally reported by @brenuart in #151 but the issue was closed because there was a workaround available (per my comments above) and the issue went stale without being addressed.

Second note: this is a git-only issue. The copyright range for SVN uses the inception year. Since the original commit author, email, and year are not returned by the substitutions, the warning just added by #253 for sparse SVN history doesn't actually matter. This seems a bit inconsistent and is outside the scope of this issue, but I thought I'd mention it. There clearly needs to be better documentation of these features.

I see four potential fixes, listed below.

I'm happy to submit a PR, but would like @mathieucarbou or another committer to say which direction to take. My preference is option 3 or option 1.

Option 1: Simply change GitLookup.DEFAULT_COMMITS_COUNT to Integer.MAX_VALUE.

  • The existing default is an implementation detail and is not a documented default.
  • The existing override will be unaffected, so users who choose a particular value will keep that value.
  • The only impact of changing this value is potentially performance related.
  • Users who want better performance involving git history are probably already using shallow clones for that. #253 gives a warning for this case (with the same symptoms as a shallow clone with depth=10) which can be overridden.

Option 2: Do nothing, but clearly document that the copyright last year commit history setting also matters whenever the copyright creation year, copyright creation author, or copyright creation email keys are used (or the existence years, which include the creation year).

  • This is confusing because of the naming, and it requires a lot of documentation and an explanation of why we didn't just create a new property.

Option 3: Add a new parameter/property license.git.copyrightCreationMaxCommitsLookup which defaults to Integer.MAX_VALUE (effectively "infinity") and deprecate the old (misnamed) property, respecting the old property value when set but issuing a log warning to users to use the new property.

  • This is essentially option 1 with a better named override and a log warning for anyone currently working around the broken behavior to know how to fix it the right way (or delete the override).
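The fallback logic in Option 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `CommitLookupConfig` and the method `resolve` are hypothetical, the new property name is the one proposed above and does not exist yet, and a plain `Properties` object plus a `Consumer<String>` stand in for however the plugin actually reads its parameters and logs warnings.

```java
import java.util.Properties;
import java.util.function.Consumer;

public class CommitLookupConfig {

    // Proposed in Option 3; does not exist in the plugin at the time of writing.
    public static final String NEW_KEY = "license.git.copyrightCreationMaxCommitsLookup";
    // Existing (misnamed) property.
    public static final String OLD_KEY = "license.git.copyrightLastYearMaxCommitsLookup";
    // "Infinity": walk the whole history unless told otherwise.
    public static final int DEFAULT_MAX_COMMITS = Integer.MAX_VALUE;

    /**
     * Resolution order: new property, then deprecated property (with a
     * warning), then the unlimited default.
     */
    public static int resolve(Properties props, Consumer<String> warn) {
        String value = props.getProperty(NEW_KEY);
        if (value != null) {
            return Integer.parseInt(value.trim());
        }
        String legacy = props.getProperty(OLD_KEY);
        if (legacy != null) {
            warn.accept("Property " + OLD_KEY + " is deprecated; use " + NEW_KEY + " instead.");
            return Integer.parseInt(legacy.trim());
        }
        return DEFAULT_MAX_COMMITS;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(OLD_KEY, "100000");
        // Prints one deprecation warning to stderr, then the legacy value.
        System.out.println(resolve(props, System.err::println)); // prints 100000
    }
}
```

Giving the new property precedence means a user who sets both during a migration gets the new value without a warning, while anyone still relying on the old name keeps their current behavior and is told how to update.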

Option 4: Leave the existing default, but add overloaded getGitLookup() methods that let internal code needing a full history override the default of 10.

  • Presently, all the existing code which instantiates GitLookup would require this override anyway.
  • The lookup is lazily initialized and can't be changed afterwards, so if we wanted two different versions we'd have to store two (or more) different lazily initialized objects, with the thread-safety complexity that involves. I see no reason to introduce this complexity.