mathieucarbou / license-maven-plugin

Manage license headers in your source files

Home Page: https://oss.carbou.me/license-maven-plugin/


Add caching support - two tiered - pie in the sky...discussion only for now.

hazendaz opened this issue

This is mainly for me and to start a discussion on this topic. Not enough has been done to validate how to accomplish this yet but this is where my head is currently.

This is just a really loose discussion, as I have not dived deep enough to see the overall picture beyond a belief that jgit is the main problem that would necessitate a cache. The many recent changes around how jgit performs may also have lessened the pain, which needs further review.

I would use 'formatter-maven-plugin' as the base for caching support. I have mostly retooled its cache over the last few years (properties based, sorted file entries for both Windows and *nix style paths, no timestamp). It runs blazingly fast and is used for its speed by massive projects like camel.

Today the license plugin is rather slow, especially when git is involved (probably true of svn as well). A cache would help, given that most of the time nothing is going to happen. Currently this plugin takes as much time in profiled builds as surefire does, and many times takes longer in sampled projects such as 'waffle'. (Note: I work on a devops team with a super pom that is used, licensing included, by well over 500 different maven solutions, so profiling aside, this has been demonstrated as a problem for years now.) By comparison, formatter-maven-plugin, which in general does far more inside a file, barely hits the radar due to its caching.

How would we implement the caching? It's far more complicated here. Jgit calls are kind of slow, so I'm thinking of two cache files to cover the need. The basic cache could be taken more or less as-is from formatter-maven-plugin (with a few methods added). It works by reading the file, doing the work, then hashing that file and checking the property entry for a match on the hash. If the hash matches, it does not write. For the more expensive items that svn/jgit would provide by reading history, a second cache of the elements gathered per file could be used when proceeding with a write is expected. For example, the file is detected to have changed, so we want to run jgit to get the timestamps to use for that file; instead, use the cache that already has them. If the first hash has changed, you know the last edit to touch the header was most likely in the current year (this may be off when the year flips). The idea and scope are really rough here, and maybe the second cache isn't even necessary. The only times we really want to touch the files are when we realize the configuration changed or when the file was written by something else, causing the hash to mismatch.
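To make the two-tier idea a bit more concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the class `TwoTierLicenseCache`, its method names, the choice of SHA-256, and the use of a "year" as the SCM-derived value are all hypothetical and are not taken from license-maven-plugin or formatter-maven-plugin. The `scmLookup` function stands in for whatever slow jgit/svn call would be needed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical two-tier cache; none of these names exist in the plugin. */
final class TwoTierLicenseCache {
    // Tier 1: file path -> hash of the content the plugin last saw or wrote.
    private final Map<String, String> contentHashes = new HashMap<>();
    // Tier 2: file path -> expensive SCM-derived value (assumed here to be a year).
    private final Map<String, Integer> scmYears = new HashMap<>();

    /** Hex-encoded SHA-256 of the given content. */
    static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }

    /** True when the stored hash matches, meaning the file can be skipped. */
    boolean isUnchanged(String path, String content) {
        return sha256(content).equals(contentHashes.get(path));
    }

    /** Remember the content hash after the file has been checked or written. */
    void record(String path, String content) {
        contentHashes.put(path, sha256(content));
    }

    /** Run the slow SCM lookup at most once per path, then reuse the result. */
    int yearFor(String path, Function<String, Integer> scmLookup) {
        return scmYears.computeIfAbsent(path, scmLookup);
    }
}
```

In this sketch a caller would first ask `isUnchanged`, and only for files that fail that check would it call `yearFor` and then `record` the new content. When a tier-2 entry should be invalidated is left open here, as it is in the discussion.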

The caches I've seen used elsewhere so far live in the target folder, and that location can be configured so the cache can be checked in if needed. Checking it in is a bit more complicated, but it's trivial to blow away the cache on a forced run. A property to ignore the cache and run fresh would probably be useful. Most users would not save the cache, though, so we are generally talking about a local cache or a CI system cache that lives for the duration of the process, where the license plugin typically executes many times and thus the hit would be one time instead of every time.
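The storage side could be sketched roughly as follows: a cache file with sorted entries, forward-slash paths on every platform, no timestamp line, and a flag to ignore the cache. This is only an illustration under assumptions. The class `CacheFileStore`, the `skipCache` parameter, and the plain `path=hash` line format are hypothetical; the format is deliberately simpler than `java.util.Properties` (no escaping) and is not the actual formatter-maven-plugin implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cache file store; names and file format are assumptions. */
final class CacheFileStore {

    /** Use forward slashes so Windows and *nix builds produce the same keys. */
    static String normalize(String path) {
        return path.replace('\\', '/');
    }

    /** Load entries; an ignored or missing cache simply yields an empty map. */
    static Map<String, String> load(Path cacheFile, boolean skipCache) {
        Map<String, String> entries = new TreeMap<>();
        if (skipCache || !Files.exists(cacheFile)) {
            return entries;
        }
        try {
            for (String line : Files.readAllLines(cacheFile, StandardCharsets.UTF_8)) {
                // Hex hashes never contain '=', so split on the last one.
                int eq = line.lastIndexOf('=');
                if (eq > 0) {
                    entries.put(line.substring(0, eq), line.substring(eq + 1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    /** Write entries sorted by normalized path, with "\n" endings and no timestamp. */
    static void store(Path cacheFile, Map<String, String> entries) {
        Map<String, String> sorted = new TreeMap<>();
        entries.forEach((path, hash) -> sorted.put(normalize(path), hash));
        List<String> lines = new ArrayList<>();
        sorted.forEach((path, hash) -> lines.add(path + "=" + hash));
        try {
            if (cacheFile.getParent() != null) {
                Files.createDirectories(cacheFile.getParent());
            }
            String text = String.join("\n", lines) + "\n";
            Files.write(cacheFile, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the output sorted and free of timestamps means an unchanged project rewrites a byte-identical file, which matters if the cache is ever checked in. A forced run would amount to passing `skipCache = true` or deleting the file.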


Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

This was more for a discussion than an issue. It can be closed for now, as work has not progressed on it.