dkandalov/code-history-mining

Code History Mining IntelliJ Plugin

This is a plugin for IntelliJ IDEs to visualize project source code history. The analysis is based on file-level changes and is therefore programming-language-agnostic. You can install it from IDE Settings -> Plugins or download it from the plugin repository.

Some examples of code history visualizations: JUnit, TestNG, Cucumber, Scala, Clojure, Kotlin, Groovy, CoffeeScript, Go, Erlang, Maven, Gradle, Ruby, Ruby on Rails, Node.js, GWT, jQuery, Bootstrap, Aeron, GHC, IntelliJ. Csv files with VCS data for the above visualizations are available on Google Drive.

See also code history miner (web server and CLI application with functionality of this plugin).

Why?

There is a lot of interesting data captured in version control systems, yet we rarely look into it. This is an attempt to make analysis of project code history easy enough so that it can be done regularly.

How to use?

Grab project history from version control into a csv file. The Grab Project History action uses the VCS roots configured in the current project and their checked-out branches. Grabbing is a separate step mainly because code history often contains noise (e.g. automatically updated build system files); having the history in a csv file makes it easier to process with scripts before visualization.
Visualize code history from the csv file. At this step, code history is read from the csv file and visualized in a browser. All visualizations are self-contained single-file html pages, so they can be saved and shared without external dependencies.
Filter/process data. The purpose of filtering is to clean the grabbed data so that visualization or other analysis is more accurate, e.g. you might want to exclude commits related to project documentation or commits generated by CI. It can also be useful to write custom analysis of the grabbed data (similar to writing a database query).
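As an illustration of the filtering step done outside the IDE, here is a minimal Python sketch that copies a history csv while dropping rows from a CI account and rows for documentation files. The column positions follow the "Code history csv format" section below; the author name `ci-bot`, the excluded extensions and the function names are hypothetical and only serve the example.

```python
import csv

# Column positions as listed in the "Code history csv format" section.
AUTHOR, FILE_NAME_BEFORE, FILE_NAME = 2, 3, 4


def keep_row(row, excluded_authors=("ci-bot",), excluded_extensions=(".md", ".txt")):
    """Return True if the row should stay in the filtered history.

    The default author and extensions are made-up examples, not plugin settings.
    """
    if row[AUTHOR] in excluded_authors:
        return False
    # fileName is empty for deleted files, so fall back to the name before the change.
    name = row[FILE_NAME] or row[FILE_NAME_BEFORE]
    return not name.endswith(tuple(excluded_extensions))


def filter_history(source, target, predicate=keep_row):
    """Copy csv rows from source to target, keeping rows accepted by predicate.

    Returns the number of rows written.
    """
    writer = csv.writer(target)
    kept = 0
    for row in csv.reader(source):
        if predicate(row):
            writer.writerow(row)
            kept += 1
    return kept
```

With real files, both streams would be opened with `newline=""`, e.g. `open("history.csv", newline="", encoding="utf-8")`, as the Python csv module recommends.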

Grab Project history

Use Main menu -> VCS -> Code History Mining or alt+shift+H to open plugin popup and choose Grab Project History action.

You should see this window:

From/To - desired date range to be grabbed from VCS. Commits are loaded from version control only if they are not already in csv file.
Save to - csv file to save history to.
Grab history on VCS update - grab history on update from VCS (but not more often than once a day). This is useful to grab history in small chunks so that when you run visualization grabbed history is already up-to-date.
Grab change size in lines/characters and amount of TODOs - grab the number of lines and characters before/after the commit and the size of the change. This is used by some visualizations and is optional. Note that it requires loading file content and can slow down history grabbing and IDE responsiveness.

Visualize

Use Main menu -> VCS -> Code History Mining or alt+shift+H to open plugin popup, select one of the grabbed files and choose visualization from sub-menu:

By default, csv files with history are saved to the "plugins folder/code-history-mining" folder. Files from this folder are displayed in the plugin menu.

When opened in a browser, each visualization has a help button with a short description, e.g. see visualizations for JUnit.

Filter/process data

Use Main menu -> VCS -> Code History Mining or alt+shift+H to open plugin popup, select one of the grabbed files and choose Open Script Editor. This will open a new tab where you can write Groovy code. To run the script, use the alt+shift+E shortcut (or Run Code History Script in the editor context menu).

For details see Code History Script API wiki page.

The script is general-purpose Groovy code with a few implicit variables to access grabbed data and no particular restrictions (similar to LivePlugin).

Misc notes

any VCS supported by IntelliJ should work (tested with svn/git/hg)
merged commits are grabbed with date and author of the original commit, merge commit itself is skipped
visualisations use SVG and require a browser with SVG support (any reasonably recent browser)
some visualisations might be slow for the long history of a big project (e.g. building a treemap view of commits for a project with 1M LOC over 10 years might take a very long time). In this case, filtering or splitting history into smaller chunks can help.
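One way to split a long history into smaller chunks is by calendar year, using the leading `yyyy` of the commitTime column. The sketch below is a hedged illustration in Python, not part of the plugin; the function names and the `history-<year>.csv` file naming are assumptions made for the example.

```python
import csv
import os
from collections import defaultdict

COMMIT_TIME = 0  # first column, formatted as "yyyy-MM-dd HH:mm:ss Z"


def split_by_year(rows):
    """Group parsed csv rows by the year at the start of their commitTime value."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[row[COMMIT_TIME][:4]].append(row)
    return dict(chunks)


def write_chunks(chunks, directory="."):
    """Write each year's rows to <directory>/history-<year>.csv (hypothetical naming)."""
    paths = []
    for year, rows in chunks.items():
        path = os.path.join(directory, f"history-{year}.csv")
        with open(path, "w", newline="", encoding="utf-8") as output:
            csv.writer(output).writerows(rows)
        paths.append(path)
    return paths
```

Rows keep their original order within each chunk, so every output file is still ordered from present to past.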

Code history csv format

Each commit is broken down into several lines. One line corresponds to one file changed in the commit. Commits are stored ordered by time from present to past. For example, two commits from the JUnit csv:

2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,IMoney.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,38,42,4,0,0,817,888,71,0,0,0,0
2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,Money.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,70,73,3,1,0,1595,1684,86,32,0,0,0
2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,MoneyBag.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,140,131,8,4,23,3721,3594,214,154,511,0,0
2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,MoneyTest.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,156,141,0,34,0,5187,4785,0,1594,0,0,0
2001-07-09 23:51:53 +0100,ce0bb8f59ea7de1ac3bb4f678f7ddf84fe9388ed,egamma,,.classpath,,,ADDED,added .classpath for eclipse,0,6,6,0,0,0,240,240,0,0,0,0
2001-07-09 23:51:53 +0100,ce0bb8f59ea7de1ac3bb4f678f7ddf84fe9388ed,egamma,,.vcm_meta,,,MODIFIED,added .classpath for eclipse,6,7,1,0,0,199,221,21,0,0,0,0
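Because one commit spans several lines, rows can be regrouped into commits by the revision value (second field), collecting the file name (fifth field) of each row. The following is a minimal Python sketch that assumes rows of the same commit are adjacent in the file, as in the sample above; `commits` is a hypothetical helper name and `SAMPLE` holds three of the rows shown above.

```python
import csv
import io
from itertools import groupby

REVISION, FILE_NAME = 1, 4

# Three rows copied from the JUnit sample above.
SAMPLE = (
    "2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,IMoney.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,38,42,4,0,0,817,888,71,0,0,0,0\n"
    "2001-10-02 20:38:22 +0100,0bb3dfe2939cc214ee5e77556a48d4aea9c6396a,kbeck,,Money.java,,/junit/samples/money,MODIFIED,Cleaning up MoneyBag construction,70,73,3,1,0,1595,1684,86,32,0,0,0\n"
    "2001-07-09 23:51:53 +0100,ce0bb8f59ea7de1ac3bb4f678f7ddf84fe9388ed,egamma,,.classpath,,,ADDED,added .classpath for eclipse,0,6,6,0,0,0,240,240,0,0,0,0\n"
)


def commits(rows):
    """Yield (revision, [file names]) for each run of consecutive rows sharing a revision."""
    for revision, group in groupby(rows, key=lambda row: row[REVISION]):
        yield revision, [row[FILE_NAME] for row in group]


result = list(commits(csv.reader(io.StringIO(SAMPLE))))
for revision, files in result:
    print(revision[:7], files)
```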

Columns:

commitTime - in yyyy-MM-dd HH:mm:ss Z format with local timezone (see javadoc for details).
revision - unique commit id, format depends on VCS.
author - committer name from VCS.
fileNameBefore - file name before change, empty if file was added or name didn't change.
fileName - file name after change, empty if file was deleted.
packageNameBefore - file path before change, empty if file was added, path didn't change or file is in root folder.
packageName - file path after change, empty if file was deleted or is in root folder.
fileChangeType - ADDED, MODIFIED, MOVED or DELETED. Renamed or moved files are MOVED even if file content has changed.
commitMessage - commit message, new line breaks are replaced with \\n.
linesBefore - number of lines in file before change; -1 if file is binary or Grab change size checkbox is not selected in Grab Project History dialog; -2 if file is too big for IntelliJ to diff.
linesAfter - similar to the above.
other before/after columns - similar to the above, should be self-explanatory.
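When reading these columns from a script outside the IDE, the commitTime pattern and the negative marker values need some care. The sketch below shows one possible way to handle them in Python; it assumes that Python's `%z` directive accepts offsets in the `+0100` form used in the sample rows, and the helper names are hypothetical.

```python
from datetime import datetime


def parse_commit_time(text):
    """Parse a commitTime value such as "2001-10-02 20:38:22 +0100" into an aware datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")


def line_count(text):
    """Convert a linesBefore/linesAfter value to int.

    Returns None for the markers described above:
    -1 (binary file or change size not grabbed) and -2 (file too big to diff).
    """
    value = int(text)
    return value if value >= 0 else None
```

Mapping the markers to `None` keeps them out of sums and averages, where a raw -1 or -2 would silently skew the result.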

Output csv format should be compatible with RFC4180.

Credits

inspired by Michael Feathers' workshop and the Delta Flora project.
all visualizations are based on awesome d3.js examples.

Similar projects

https://github.com/adamtornhill/code-maat for any language
https://github.com/michaelfeathers/delta-flora for Ruby (with commit breakdown to method level)
