git-historical-blame

original problem: you have a big PR in a huge, multi-team repo, and it's unclear who to ask for review.

solution: find which devs have worked on the files modified in the PR, aggregate them by team, and kindly ask for reviews.

current implementation & its flaws

for every file that has been modified since <committish>, the tool goes through the file's full commit history, collects info from each commit (additions, deletions, author), and aggregates it. if extra info (teams) is provided, it also produces person-based and team-based statistics of file ownership.
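a minimal sketch of the per-file step, assuming the history is read with `git log --follow --numstat --format=@@%an%x09%ae -- <file>`. the `@@` prefix is a hypothetical marker used here only to tell author lines apart from numstat lines, and `readFileHistory` / `parseNumstatLog` are hypothetical names; the actual tool may read and aggregate the history differently. the modified files themselves could be listed with e.g. `git diff --name-only <committish>`.

```javascript
const { execFileSync } = require("child_process")

// Hypothetical helper: raw history of ONE file. "@@" marks author lines,
// %x09 is a tab between author name (%an) and email (%ae).
function readFileHistory(repoPath, filePath) {
	return execFileSync(
		"git",
		["-C", repoPath, "log", "--follow", "--numstat", "--format=@@%an%x09%ae", "--", filePath],
		{ encoding: "utf-8", maxBuffer: 64 * 1024 * 1024 }
	)
}

// Hypothetical helper: aggregate additions/deletions/commits per author from that output.
function parseNumstatLog(logText) {
	const byAuthor = new Map() // "name <email>" -> { name, email, additions, deletions, commits }
	let current = null

	for (const line of logText.split("\n")) {
		if (line.startsWith("@@")) {
			// author line of a new commit: "@@<name>\t<email>"
			const [name, email] = line.slice(2).split("\t")
			const key = `${name} <${email}>`
			if (!byAuthor.has(key)) {
				byAuthor.set(key, { name, email, additions: 0, deletions: 0, commits: 0 })
			}
			current = byAuthor.get(key)
			current.commits += 1
			continue
		}

		// numstat line: "<added>\t<deleted>\t<path>"; binary files report "-" instead of numbers
		const match = /^(\d+|-)\t(\d+|-)\t/.exec(line)
		if (match && current) {
			current.additions += match[1] === "-" ? 0 : Number(match[1])
			current.deletions += match[2] === "-" ? 0 : Number(match[2])
		}
	}

	return [...byAuthor.values()]
}
```

usage would be `parseNumstatLog(readFileHistory("../path/to/repo", "src/a.ts"))`; binary changes are counted as 0 lines in this sketch.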

the heuristics could be improved a ton. currently there's a lot of overlapping ownership, since every commit in a file's history counts toward it, but for starters, this will do.

dependencies

- git
- node.js (tested with v12)
- yarn

setup

git clone https://github.com/kiprasmel/git-historical-blame.git
# or:    git clone git@github.com:kiprasmel/git-historical-blame.git

cd git-historical-blame/

yarn

usage

# pre-process the git history.
# note: it matters which committish the target repo is currently checked out at
./git-historical-blame.ts <../path/to/repo> \
                          <committish-of-file-modification-begin=origin/master> \
                          "" \
                          <files,to,ignore>

# re-group from file-by-file to author-by-author
./group.ts
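the regrouping done by group.ts can be pictured roughly like this. the input and output shapes are assumptions (group.ts's actual json may look different), `groupByAuthor` is a hypothetical name, and merging authors by lower-cased email is a choice made for this sketch, not something the tool is documented to do.

```javascript
// Hypothetical input shape:
//   { "src/a.ts": [{ name, email, additions, deletions, commits }, ...], ... }
// Output: one entry per author (keyed by lower-cased email) with totals and the files touched.
function groupByAuthor(statsByFile) {
	const byAuthor = new Map()

	for (const [file, authors] of Object.entries(statsByFile)) {
		for (const a of authors) {
			const key = a.email.toLowerCase()
			if (!byAuthor.has(key)) {
				byAuthor.set(key, { name: a.name, email: a.email, additions: 0, deletions: 0, commits: 0, files: [] })
			}
			const entry = byAuthor.get(key)
			entry.additions += a.additions
			entry.deletions += a.deletions
			entry.commits += a.commits
			entry.files.push(file)
		}
	}

	// authors with the most changed lines (additions + deletions) first
	return [...byAuthor.values()].sort(
		(x, y) => (y.additions + y.deletions) - (x.additions + x.deletions)
	)
}
```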

with the basics done, some .json{.csv} files will have been generated. now, to enhance the data with team info, provide a json file with an array of teammate objects:

[
  {
    "fullname": "Kipras Melnikovas",
    "email": "kipras@kipras.org",
    "team": "Sigma"
  }
]
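since a malformed teams file would likely just match nobody, it can help to check it before running teamify. a minimal sketch, assuming only the three string fields shown above are required; `validateTeams` and `loadTeams` are hypothetical helpers, not part of the repo.

```javascript
const fs = require("fs")

// Hypothetical helper: throws on the first malformed entry, so a bad teams file
// fails loudly instead of silently matching nobody.
function validateTeams(teams) {
	if (!Array.isArray(teams)) {
		throw new Error("teams file must contain a JSON array")
	}
	teams.forEach((t, i) => {
		if (t === null || typeof t !== "object") {
			throw new Error(`entry ${i}: not an object`)
		}
		for (const field of ["fullname", "email", "team"]) {
			if (typeof t[field] !== "string" || t[field].trim() === "") {
				throw new Error(`entry ${i}: missing or empty "${field}"`)
			}
		}
	})
	return teams
}

// Hypothetical helper: read and validate a teams.json file.
function loadTeams(filepath) {
	return validateTeams(JSON.parse(fs.readFileSync(filepath, { encoding: "utf-8" })))
}
```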

sidenote: see [1] below for quick scripts to transform your data if it's in a different format and/or split across multiple files.

note: the best results are achieved when the teammates' emails & names match the ones they set in their ~/.gitconfig (see findMatchingTeam in teamify.ts). currently, there's no logic for merging duplicates, let alone more advanced things like Levenshtein (edit) distance to compare names/email addresses, but these can be added in the future, or by yourself.
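one way the Levenshtein idea could look, as a hypothetical `findTeam` (not the repo's findMatchingTeam): exact email first, then exact name, then the closest full name within `maxDistance` edits. the default of 2 is an arbitrary assumption, and with short names fuzzy matching is prone to false positives.

```javascript
// Classic dynamic-programming Levenshtein (edit) distance, keeping only one previous row.
function levenshtein(a, b) {
	let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
	for (let i = 1; i <= a.length; i++) {
		const curr = [i]
		for (let j = 1; j <= b.length; j++) {
			const cost = a[i - 1] === b[j - 1] ? 0 : 1
			curr[j] = Math.min(
				prev[j] + 1,        // deletion
				curr[j - 1] + 1,    // insertion
				prev[j - 1] + cost  // substitution (free if chars match)
			)
		}
		prev = curr
	}
	return prev[b.length]
}

const normalize = (s) => s.trim().toLowerCase()

// Hypothetical matcher: returns the team name, or null when nothing is close enough.
function findTeam(author, teammates, maxDistance = 2) {
	const email = normalize(author.email)
	const name = normalize(author.name)

	const byEmail = teammates.find((t) => normalize(t.email) === email)
	if (byEmail) return byEmail.team

	const byName = teammates.find((t) => normalize(t.fullname) === name)
	if (byName) return byName.team

	// fall back to the teammate whose full name needs the fewest edits
	let best = null
	let bestDistance = Infinity
	for (const t of teammates) {
		const d = levenshtein(normalize(t.fullname), name)
		if (d < bestDistance) {
			best = t
			bestDistance = d
		}
	}
	return best && bestDistance <= maxDistance ? best.team : null
}
```

e.g. a commit authored as "Kipras Melnikovs" from an unknown email would still land in "Sigma" (one edit away), while a completely different name returns null.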

once ready, use the data like so (this depends on the output of the scripts above):

./teamify.ts <../path/to/teams.json>

this will produce multiple files:

- teamified.json{.csv}: same as grouped.json (the output of ./group.ts), but with the team added to each person who matched
- by-team.json{.csv}: same as the previous, but additionally grouped by team
- team-stats.json{.csv}: aggregate statistics per team
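the README doesn't spell out exactly what team-stats contains, so here's one plausible aggregation as a sketch: sum each team's changed lines and commits, then compute its share of all changed lines. the input shape, the `teamStats` name, and the `(unknown)` bucket for unmatched people are assumptions.

```javascript
// Hypothetical input: teamified authors shaped like
//   { name, email, additions, deletions, commits, team }  (team is null when nothing matched)
function teamStats(teamifiedAuthors) {
	const byTeam = new Map()
	let totalChanged = 0

	for (const a of teamifiedAuthors) {
		const team = a.team || "(unknown)" // unmatched people are bucketed together
		if (!byTeam.has(team)) {
			byTeam.set(team, { team, members: 0, additions: 0, deletions: 0, commits: 0 })
		}
		const s = byTeam.get(team)
		s.members += 1
		s.additions += a.additions
		s.deletions += a.deletions
		s.commits += a.commits
		totalChanged += a.additions + a.deletions
	}

	return [...byTeam.values()]
		.map((s) => ({
			...s,
			// fraction of all changed lines (additions + deletions) attributed to this team
			share: totalChanged === 0 ? 0 : (s.additions + s.deletions) / totalChanged,
		}))
		.sort((x, y) => y.share - x.share)
}
```

sorting by share puts the teams most worth asking for review at the top.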

the 2 (3) most interesting files are team-stats.csv and by-team.json.csv (plus by-team.json for the exact details).

[1] example quick scripts for combining the data from multiple files & transforming it into the wanted form:

combine.js:

#!/usr/bin/env node

const fs = require("fs")

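// usage: ./combine.js <a.json,b.json,...> [output=combined.json]
// each input file is expected to hold a JSON array; the arrays are concatenated into one.
function combine({
	filepaths,
	combined,
}) {
	let jsons = []
	for (const f of filepaths) {
		const json = JSON.parse(fs.readFileSync(f, { encoding: "utf-8" }))
		jsons.push(json)
	}

	// [[...a], [...b]] -> [...a, ...b] (Array.prototype.flat needs node >= 11)
	jsons = jsons.flat()
	fs.writeFileSync(combined, JSON.stringify(jsons, null, 2), { encoding: "utf-8" })
}

if (require.main === module) {
	process.argv.splice(0, 2)
	// comma-separated list of input files; empty entries (e.g. from a trailing comma) are dropped
	const filepaths = (process.argv[0] || "").split(",").filter(Boolean)
	const combined = (process.argv[1] || "combined.json")

	if (filepaths.length === 0) {
		console.error("usage: ./combine.js <a.json,b.json,...> [output=combined.json]")
		process.exit(1)
	}

	combine({
		filepaths,
		combined
	})
}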

teamify-prep.js:

#!/usr/bin/env node

const fs = require("fs")

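// reads combined.json (e.g. the output of combine.js) and writes teams.json in the teammate format shown above.
// the source field names (full_name, email, tribe_name) come from one particular data source;
// adjust them to match your own data.
function teamifyPrep() {
	const json = JSON.parse(fs.readFileSync("combined.json", { encoding: "utf-8" }))

	const newJson = json.map(t => ({
		fullname: t.full_name,
		email: t.email,
		team: t.tribe_name,
	}))

	fs.writeFileSync("teams.json", JSON.stringify(newJson, null, 2), { encoding: "utf-8" })
}

if (require.main === module) {
	teamifyPrep()
}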
