[RFC] Best Practices for Naming and Merging PRs

Question

[RFC] Best Practices for Naming and Merging PRs

admbtlr opened this issue 4 years ago · comments

Best Practices for Naming and Merging PRs

TL;DR

This RFC was accepted and has been our modus operandi for naming and merging PRs since then. It boils down to three things:

Use the semantic commit message format for the title of your PR
When merging PRs, choose Squash and merge (unless you have good reason not to)
But don't use Squash and merge on deploy PRs!

Proposal

The git log can be a clear and powerful record of the work that has been undertaken within a repo, and as such can be a vital resource. Or it can be a spaghetti junction box of noise and cruft that requires some serious sed fu to yield anything useful. I prefer the former, and propose some simple best practices to make it happen, while also allowing for exceptions in cases where developers make extra effort to avoid crufty spaghetti. These best practices are:

A PR does one and only one thing
A PR's title should use the semantic commit message format
If you want the individual commits in your PR to be persisted in the main branch, you must specify this in the PR description so that it can be taken into account in the code review
If you want the individual commits in your PR to be persisted in the main branch, they should also use the semantic commit message format
It doesn't really matter whether you assign yourself or someone else to the PR. What matters is that the assignee is the one and only person who is responsible for merging it.
Unless it's your PR and it has been approved by someone who knows that you intend to do otherwise, squash the commits

Reasoning

Here's a recent chunk of the output of git log in force, (please note that I'm not saying this is anyone's fault! I just want to point out that this is how it looks if you don't prioritise keeping it tidy):

| * | | 4490a1d40 Complete desktop header notifications
| * | |   2b9f24686 Merge branch 'master' into desktop-header-notif
| |\ \ \
| * | | | 764897b93 Use new logged in actions
| * | | |   bd7e8b0b6 Merge branch 'master' into desktop-header-notif
| |\ \ \ \
| * | | | | 9ea1d0951 paired with xxx, created new component to hold logged in items
* | | | | |   00e44f18a Merge pull request #xxx from xxx/close-expanded-input
|\ \ \ \ \ \
| |_|_|_|_|/
|/| | | | |
| * | | | | 54aa4fab7 added ts-ignore
| * | | | | e5f6217c5 text area autoresizes on message send
* | | | | |   6eb5bb8de Merge pull request #xxx from xxx/inbox-text-formatting

By my reckoning there are about three lines in that log that tell me what's been happening in the force repo; the rest of it is noise and spaghetti. As a comparison, here's the git log on Aether, an open source project for data curation from the last place I worked, where we put a lot of thought into our git workflows:

* aad07e79 (HEAD -> develop, origin/develop, origin/HEAD) chore: upgrade dependencies (#881)
* 3516d97b fix(extractor): generate unique ids per mapping (#879)
* 86314e16 docs: python 3.8 [ci-skip]
* 83650a84 chore: upgrade dependencies (#878)
* d9bbb746 fix(producer): remove "callable" warnings (#876)
* 2f47fe41 chore: upgrade dependencies (#874)
* bbc417f3 fix: empty realm (#871)
* dda3bed6 fix: the presence of artifacts without realms were causing an issue when connecting via database (#870)
* 79c4b564 fix: limit producer concurrency (#867)
* eefed57c refactor: move producer to aether.producer (#865)
* 50c08b42 docs: typo (#866)
* f457fd2c test: set dynamic priority to task execution (#862)
* 0e873fe4 chore: LOGGING_LEVEL and named volumes in docker (#861)

To my mind, this is orders of magnitude more useful: I can quickly and easily see what's been happening in the repo. Generating release notes becomes trivial – in fact the git log is itself an ongoing, real-time set of change notes.

A specific example of the generalised motivation for this RFC is something that happened to me a few weeks ago: I had just finished a new feature on Force, and I was doing a final merge of master into my working branch before creating a PR – at which point I discovered that some new commits had completely refactored a lot of the lines of code that I had just worked on. In order to do a sensible merge, I had to try and figure out the motivation for the upstream changes so that I could work out how many adjustments to make to my code, and how many to make to the incoming code. I always have git blame displayed for the current line, so it's trivial to see info about the last commit that touched each line. But it turned out that the title of the commit in question was... Merge master. To find out the purpose of the PR of which this Merge master commit was a constituent part required me to untangle some git log spaghetti, and then refer to the closed pull requests in Github.

Imagine how much easier the process would have been if the commit message was a description of the actual PR, informing me that it was, say, a refactor and not a feature; knowing this fact would have made it easier for me to handcraft a reasonable merge. Now try to imagine a situation in which it would be useful to know that the author of that PR merged master into their branch at that point in time. I can't think of one. Of course that developer was right to record that particular action in their branch's commit log – but in the context of the main branch, it was just unnecessary obfuscation.

This crufty spaghetti has been a constant problem in every place I've worked at since git became the dominant version control system, and the only way that I have seen the situation improve is by introducing the best practices I'm outlining here. Git is one of the most powerful and graceful pieces of software ever created, but it isn't automatically perfect for every use case; in order to really leverage that power and grace, you need to introduce some consistent practices.

A little more detail on some of the best practices listed above:

A PR should only do one thing. This is, I hope, non-controversial. A single PR should represent a single feature, bugfix, refactor, config change, etc. Now admittedly, if I'm working on a feature and I find a bug that needs fixing, I find it hard to resist the temptation to just fix it and commit it on the feature branch I'm working on. This is easier for me, but liable to cause problems for the codebase, the git log and everyone who interactts with either of them. The right thing to do is to reverse out of where I am, fix the bug in a separate branch and open a PR, and then come back to the feature I was working on. If I get lazy, or just forget, then I would hope that a code reviewer picks me up on it and asks me to cherry pick the offending commit(s) to a separate branch and open a separate PR.
PR titles should use the semantic commit message format. These kind of conventions ultimately do double duty: not only does the reader of the title need to think less in order to understand it, but the author needs to think less when writing it too. Luckily we already have a convention that is available and in widespread usage: semantic commit messages (SCM). The important part here is that we only enforce SCM formatting on entries that will end up in the git log, so in the case of a developer who is not artisanally crafting their individual commits, this means that it is the PR title that uses SCM. This kind of standardisation also makes it easy to do all kinds of cool automation, e.g. a fastlane script that is triggered by a commit to the main branch can decide whether or not to build a new beta release, depending on what kind of commit it was. Semantic commit messages also make semantic versioning really easy.
When merging a PR, (almost) always squash the commits. This has been discussed at great length in an earlier RFC. The many great comments in that PR have informed the Exceptions section below. In general though, Squash and merge has two advantages: it gets rid of those Merge pull request commit messages, and it removes the spaghetti: it hides the how behind the what.

Philosophical Aside: PRs as Git Molecules

Note that this represents a shift in git usage away from being commit-focused towards being PR-focused. I like to think of this as being "molecular" as opposed to "atomic". You hear about the idea of "atomic commits" in git usage, which generally means that you should commit often, in small chunks of work. This is good practice for many reasons, not least of which because it allows you to undo, interleave, and generally time travel within the work that you're doing - i.e. workflow considerations. But workflow considerations are only important when you're working. Once the work is done, there is usually no real need to keep these atoms around. It's more useful to group them into a molecule.

Atoms are interesting for understanding how stuff works, but if you want to see what is really happening, you're much better off studying things at a molecular level. Once the work is done, reviewed and merged, I am almost always only interested in the "what".

Pragmatic Aside: But This Makes `git bisect` Useless!

git bisect can be a great way to home in on a specific commit that introduced a bug, side effect or other unintended/undocumented feature. One criticism of the approach I'm outlining here is that it breaks git bisect, because the individual commits become too unwieldy, and you are forced to go through diffs manually to find the culpable change. This is partially true, but I would argue that:

This manual step is usually not that arduous. If the PR really is huge, then it should probably be manually squashed into a few smaller commits.
Knowing the context within which the bug was introduced makes it a lot easier to fix. Maybe I find the offending line using git bisect - what do I do now? Do I comment it out? Do I refactor it? Having the entire PR in the same commit – focusing on the "what" instead of the "how" – makes it a lot easier to decide on the best approach.
The thing that really breaks git bisect though is when individual commits leave the codebase in a non-compilable, non-executable or test-breaking state. Not every repo has post-commit hooks that check for this, and so non-working commits will often end up in a PR's history - but they definitely shouldn't end up in the main branch's history. Squashing by default solves this problem.

Exceptions

This shift from "how" to "what" is of course not uncontroversial. I have talked to various people who maintain that git logs can provide huge value by allowing others to see how someone went about implementing a particular piece of functionality (or fixing a gnarly bug). This is without doubt a valuable approach that transforms the log into a powerful learning tool. My contention here would be that although this is hugely useful in the right context (what I call an "exemplary implementation"; a great example would be @roop's recent NextJs experiment), you can only ever be working in such a context some of the time. On the other hand, at Artsy we are working in a product-focused context 100% of the time. When the common use case is so much more common than other, rarer (nonetheless valuable!) use cases, it makes sense to instil a set of practices that are aimed at making that common use case smoother, easier and more effective.

In order to address the question of exemplary implementations that can be used for educational purposes – or that are so complex as to merit more detailed log entries – I would propose one of the following two solutions:

The closed PR lives on the repo (and is linked to somewhere), its branch undeleted, with its original commit log intact so that others can study the code within that context and learn about the implementation details
For developers who find these practices too restrictive and prefer to tidy up a PR's commit log using interactive rebasing, it should be acceptable to rebase their artisanally crafted commits. This approach might also be preferred in the case of very large PRs.

If the second solution is preferred, I would propose that the author of the PR is required to:

Mention this in the PR description, so that reviewers know that they should also be reviewing the structure and descriptions of the commits themselves, as well as the code
Assign themselves, so that they are the ones who will ultimately merge the approved PR. The upshot of this is the simple rule that you always squash commits, unless it's your own PR and you have specified otherwise.
Include the PR number in the titles of each of the commits (in the same format that is used automatically when squashing commits), so that it's easy to see that they're grouped together, and it's easy to dig out the context of the PR.

Additional context

There have been several RFCs on this and adjacent topics:

I can see from the many great comments in Eloy's RFC in particular that this is a controversial topic, and I would encourage people to read them - his responses are far more intelligently argued than mine would be. I also note that his RFC had many more +1s than it did dissenting comments, but that it was never resolved. I'm guessing that this is purely because he left Artsy before getting round to resolving it.

But I feel like – as a comparative outsider – the current state of Artsy git logs suboptimal and introduces friction in day-to-day work. I'd therefore like to try and reach some consensus on a coherent approach to improve this situation, even if it's not necessarily the one I've outlined here.

And a final point: I think that this consensus and resolution needs to happen on an org-wide basis, rather than leaving the decision to individual teams. We have repos that are touched by multiple teams on a daily basis, and I wouldn't want to force on people the congnitive overhead of thinking "Is this a semantic commit message repo? Should I squash by default?"

How is this RFC resolved?

By agreeing on a set of best practices and then socialising them. This would mean sharing the documented best practices, adding them to the main README, possibly a lunch and learn session. Based on previous experience, I would favour randomly dropping in on open PRs to see whether people are adhering to the best practices, and nudging them if not. Hopefully this could be done in such a way as to not be too surveillance-y.

Alternatively – or additionally – it might be possible to add rules to Peril that enforce some of these practices, e.g. checking the format of PR titles.

Resolution

This RFC was accepted and has been our modus operandi for naming and merging PRs since then. It boils down to three things:

Use the semantic commit message format for the title of your PR
When merging PRs, choose Squash and merge (unless you have good reason not to)
But don't use Squash and merge on deploy PRs!

Level of acceptance

3: Majority acceptance, with conflicting feedback

David Sheldrick · Answer 1 · Thu Aug 27 2020 18:06:42 GMT+0800 (China Standard Time)

A PR should only do one thing. This is, I hope, non-controversial. A single PR should represent a single feature, bugfix, refactor, config change, etc. Now admittedly, if I'm working on a feature and I find a bug that needs fixing, I find it hard to resist the temptation to just fix it and commit it on the feature branch I'm working on. This is easier for me, but liable to cause problems for the codebase, the git log and everyone who interactts with either of them. The right thing to do is to reverse out of where I am, fix the bug in a separate branch and open a PR, and then come back to the feature I was working on. If I get lazy, or just forget, then I would hope that a code reviewer picks me up on it and asks me to cherry pick the offending commit(s) to a separate branch and open a separate PR.

The idea that people doing more than one thing per PR causes other people problems down the line doesn't ring true for me. Can you explain more about why?

Adam Butler · Answer 2 · Thu Aug 27 2020 19:09:41 GMT+0800 (China Standard Time)

The idea that people doing more than one thing per PR causes other people problems down the line doesn't ring true for me. Can you explain more about why?

Well firstly there's just a general separation of concerns issue about adding, say, an unrelated bugfix in the middle of a new feature. Secondly, it can cause confusion in the code review process, and probably requires some additional explanation, or at the very least some extra cognitive effort from the reviewer in order to understand that some portion of the diff is unrelated to the rest.

And then finally, if you accept my whole philosophical take on treating PRs as first class citizens within the git history, then it becomes standard practice to cherry pick features, fixes, etc. between branches as needed, just because it's so easy – the discrete pieces of functionality are like beads threaded on a string. But you can't do that if your squashed commits turn out to be sneakily polyvalent.

David Sheldrick · Answer 3 · Thu Aug 27 2020 19:23:46 GMT+0800 (China Standard Time)

Well firstly there's just a general separation of concerns issue about adding, say, an unrelated bugfix in the middle of a new feature. Secondly, it can cause confusion in the code review process, and probably requires some additional explanation, or at the very least some extra cognitive effort from the reviewer in order to understand that some portion of the diff is unrelated to the rest.

These seem like tiny costs compared to the developer's cost of stashing their context, creating a new PR, and asking for a separate review from someone else.

And then finally, if you accept my whole philosophical take on treating PRs as first class citizens within the git history, then it becomes standard practice to cherry pick features, fixes, etc. between branches as needed, just because it's so easy – the discrete pieces of functionality are like beads threaded on a string. But you can't do that if your squashed commits turn out to be sneakily polyvalent.

Yeah this rings true, but again it's not something that creates any real friction in my day-to-day work. Curious to hear whether other people feel it's a serious issue, but for me the cost:benefit ratio here seems very imbalanced.

Barry Hoggard · Answer 4 · Thu Aug 27 2020 22:43:46 GMT+0800 (China Standard Time)

I've seen plenty of large PRs in the past that combine refactoring with a feature or bug fix, which is really a problem for the reviewers.

Barry Hoggard · Answer 5 · Thu Aug 27 2020 22:44:34 GMT+0800 (China Standard Time)

Thank you, @admbtlr. This is a great write-up.

👍 to semantic commit messages

Erik · Answer 6 · Thu Aug 27 2020 23:25:21 GMT+0800 (China Standard Time)

This all seems like good practice and a great target (well, not clear why feature is abbreviated to feat but refactor at one character longer is not), but what does this mean for a work item that balloons and grows to encompass larger changes than initially estimated - say a feature that ultimately requires a refactor?
In many cases for me changes like this can come out of nowhere (say an import is changed across 2 dozen files) and the reverberations are tedious to separate after the fact. Should the author in a case like this rewrite their commits to separate branches and make separate blocking PRs? I've done or debated doing this several times lately, am always glad I did but depending on the changes wonder if it was a good use of my time.

Also agree about the squashes, but wish github had better ability to control default merge options per-branch since squashing on production deploys tends to cause problems.

TL;DR - how to apply this ideal to unanticipated work?

Ash Furrow · Answer 7 · Thu Aug 27 2020 23:59:38 GMT+0800 (China Standard Time)

I think that no matter what strategy we use, there is going to be inherent friction to using git: we can only control where the friction falls. Either we front-load that friction on the commit/PR authors, or put it onto the work of looking through git history. I think we should optimize for making commits and PRs easier, rather than optimizing for a clean git history. Let me explain why.

My biggest concern is that this would make it harder to do drive-by improvements to code while working something related. I highly value these improvements, as I've brought up before. If I see a typo in a comment, or a missing unit test, or some poorly-factored code, I want to fix it – as soon as possible. If our processes make it more difficult to contribute these improvements (through stringent requirements on PR's doing only one thing), then this will necessarily discourage engineers from making these improvements. Maybe I'm pessimistic, but I think that introducing even a small amount of friction would dramatically decrease these kinds of drive-by improvements.

Personally, I do quite a bit of spelunking in git history (mostly in Eigen+Emission) but I don't really feel the friction that the RFC describes. I'm a 👎 on introducing strict semantic requirements on PRs – the cost/benefit ratio seems too skewed.

However... I think there is still room to improve our git hygiene, like semantic PR titles or more use of rebasing/squashing. I would be more interested in helping to popularize these best practices, rather than introduce strict org-wide rules. Maybe once we clean up our habits first, stronger semantic rules around PRs might seem less arduous?

This RFC is a huge departure from our current practices, which I agree does need to be done across all our repos, so if we do go ahead with it I would recommend taking small steps (in line with our "incremental revolution" principle).

Christopher Pappas · Answer 8 · Fri Aug 28 2020 00:59:57 GMT+0800 (China Standard Time)

Really great RFC write-up @admbtlr.

My biggest concern is that this would make it harder to do drive-by improvements to code while working something related.

This is something that I'm constantly on the fence about, as someone who is both guilty of the mixed PR and as someone who is frequently tasked with reviewing similarly large mixed PRs.

There's a gradient here. Within a group of more senior engineers working on close-nit teams (like MX) there's a lot of trust in the author during the review process. Not every detail needs to be understood by the reviewer, and personal responsibility is expected. So drive-by stuff is often acceptable. I think this kind of process falls apart a bit when one encounters engineers of differing skill-levels or temperaments. Some reviewers feel that the personal responsibility falls on them to review / understand each line, that they serve as gate-keepers to the codebase and a line of defense against overly ambitious devs. Or, they don't feel comfortable reviewing multiple contexts at once, and therefore the entire review process degrades.

If one team (like MX) is comfortable and productive leaning into trust while another feels that more sensitivity is required, it would likely be to the detriment of both to apply a general best practice across working styles. Gravity is much different than Convection which is different than Eigen, just as Galleries is different than Consignments. RFC's like this would seem to fall under team working agreements, IMO.

All this being said, I think it's important to recognize that mixed PRs often create a burden on reviewers and teams. Some of my most productive coding sessions have come from periods of intense focus that touch a lot of different areas, and there's always an element of 🙏 required by virtue of the number of things touched during these sessions. Often times asignees, when faced with so many changes, just hit merge out of frustration (or faith), and the review process might as well have been skipped completely. When PRs do one thing and one thing only, this is never the case -- and this, by extension, improves git hygiene.

A good intermediate step would be to ensure that folks know how to do git rebase -i, and how to clean up their git history. A lot of folks simply don't know how, or never use this very powerful tool.

Joey Aghion · Answer 9 · Fri Aug 28 2020 05:16:45 GMT+0800 (China Standard Time)

Lots of this is very agreeable:

The current state of git logs are noisy or useless.
Each commit should represent a working software state, but typically does not.
We should resolve this engineering-wide rather than adopt team-specific norms (to encourage cross-team code review, collaboration, and staffing flexibility).
The assignee should be the individual clearly responsible for merging.

Some points I disagree about, or at least want to add some subtlety around:

Merge master into <working branch> commits are useless, but I find Merge pull request #6132 from <user>/<branch <description>-type commits quite useful.
Unrelated changes should not go into any single PR, but sometimes a single coherent merge/release point does depend on multiple step-wise changes, each one a working software state and [in my opinion] valid commit. E.g., one commit might fix a bug and of course add a regression test, but the fix might introduce new ugliness that warrants a refactor. In cases like that, I think separate commits are acceptable and even helpful.
I love the notion of commits as atoms and PRs as molecules. I'm just not convinced we need to scrub all atoms from the master branch. (Is it possible the list of PRs could be the more meaningful record you seek?) I'd be thrilled with the lesser goal of all commits actually being atoms, instead of the photons and quarks they sometimes are today.

These^ are all just preferences and I'm happy to suspend them for some first-hand experience with a new approach. These are some practical concerns though:

If I recall correctly, squash-merging a PR will be "remembered" in the Github UI, and will be pre-selected on later merges. This has interfered with our method of releasing via PRs (which depends on those commits from master and staging reaching the release branch).
Regarding whether to assign yourself or someone else, I think we should have a preference for assigning someone else, because that encourages PRs to be graceful (not need manual tending) and avoids an additional hand-off step (and lag, for the author to return to and merge the PR after approval). In cases where manual tending is necessary, it's acceptable to self-assign.

There are several behavior changes proposed here (PR titles, assignees, merge styles), and like others I wonder if we could approach them incrementally. Some of the expectations we might consider independently (in increasing order of contentiousness) are:

PRs should do only 1 thing (most basic)
PRs should be assigned to a single individual
On working branches, rebase on master rather than merging master in
Commits in master should each represent a working, internally consistent software state (ambitious, but accomplishable either by interactive rebasing or squashing)
Commits (or PRs) should be titled according to a strict format
PRs should be squash-merged

Jon Allured · Answer 10 · Fri Aug 28 2020 06:01:44 GMT+0800 (China Standard Time)

I just wanted to comment and follow up my thumbs down vote: I agree with a lot of this, but not all of it and hope there's some middle ground we can reach! I really do think overall this proposal is headed in the right direction, but as others have suggested, an incremental approach might get us there with less strain on the team.

The part of this proposal that I disagree with most strongly is that incremental commits should be squashed. I'll offer this one as a point of debate: artsy/convection#645. Do those 16 commits help or hurt this codebase? 🤔 🤷 💃

I'd also like to take this opportunity to Plus One ™️ the idea of a PR doing only one thing but suggest a different approach. I would love to see behavioral and structural changes separated in the PRs we open. This is not my idea, I got it from here:

https://railsconf.com/2020/video/kent-beck-tidy-first

This is a talk Beck did for RailsConf this year and it's all about separating the work we do into behavioral and structural PRs. I'd argue that this is the discipline we'd benefit from most, not taking SCM and applying it to PRs.

To round things out I'd like to highlight your insight that our git logs are really hard to work with. 😢 When I started at Artsy I was very surprised by this but didn't have enough confidence to say anything about it, so bravo for speaking up!! 🤗

Christopher Pappas · Answer 11 · Fri Aug 28 2020 06:29:46 GMT+0800 (China Standard Time)

(Related, this blog post also discusses the subject a bit but from a slightly different angle.)

David Sheldrick · Answer 12 · Fri Aug 28 2020 17:50:12 GMT+0800 (China Standard Time)

I want to follow up with further context about my thumbs-down on this too, since I only addressed one point earlier.

A PR's title should use the semantic commit message format

If you want the individual commits in your PR to be persisted in the main branch, you must specify this in the PR description so that it can be taken into account in the code review

If you want the individual commits in your PR to be persisted in the main branch, they should also use the semantic commit message format

Unless it's your PR and it has been approved by someone who knows that you intend to do otherwise, squash the commits

I'm fine with all of these. They're low-overhead processes that other folks say will improve their lives, so I'm down to try it out.

It doesn't really matter whether you assign yourself or someone else to the PR. What matters is that the assignee is the one and only person who is responsible for merging it.

Yeah I like this. Distributed responsibility often leads to inaction so I think having a single DRI for each PR is a great time-saver.

A PR does one and only one thing

This is a nice ideal but to me it seems wildly unpragmatic given how much time it would cost PR authors vs how much time it would save other engineers (if indeed it would, I'm not convinced). Especially when we're working remotely across different timezones. If PR understandability is the issue, there are great lower-overhead ways to address that, nicely outlined in Steve's blog post.

I get that people can have negative emotional responses when asked to review PRs that are large or complicated, even when the PR author provides comprehensive context for the changes. And that might be worth spending significant developer time to avoid, but we should explicitly frame the discussion around that if so.

Adam Butler · Answer 13 · Fri Aug 28 2020 21:39:59 GMT+0800 (China Standard Time)

Thanks for the comments everyone, I'm happy to see that this has stirred up some great discussion. Obviously I wasn't expecting people to just agree with everything and change their working style from one day to the next, but I'm glad to see that at least the problem resonates with other people, even if the solution doesn't!

Let me try and respond to some of the comments.

how to apply this ideal to unanticipated work?

This is a good question, and I can only answer that these are best practices, not bullet proof rules. In the end you have to make a value call on whether separating work into different PRs is a good use of your time or not. One thing I would say is that once you start working in this "PR-first" way, and it becomes factored into the code review process, you don't have to make that decision on your own. I've often seen situations where a reviewer says "Please move these commits to a separate PR", the author says "srsly?", a brief discussion ensues and consensus is reached.

I think there is still room to improve our git hygiene, like semantic PR titles or more use of rebasing/squashing. I would be more interested in helping to popularize these best practices, rather than introduce strict org-wide rules.

Yes, sure, I'm absolutely talking about best practices rather than strict rules, and I hope that's how it came across. What I've outlined above is a system that has worked well in other organisations, but it's not necessarily the only way, and it's not necessary that it – or something like it – needs to be imposed from one day to the next. What I have seen though, is that once people start working this way, they like it, and so it tends to self-perpetuate.

A good intermediate step would be to ensure that folks know how to do git rebase -i, and how to clean up their git history

Yes, that's true – but I would argue that what I'm proposing is actually considerably less arduous than interactively rebasing every PR. If manually cleaning up your git history was easy, then this wouldn't be necessary. So in some ways, what I'm proposing could be seen as an intermediate step to what you suggest 😄. But yeah, some knowledge sharing around the topic would definitely be a good idea.

I'm just not convinced we need to scrub all atoms from the master branch. (Is it possible the list of PRs could be the more meaningful record you seek?)

Hmmm, you may be right. If it were possible to view branches from both a commit and a PR perspective, then I would be very happy! But I don't think there is, or at least not without enforcing even more arduous rules that make merge commits into recognisable representations of the PR from which they are derived.

I'd be thrilled with the lesser goal of all commits actually being atoms, instead of the photons and quarks they sometimes are today.

Yes, absolutely! But it seems to me that ensuring this entails more work than what I am proposing, since it relies on everyone crafting their commits (probably interactive rebasing). I could well be wrong, but I feel like this is harder than following the practices that I propose.

If I recall correctly, squash-merging a PR will be "remembered" in the Github UI, and will be pre-selected on later merges. This has interfered with our method of releasing via PRs (which depends on those commits from master and staging reaching the release branch).

Yes, this is indeed a problem. I wish that Github was smarter about this. I also find it frustrating that, even if you merge without squashing, Github doesn't warn you (unless I'm missing something?) that it cannot do a fast-forward merge, which means that deploy branches get filled up with merge commits. I'm a bit of a simpleton here, but I always want deploy branches to exactly match the main branch when a release happens - which will only happen if the merges are fast-fowarded.

Do those 16 commits help or hurt this codebase? 🤔 🤷 💃

So here's my response to this, and I realise that you probably won't agree – which is absolutely fine! Progress is rarely made through immediate consensus.

Firstly, let me say that this is a very nicely organised PR, and I totally agree that you set it up for an easy review process, as per Steve's comment. And the fact that you cleaned up a lot of the tests as a side effect of the work you wanted to do is of course highly commendable. But... should it even have been one PR? You're actually doing two different things within this PR: cleaning up the tests, and then implementing the actual fix.

Now you might say, "So what? This code was all done together, it's not an unreasonable size to review in one go, and it doesn't require too much context to move between the various parts during the review". Which is all true, and represents a point of view that could be described as something like "Commits are what matter in the long run, PRs are just an ephemeral workflow aid".

But to come back to your question: I actually think that, in the long run, those 16 commits hurt the codebase.

Why? Because they're too detailed. Looking back in August 2021, what do I need to know? Do I need to know that you first introduced consts and then ran prettier? Or do I actually just need to know that you refactored the tests? Ultimately I think that having this level of granularity in the commit log puts you in a "can't see the forest for the trees" situation. Yes, when reviewing the PR the trees are important. But once the code has been approved? I'd be happier with two entries in the log: refactor: improved tests and fix: offer params should override defaults (or something similar).

(trees, forests, logs, branches: please excuse the metaphor car crash)

Fine-grained commits like this are a gift for the reviewer, but the git log exists in a different scale from a PR review. It's much less microscopic, and is concerned with the entire repo. Once you zoom out to that level, then your concerns exist on a different level too. At one point in time – the code review – we need a microscope, and then at a later point in time – going over the logs – we need maybe a magnifying glass. So how do we account for this difference in perspective? You already know what I'm going to say...

The nice thing about squashing commits is that it's a process that takes place at precisely the moment when the zoom level requirement changes, i.e. at the moment that the PR passes review and joins the rest of the repo. And, I would argue contra what others have said, it's actually very low friction, because it's done for you. All you need to do is think for a moment about how best to describe the PR, and git does the rest.

Joey Aghion · Answer 14 · Fri Aug 28 2020 22:55:43 GMT+0800 (China Standard Time)

It's a good point that squashing is a much more accessible step than expecting everyone to interactively rebase. I'd be very in favor of expecting much better commits in master, but accomplishing that either via squashing "bad" commits or retaining better ones, which is essentially what you're proposing.

My personal choice would be to make better commits and ask my PRs to not be squashed, so I'd just have to specify that. (I think this is closer to the "ideal," frankly, because commits retain their atomicity while PRs continue to represent natural merge+release points.)

Whether the commits themselves use a strict format, I don't feel strongly about (as long as they do adhere to the usual best practices). Given the feedback here, I'd suggest we leave that out for now.

Could we avoid the release-merging headaches via some rules or automation? I know I keep asking about this, but it is truly annoying and I've had to fix these issues dozens of times.

Ash Furrow · Answer 15 · Fri Aug 28 2020 23:28:18 GMT+0800 (China Standard Time)

You're actually doing two different things within this PR: cleaning up the tests, and then implementing the actual fix.

That's a fair perspective – my question is how could we structure two PRs in a way that makes them independent? To my mind, either you fix the bug with the old tests, and then follow up (the cleanup PR now depends on the fix PR getting merged first), or you clean the tests up first, and then follow up (and the fix PR now depends on the cleanup PR getting merged first). This slow down is a big cost, and this is the kind of friction that I mentioned above. I'm concerned this would lead engineers to just fix the bug and move on, but I could be missing a way to structure these two PRs in a way that doesn't make the inter-dependenant.

All that said, I don't think there is an easy definition of what makes code changes related to one another. I think that it's a spectrum of related-ness rather than a binary yes/no. Seeing these as two independent pieces of work is a valid perspective, but I would argue that cleaning up the tests and fixing the bug are very much the same piece of work. As Adam says, we need guidelines and principles 👍

Aside: I'm very glad that we're able to point to existing PRs because otherwise this discussion would be overly theoretical, and I think it really benefits from seeing how things would play out in practice!

Adam Butler · Answer 16 · Sat Aug 29 2020 00:15:09 GMT+0800 (China Standard Time)

It's a good point that squashing is a much more accessible step than expecting everyone to interactively rebase. I'd be very in favor of expecting much better commits in master, but accomplishing that either via squashing "bad" commits or retaining better ones, which is essentially what you're proposing.

Exactly, those are the two options: do it yourself, or let git handle it for you. The rest of the best practices are about setting git up for success if you're doing the latter. And for reasons of inclusivity (and for people like me who are either too lazy or too stupid to do it properly), the default option is to let git do it for you (i.e. squash).

Adam Butler · Answer 17 · Sat Aug 29 2020 00:17:23 GMT+0800 (China Standard Time)

Could we avoid the release-merging headaches via some rules or automation? I know I keep asking about this, but it is truly annoying and I've had to fix these issues dozens of times.

One thing that I've done in the past is to use tags as a deployment trigger. That way you don't need to have separate deploy branches, you can just add tags to the main branch, and let CI pick up the tag commits and act accordingly.

This assumes that your deploy branches don't need any extra info (special .env files or whatever) but in this era of K8s and Terraform, that's usually the case anyway.

Adam Butler · Answer 18 · Tue Sep 01 2020 17:19:37 GMT+0800 (China Standard Time)

I actually didn't realise that the semantic commit message format reappeared a couple of years ago as "conventional commits": https://www.conventionalcommits.org/

Most of what's in that spec chimes with what I've written in this RFC.

Christopher Pappas · Answer 19 · Wed Sep 02 2020 01:04:42 GMT+0800 (China Standard Time)

I actually didn't realise that the semantic commit message format reappeared a couple of years ago as "conventional commits"

Note that at one point we followed this pattern in Reaction (prior to that code being merged into Force and Reaction retired) but then replaced formal commit messages with labels via Auto. It made sense over there because there were actual releases happening which would then get merged into Force via Renovate, and the type of label determined the version number of the package released to NPM.

We don't currently update version numbers in Force but perhaps changing that would help correlate things.

Justin Bennett · Answer 20 · Thu Sep 03 2020 11:57:27 GMT+0800 (China Standard Time)

TL;DR Yep, I'm down. Skip to the end for important info.

A PR does one and only one thing

There's been a bit of contention on this point but to me it feels orthogonal to the overall conversation of naming and merging. PR doing one thing is a good best practice but use your best judgement. As an aside, here's a good article that talks about this both strategically and tactically: https://www.thedroidsonroids.com/blog/splitting-pull-request.

A PR's title should use the semantic commit message format

👍. Having a consistent PR title format that quickly communicates the scope of a PR seems like a great improvement. Good incremental step. Also a good place for peril automation to help keep us consistent.

If you want the individual commits in your PR to be persisted in the main branch, you must specify this in the PR description so that it can be taken into account in the code review

👍

If you want the individual commits in your PR to be persisted in the main branch, they should also use the semantic commit message format

Seems fine. A bit of an extra burden on those who do so (but it would keep things consistent which is swell). I'd recommend something like (commitlint)[https://commitlint.js.org/#/] to folks who would want to keep their commits.

It doesn't really matter whether you assign yourself or someone else to the PR. What matters is that the assignee is the one and only person who is responsible for merging it.

👍

Unless it's your PR and it has been approved by someone who knows that you intend to do otherwise, squash the commits

👍

There was a historical issue where GitHub didn't credit the author of the PR in a squash merge. The last time we were having this conversation that was an issue, but it seems to be fixed now. isaacs/github#1303
Release PRs are a catch. As @joeyAghion mentions, fixing those can be a bit of a PITA. Great time to break out the duck tape and chicken wire and see what we can jury-rig together. One approach would be to have peril automerge the staging branch when the assignee approves the PR. Then we'd use the approval process instead of pressing the merge button. Or even simpler just adding a release comment (still assuming you're the assignee). Future improvements to that process would be grand.

Adam Butler · Answer 21 · Thu Sep 03 2020 18:19:40 GMT+0800 (China Standard Time)

Thanks @zephraph. Some more thoughts:

A PR does one and only one thing

There's been a bit of contention on this point but to me it feels orthogonal to the overall conversation of naming and merging. PR doing one thing is a good best practice but use your best judgement.

Yes, fair enough. The motivation for this is achieving the right granularity of git log entries when squashing a PR's commits. If you have a PR that contains both a feature and a fix, and you squash the commits, you will end up with only one log entry, when it really ought to be two. If you're handling your commits manually (i.e. deliberately not squashing), this is not an issue (assuming that your separate commits have accurate titles).

If you want the individual commits in your PR to be persisted in the main branch, they should also use the semantic commit message format

Seems fine. A bit of an extra burden on those who do so (but it would keep things consistent which is swell). I'd recommend something like (commitlint)[https://commitlint.js.org/#/] to folks who would want to keep their commits.

Ah, that's a great idea!

There was a historical issue where GitHub didn't credit the author of the PR in a squash merge. The last time we were having this conversation that was an issue, but it seems to be fixed now. isaacs/github#1303

Right, true. Looking at the comments on that issue, it appears that this is the current (updated) behaviour:

the PR author becomes the squashed commit's author

all commit authors and co-authors are credited as co-authors on the squashed commit (using the co-authored-by

trailer in the (editable) commit message)

the merger is uncredited by default. If they would like credit they can add their name as a co-author to the commit message

the squashed commit's "committer" is GitHub (as it is for merge commits), verified with our GPG key

Which seems fine to me.

Release PRs are a catch.

Yup, I get the feeling that there are a number of improvements that could be made to the release/deploy process, but that's waaaaaaay outside the scope of this RFC.

Ash Furrow · Answer 22 · Fri Sep 04 2020 00:39:39 GMT+0800 (China Standard Time)

We used the MX Knowledge Share this morning to go through Adam's process for PRs and git hygiene. I found this hands-on discussion really illuminating and appreciated learning more about how someone else uses git.

I agree with Justin's summary above – including the idea of using something like commitlint. Maybe with Peril? I think breaking this RFC down into smaller goals of we want to change about our git hygiene is a great way to figure out next steps. Maybe an in-person discussion? Next week's L&L even?

Joey Aghion · Answer 23 · Fri Sep 04 2020 00:46:04 GMT+0800 (China Standard Time)

This discussion made me realize that, even though I talk like I interactively rebase often, I almost never do. Instead I often commit --amend to improve the most recent commit until tests pass and it's coherent. I highly recommend this as an easy way to go from lots of noisy commits to 1 or a few worthwhile commits per PR.

Pavlos Vinieratos · Answer 24 · Mon Sep 07 2020 20:07:16 GMT+0800 (China Standard Time)

For the record I'm very much 👎 on this RFC, for a few reasons. After reading all the above, it's definitely interesting to see what others think.

First, as an RFC it's great, very detailed and with good explanations. Now here are my opinions on the content. I went in order that these appear in the RFC, not by importance.

git squash should not be used. I would suggest the reverse, let's manually squash if we see we made a lot of commits in a branch that we could avoid if they don't offer much. But I like having the whole history of commits for a branch. That way I can see how it was edited, which order, what was tried, how it was fixed, what was split, etc. About the example of artsy/convection#645, I very much want to see the whole history of commits. If I want a top-level view, I can easily just scan the commits and look at a couple of just the PR name and description.

To add to this, making every PR into one commit breaks the whole point of git. We work on branches that get merged. Moving to a singular line of PR titles is not a good way to do this. It will make things harder. Harder to see the history. Harder to cherry-pick commits, harder to cherry-pick branches. Harder to git bisect. Harder to see the path of a branch.
"A PR should only do one thing.". Totally with you, but this is at most a soft suggestion. I am not going to do the PR dance for a small fix, I prefer including it, as long as it's related/dependant. Obviously, if it's very different things, they will be in separate PRs. But I will not tell anyone and I don't expect anyone to tell me to split something into 2 PRs unless it's obviously making reviewing impossible. Also, when we can't avoid big PRs, it's easy to do a review in a pair session if needed.
"PR titles should use the semantic commit message format". One thing I said above is that a PR should not translate to one commit (I'll write about this a bit more further down). The other thing is that I think semantic commits in general is not great. It can be helpful, yes. But mostly I feel it's pretty easy to ask and explain to devs to make better commit messages. I am far from the best commit message writer, but it's easy enough to have a commit like "fixed the image position in VR" and not "fixed" or "bla". I don't really care to read "fix: image position (VR)" or whatever the structure is. If I see some understandable text, great, if I see some gibberish, I assume it's a mid-point commit that doesn't really stand by itself, and that's fine. I know I won't go looking deep into that specific commit. Also, these are the commits that can be manually squashed by the commit author.

Specifically for the title of a PR on gh, I don't care much what it is, as long as it's something obvious for the people that care about it (author, reviewers). We have (in eigen) a nice small PR template. The title is there as a placeholder, it can be something normal, some codename, some joke, whatever, as long as it's connected to the content. I would not want a reviewer for my PR if they only read the title. We have a nice detailed PR body with all the info that is needed. That's the main thing, not the title.
"When merging a PR, (almost) always squash the commits". No, like I said above, I reeeeally am against this. It breaks git branching, it breaks git bisect, it destroys history, and it makes reading hard. I think the main thing you are trying to fix here is to make a nice simplified linear view of git log. I would be very happy to work with you or anyone else to make up a git command that will show you the nice simplified linear view, constructed from the current git repo situation. This is a "view", it's not how the "git data" should be laid out. With a view, we keep the right git way and the complete log, as well as have a simple view. If we move to squashing and commit-per-PR, then we get the simple view and lose eeeeverything else. The simple view becomes our git data, and I will fight hard to not lose that for all the reasons above (git bisect etc).

And as a last thing, as an also outsider that joined recently, I like the eigen git history. Maybe force is a bad example, and maybe force should be made cleaner, but I would take current eigen over a linear history of it any day of the week.

Adam Butler · Answer 25 · Fri Sep 11 2020 22:23:39 GMT+0800 (China Standard Time)

Thanks for your insightful comments @pvinis. I've extracted some of your arguments and written responses to them there. I very much hope that in doing so I haven't misrepresented your viewpoint - if so, please feel free to shout at me!

making every PR into one commit breaks the whole point of git...

I think that the point of git – and the thing that it is amazingly good at – is making it possible for multiple versions of a codebase to exist simultaneously, in a way that allows those versions to be merged and branched as desired. But git provides you with a huge set of tools, without necessarily dictating what you do with them. I'm outlining a systematic way of using that set of tools that – I would argue – creates an end result that is more impactful than the default, non-systematic methodology (hmmm, is "non-systematic methodology" an oxymoron?).

But I will not tell anyone and I don't expect anyone to tell me to split something into 2 PRs unless it's obviously making reviewing impossible. Also, when we can't avoid big PRs, it's easy to do a review in a pair session if needed.

This is not about the review process, it's about using the git log as a higher level view onto the work that has taken place within the repo. As mentioned, my contention is that the necessary zoom level changes once the code review process is complete. I'm thinking about someone looking through the git log in 3 months' time, a year's time: they don't care how you did it, they just want to quickly and easily understand what you did, so that they can either (a) understand the order in which coherent blocks of work were done, or (b) figure out the context within which a particular line or block of code was added/edited, so that they can then evaluate the potential impact of making changes to it.

Code maintenance is infinitely harder than code production, and I don't find it unreasonable to make a former a little easier by making the latter a little harder. And honestly, I don't think splitting PRs is that hard, most of the time. But I'm no zealot, so I'm happy to accept a little less rigour if people find it too arduous.

The other thing is that I think semantic commits in general is not great. It can be helpful, yes. But mostly I feel it's pretty easy to ask and explain to devs to make better commit messages.

This is exactly what this is about: writing better commit messages without having to think too much. You just start with a single word that says what this commit is. And by doing that, you make it easier for someone else to scan the log (as you describe).

As a side note, this is what good process is all about, imho: making it easy to do the right thing without having to actually think about it.

If I see some understandable text, great, if I see some gibberish, I assume it's a mid-point commit that doesn't really stand by itself, and that's fine. I know I won't go looking deep into that specific commit. Also, these are the commits that can be manually squashed by the commit author.

Again, this is about creeating a process that makes this stuff easy, makes it easy for devs to do something more like the right thing (impact over perfection, amirite?). Using a predefined, common format for a PR title and then pressing the squash button (or even automating that button press, as @ashfurrow has done) makes it very easy to improve git history – something that several people have already agreed is currently problematic here at Artsy.

We have a nice detailed PR body with all the info that is needed. That's the main thing, not the title.

I don't think it really matters what the title is at the code review stage, and I would hope that there is plenty of context in the PR description, the linked Jira ticket, etc. But everything in this RFC is about maintaining a high quality git log with a minimum of cognitive overhead, and this is just another part of that: if you squash a PR's commits, the default setting within Github is that the PR title becomes the commit title. If your PR title uses semantic/conventional format, then your resultant commit message will as well. So without much thought, you have created an entry in the git log that is formatted in a standardised manner, which means that it requuires less cognitive overhead to parse, and is therefore easily scannable.

It breaks git branching, it breaks git bisect, it destroys history, and it makes reading hard.

This is pretty much a list of the motivations for of the whole RFC, imho :). To tackle them one at a time:

It breaks git branching

If you mean this literally, then I would have to respectfully disagree: it's clearly not true. But if you mean somthing like "it negates the purpose of git branching", then I would argue that this is also not true. The purpose of git branching, imho, is to make it easier to work in parallel, and to have multiple versions of the codebase exist simultaneously. But once a PR has been merged, that branch is effectively dead (and usually deleted).

As I said in my reply to @jonallured, I think that pre-merge and post-merge are different situations: in pre-merge, you want to know about the how, so that you can review it. Post-merge, as I've said, I think that the context changes, and the "what" is waaaay more important than the "how". The code review effectively rubber-stamps the "how" and says "this is the right way to do it". Thereafter, I would argue that you can usually understand the "how" by looking at the diff (knowing that it has been reviewed and is therefore assured to be a sensible approach), and that the usual question you will be asking is "why did they do this?", i.e. "What was the purpose of this PR?" – which is why I maintain that squashing makes sense.

it breaks git bisect

This is only true insofar as it decreases the granularity at which git bisect functions. Which, admittedly, could be a problem – although in my experience it is usually outweighed by the advantages of having all of the relevant changes in the same commit. OTOH I would argue that this actually stops git bisect from breaking, because it leads to an assurance that every commit leaves the codebase in a working state. What really breaks git bisect is commits that don't function, so that you end up with a bisect commit that e.g. doesn't even build.

it destroys history, and it makes reading hard

The aim of this RFC is actually to make the history into something more useful and more easily readable. I guess it comes down to the question of what you're trying to get from the history. My contention is that the history is more useful if it works at a higher altitude, and provides a linear history of the work that has been done within the repo. Just because git has an incredibly effective branching mechanism, it doesn't mean that the history of those branches have to live forever in the logs: once the code is reviewed and the branches are deleted, it's more helpful imho if the relevant information zooms out to enable a bigger, less complex (i.e. linear) picture.

I would be very happy to work with you or anyone else to make up a git command that will show you the nice simplified linear view, constructed from the current git repo situation.

This would be amazing! I would love it if this were possible, but I'm not sure that it actually is, at least not without some best practice changes that are in some ways similar with what I've outlined here. However, I'm game! So the two purposes of a simplified view would be:

an easily scannable, linear view of what has happened in the repo, from the high-altitude, "what" perspective
an easy way to see which PR a particular commit comes from, and the purpose of that PR, so that I can find out why something was done (having that show in git blame so that I can quickly contextualise code line by line would be great too, but unfortunately that really won't be possible with this approach)

If you want to work together on satisfying those two requirements, let's do it!

Pavlos Vinieratos · Answer 26 · Mon Sep 14 2020 18:39:09 GMT+0800 (China Standard Time)

I'm thinking about someone looking through the git log in 3 months' time, a year's time: they don't care how you did it, they just want to quickly and easily understand what

I think I just work with git differently maybe. Every time I have looked in the history of a project I've worked for a while or I'm new to, I always like having the details of the work done.

And I still think if someone does want just the surface-level history, it probably can be done with a git log <plus a bunch of args>.

So without much thought, you have created an entry in the git log that is formatted

.. and at the same time lost all context and data of how this was done. Now it's just "something that was done in the code at some point by someone", with no more information. All the process and the actual way the change was done is lost forever. I really don't want this in the repos I work with.

About git bisect, I maintain the opinion that the "how" is as important as the "what", and the granularity of commits is needed, rather than crashing it down to a one-branch with PR-level commits. Having solved enough things using bisect, I want to keep the ability to do this. And about

so that you end up with a bisect commit that e.g. doesn't even build.

I am familiar with this. I think the work to make something like that build (which is some stashing and popping etc, maybe some manual copy-pasting) is preferable to me than trying to find the 2 lines of code that broke something in a PR-commit that builds but has a bug.

Lastly, I want to say is that I appreciate and want to simplify and streamline things in our dev work, but I think what this RFC is detailing is the wrong part of the dev work to try to do that.

(Ok actually lastly, I think the two things you listed at the bottom can be done. I think the first one we could even get just from GH. We can list all the PRs in merge order. This will give you the linear view you want. Maybe in that case the PR titles can become semantic, I'd be ok with just doing that change. The second one can be done if we combine the fact that the merge commit includes the PR number. So this could actually give us both something like which-pr <some sha> and it would tell you which PR it belongs to, as well as the linear list of the previous requirement. If we grab master and get all the merge commits, then replace the titles with the PR title from GH, all done.)

Anandaroop Roy · Answer 27 · Tue Sep 15 2020 00:19:55 GMT+0800 (China Standard Time)

I gave this a qualified 👍

In brief: I'd be fine with a more prescriptive approach to PR and commit authoring in exchange for gains in code reviewability, maintenance, and archaeology — just so long as the approach allows for artisanal committers to preserve their work in the log. I'm satisifed that this RFC does so.

To +1 a few points raised across the responses above:

I share the concern that a strict insistence on "a PR does one thing only" might discourage small fixes and tactical refactors by adding mental and logistical overhead to the review process, so I'd favor that as a heuristic rather than a requirement.
I think that this prescriptive approach would really only be viable with: lint tooling that makes it easier for the PR author to do the right thing; and GH automations that make it easier for the assignee to do the right thing. Otherwise I'd expect uptake of these suggestions to be spotty.
There's also the practical question of making this work with our release strategy, but I am inexpert on those matters.

Isac Petruzzi · Answer 28 · Tue Sep 15 2020 17:52:00 GMT+0800 (China Standard Time)

I'm very much in favor of adopting this RFC, even if we don't necessarily adhere to it for all cases I think it's a good north star for the following reasons:

I find myself often trying to do what the semantic commit message's approach aims to do in a PR (molecular) form, i.e. "You’ll never again be tempted to include a bug fix and a feature in the same commit" and when I don't it feels wrong. This is a way to formalize the process and would help me out in structuring my commit messages and PRs.
I find myself often pulling cruft and spaghetti out of git history when I make a Hokusai release and am in favor of squashing. The worst problems I have found with rebasing come with CI, i.e. if you rebase master it can't pull and deploy, but we don't have that anymore - CircleCI does a clean checkout and we deploy artifacts. The problem comes with Github and mergability to our master -> staging -> release pipeline (correct me if I am wrong here).
Coming to a new project I can see the pedagogical benefits of this approach - by curating a much clearer, succinct and grep-able history via semantic metadata over a project's changes.
I would go even further than what @joeyAghion wrote:

Merge master into commits are useless, but I find Merge pull request #6132 from /<branch -type commits quite useful.

I actually find Merge pull request #6132 from /<branch -type commits more bothersome in general and would be happy if we didn't have any merge commits in the mainline. Thinking especially in terms of a rollback or a bad release... thinking about trying to revert a merge commit and choose between two parents, rather than a squashed single commit that can be reverted cleanly. Perhaps this is due to the fact that I sort of retained git habits before github and forks and pull requests became the norm. My point is that merge commits are not essential, but they are useful in terms of being able to grep git history and the same can be accomplished with semantic commit messages.

I can understand how this isn't feasible sometimes and doesn't cover all cases. But I feel like it's a step in the right direction. In our infrastructure repos and orchestrating by means of Kubernetes config changes we often have to split a migration into two PRs. Take Ingress for example, one to introduce new resource(s), and one to clean up the old. Often there is unforeseen fallout and multiple commits that relate may end up interlaced with others. The semantic commit message may also help us here too.

Disclaimer: I have not used git-bisect much - I generally find that as @admbtlr states

Knowing the context within which the bug was introduced makes it a lot easier to fix

is enough to hone in on a bug, and having that context in mind when reverting the change that introduced the bug is helpful because it was more often than not introduced for a reason while not considering an edge case that happens to fall outside of test coverage, and simply reverting that atomic commit may break things elsewhere. IMHO git-bisect is a nice tool but find it's more productive to approach debugging via telemetry ( profiling ), logging or cracking open a debugger - often bugs are introduced in dependencies and you won't find them via git-bisect!

Isac Petruzzi · Answer 29 · Tue Sep 15 2020 19:29:04 GMT+0800 (China Standard Time)

I'm not sure I comprehend the points raised above about "friction". In practical terms of what this RFC is proposing, it comes to pausing for a few seconds and thinking about my commit messages to adopt semantic format (for example hotfix: kyoo and bugfix: kyu are quickly typed and pushed in the case of small commits) and I perceive the time taken to stop and think about what is being committed productive in its own right.

I would ask why is a frictionless workstream desirable? Taking the time to pause, zoom out to the molecular level and reason with yourself as you are committing is a chance to reflect on what changes you are making. And again this can be productive and I have saved myself from introducing cruft or introducing bugs at this step.

Presumably testing should cover this gap but how much more friction is really introduced by waiting on testing to run, fail or succeed?

devon blandin · Answer 30 · Fri Sep 25 2020 12:11:17 GMT+0800 (China Standard Time)

Thank you for leading a discussion of this RFC over Zoom this afternoon! I'll attempt to sync my thoughts back to the RFC here.

I'm generally 👍 on this RFC but would love to see more alignment on how we might roll this out in a way that maximizes flexibility and reduces friction.

Re: PRs doing one thing and semantic commit messages / PR titles

I generally agree with this but I don't think the boundaries are always so clear. For example, I personally don't mind a bit of refactoring within a feature PR.

I'm in favor of trialing the adoption of semantic/conventional messages for commits landing on the main/master branch but I'd prefer to the leave how of that to the PR author. If this proposal is primarily concerned with the cleanliness and usability of the git log, multiple commits in the same PR, as long as they're appropriately tagged/described using semantic/conventional guidelines, shouldn't detract from that goal.

An author could choose to apply a semantic commit message to the PR and squash merge or apply semantic commit messages to each commit in the PR and rebase merge.

Adoption Strategy

I like Ash's idea of using Peril and a commit linter to raise awareness.

I'm unsure of the best way to roll this out. On one hand, it's not very useful to have this partially implemented across the team. On the other hand, this will be a shift for many of us so I think it's prudent to start slow and increase familiarity before being too strict with enforcement either through code review or automation.

One potential strategy broken up into milestones:

Announce via Email / Slack intent to trial semantic/conventional commit messages across all repos
Have Peril raise awareness of the RFC outcome and semantic/conventional guidelines
All engineers have merged at least [N] PRs conforming with semantic/conventional commit messages by [date]
Reassess. Can we point to git logs which are positively impacted? Gather feedback from engineers on positive/negative impact of structuring commits using semantic/conventional guidelines.
Decide whether to fully adopt guidelines as engineering standard

Jira

There was a brief mention of Jira's ticket-linking integration during the Zoom meeting.

We support linking GitHub PRs to Jira tickets using our own Peril automation: #274. You can use the [X-123] syntax in the PR body or commit messages.

But Jira also has its own GitHub automation that uses a syntax called "Smart Commits" to link PRs and commits to tickets.

https://marketplace.atlassian.com/apps/1219592/github-for-jira
https://confluence.atlassian.com/fisheye/using-smart-commits-298976812.html

Aside: these integration seem compatible with each other but we should probably revisit our own automation and assess whether it's still needed.

A small deviation from our current habit of including the Jira ticket number in the PR title would be to use the Jira ticket number in the semantic/conventional commit/PR title as an optional scope (source:

feat(X-123): example

"Semantic" or "Conventional" Commit Messages?

There's already been a bit of discussion around semantic vs conventional guidelines. Are we leaning towards one or the other?

https://seesparkbox.com/foundry/semantic_commit_messages
https://www.conventionalcommits.org

Adam Butler · Answer 31 · Fri Sep 25 2020 18:02:20 GMT+0800 (China Standard Time)

Resolution

After extensive discussion yesterday (thanks to everyone who took part!), we have decided on the following initial roll-out strategy: from now on everyone should try following the proposals on all repos, but there will be no enforcement and no tooling. We will then reconvene in a month to compare experiences and opinions.

Level of acceptance

3: Majority acceptance, with conflicting feedback

Adam Butler · Answer 32 · Fri Sep 25 2020 18:17:18 GMT+0800 (China Standard Time)

"Semantic" or "Conventional" Commit Messages?

There's already been a bit of discussion around semantic vs conventional guidelines. Are we leaning towards one or the other?

The Conventional Commit Message convention is essentially a superset of the Semantic Commit Message convention. The main extension the Conventional Commits provides is a ! before the colon in the title and/or BREAKING CHANGE: <description> in the footer, both of which indicate a breaking change.

It also gets more specific about what else can be contained in the body and footer, which is all good and recommended, but strikes me as overkill for the kind of work we do at Artsy (i.e. we're not building open source tooling that is used by other people - apart from Hokusai, I guess).

So I would say we lean in to semantic, and towards conventional. Does that sounds reasonable?

Matt Zikherman · Answer 33 · Thu Nov 19 2020 21:39:44 GMT+0800 (China Standard Time)

Following up on this....it's been > 1 month and I'm not sure if we've had a retrospective on this yet.

My own experiences:

Conventional commit/PR naming and titling

I am in favor of continuing this. I've personally grown accustomed to the style, and I believe we have some Peril/linters operating on some repos (perhaps we can expand that to more repos).

Squash/merge/rebase

I am not in favor of continuing to prefer rebasing or squashing. I think people's muscle memory when clicking that beautiful green button, has led to anecdotally more deploy PR's that cannot be safely merged (due to a previous deploy PR being erroneously squashed). The subsequent deploy PR's have merge conflicts that require manual intervention (including a push to release to clean things up and then an immediate cancel of that automatically triggered build in the Circle UI- what happens on the day we do this and the Circle UI is down?). At the very least, I think if we want to keep the convention, we should invest in some tooling to prevent this on deploy PR's.

Angela Patrinou · Answer 34 · Thu Apr 28 2022 23:22:26 GMT+0800 (China Standard Time)

@admbtlr shall we add a resolution section here so that it's easier for people to see what has been decided?
I would add it both as a comment on the bottom of the page and as the final section on the RFC suggestion.
We can follow the template here.
Also is there a way to remove the peril bot comments? They seem to be overcrowding things without adding value.

devon blandin · Answer 35 · Fri Apr 29 2022 20:10:15 GMT+0800 (China Standard Time)

@patrinoua Good point about the noise. I believe we can eliminate the bi-directional linking by removing the link to this RFC from this automated comment in our Peril settings.

Adam Butler · Answer 36 · Fri Apr 29 2022 23:16:16 GMT+0800 (China Standard Time)

Oof, I haven't been back here for a while. That's a lot of mentions! Would you be able to remove the link from Peril @dblandin ?

devon blandin · Answer 37 · Wed May 04 2022 16:58:07 GMT+0800 (China Standard Time)

Oof, I haven't been back here for a while. That's a lot of mentions! Would you be able to remove the link from Peril @dblandin ?

Removed in artsy/peril-settings#194

[RFC] Best Practices for Naming and Merging PRs

Best Practices for Naming and Merging PRs

TL;DR

Proposal

Reasoning

Philosophical Aside: PRs as Git Molecules

Pragmatic Aside: But This Makes git bisect Useless!

Exceptions

Additional context

How is this RFC resolved?

Resolution

Level of acceptance

Resolution

Level of acceptance

Conventional commit/PR naming and titling

Squash/merge/rebase

Pragmatic Aside: But This Makes `git bisect` Useless!