A specification for adding human and machine readable meaning to commit messages

42 points by nikolasavic 3 years ago · 48 comments

The examples given are more "what" changed than "why" it was changed. These are low-value commit messages because they are redundant with the content of the commit itself.

It is almost like signing all your commits with your name or the current date. (Yes, I had a coworker who did this.)

Better commit messages tell you what the situation was around the commit: Ticket number, or who wanted the change, or any other context that might tell you why the code was changed the way it was.

Consider the dev accessing your commit through "blame". What does that user need to hear? Not which file or subsystem was changed. But the reason there is a change in the first place.

My habit has been to prepare longer commit messages, a paragraph or two of explanatory text for that future developer, who is most likely future me.

Policy at my company now squashes all my carefully-prepared commit messages into the one-liner "Merged $BRANCHNAME into main". I will probably just switch to the three character commit message "WIP" like my coworkers have done.

charrondev 3 years ago

At my company we squash all merges and I’m the one that put the rule into effect. A few things to keep in mind with squash merges (at least with GitHub):
- GitHub uses the pull request's title and description as the merge commit description, as well as linking to the pull request. This means all of our mainline commits now have links to relevant Jira tickets, context, change requests, and feedback on the PR.
- The mainline commits are now linear and easy to read through, to the point where I was able to make an internal tool that can show everyone where every commit is at, with a nice description, and what stage in the production release cycle it’s at.
- git bisect is a lot easier now. No one has to bisect through all kinds of “wip” or “fixed test for x platform in ci” commits.
- Most devs never have a need to rebase. Ever. This means devs can stack PRs against each other and test each other's code together without having to deal with a rebase causing a bunch of pain.
- All commits in our main branch now pass CI.
- All commits in our main branch are now GPG signed by the org, without devs needing to configure commit signing locally.
- pfix 3 years ago
  
  Maybe it's my DevOps / System Engineering perspective, but most of the time when checking the history of the code, I care why a specific line changed, and squash commits don't give me the granularity for that (e.g. why do we need 5 instead of 4 instances now). Most of the time, those changes are too small in the context of the full Pull Request / Merge, but matter 2 years later when you try to grok why something is the way it is.
  So I prefer the flow of rebasing before merge. This way main stays linear and readable and you keep the descriptive detail at the commit level. The context for the full merge can be found in the ticket corresponding to the change.
  - cbovis 3 years ago
    
    "matter 2 years later when you try to grok why something is the way it is"
    Sounds like the perfect use of an inline comment rather than change history?
    
    pfix 3 years ago
    
    True - but then I've seen way too many comments having no relation left to the code around them.
    On the other hand the worst offenders are refactorings with the commit message "refactoring". Then you can go hunting. Then a comment would have helped.
    In a perfect world we would have all three: a good inline comment, a well-written commit message, and a ticket with more than a title.
vbezhenar 3 years ago

The first line should provide a short description of "what" changed, because nobody's going to inspect every commit's changes to find out what changed when all they need is a quick glance at the commit log.
After the first line you can write all the additional information.
pancrufty 3 years ago
I think it's standard for juniors to think comments just need to rephrase code. Commit titles are just comments (about changes).
```
    // if ID exists in commits
    if (commits.has(id)) {
```
Big sigh every time I see this, especially if the code is "well-commented"
```
    // if ID exists in commits
    if (commits.has(id)) {
      // make no changes
      return
    }

    // add ID to commits
    commits.add(id)

    // log ID to console
    console.log(id)
```
On the other hand, LOCs through the roof, 10x developer right here.
jahsome 3 years ago

I've hit a couple shops in a row now where squashes are The Way. It's such a short-sighted and misguided policy.
I don't understand what is so appealing about a linear commit history. It's a fabrication of reality, and I have never been grateful for it, only enraged.
Why wouldn't you want to know what _actually_ happened? What is being gained besides an aesthetically pleasing "commits" tab on GitHub?
- Izkata 3 years ago
  
  As someone who does maintenance instead of new feature development: you're right. Squash merges are totally useless. I always want the real history, which lets me see when during the feature development the bug was actually introduced, and the original commit timestamps, which I can correlate to comments on the case.
  There is never a reason to squash. The best of both worlds is to always force a merge commit (disable fast-forward merges) and look at the log formatted whichever way you want (--first-parent shows people that linear history without destroying history).
- vbezhenar 3 years ago
  
  Linear history is the only sane way to have usable history. Merge spaghetti is a good way to ensure that nobody will ever be able to navigate it.
  Squashing a large number of commits is a questionable practice, though.
  - masklinn 3 years ago
    
    > Linear history is the only sane way to have usable history. Merge spaghetti is a good way to ensure that nobody will ever be able to navigate it.
    1. you can have a linear history by rebasing then fast-forwarding onto the target
    2. but it’s complete nonsense, learn your tools e.g. `git log --first-parent` (and how to merge), merge commits work perfectly fine
    3. and importantly you can rebase then merge, which cleanly packages a set of changes behind a single merge commit without interspersing with other branches, and yet without losing the branch's details
  - jahsome 3 years ago
    
    I'm sorry, I don't mean to be dismissive, but I just can't read that as anything other than "doing it correctly makes it harder."
    That sentiment is not IMHO a particularly potent argument for _anything_ related to engineering.
    Accuracy, not convenience, should be the goal. If you need to make consuming the data more convenient, that should be the focus.
    To be clear, using the word "correctly" puts a lot more confidence behind my opinion than I ever intended, i.e. I am open to counterpoints, and do not pretend to think a one-size-fits-all policy is realistic.
  - alxmng 3 years ago
    
    You can list only merge commits to get a linear log, without destroying history.
    
    bloak 3 years ago
    
    Yes. I think the right way to do it is probably with "git log --first-parent", which will give you a linear history, and the linear history will contain only merge commits provided your project was configured to allow only merge commits. However, if you also allow squash or rebase merging then the linear history from "git log --first-parent" may contain commits that are not merges and will show the details of any PR that was rebase-merged. So, if you're going to allow merge commits at all, perhaps that's an argument for allowing only merge commits.
    There's also "git log --merges" but that would presumably show any merge commits that happen to be present in a branch that is being merged so it wouldn't necessarily be linear.
    If you disallow merge commits on GitHub, does that prevent a merge commit from being introduced as part of a rebase merge? If it doesn't, then presumably the only way to guarantee a linear history on GitHub is to allow only squash merging.
    So, if you don't trust your developers to always do the right thing perhaps you should either only allow merge commits or only allow squash merging?
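To make the `--first-parent` point above concrete, here is a toy Python model of a commit graph. The commit ids, the `parents` mapping, and the `first_parent_log` helper are all invented for illustration, and this is not how git is implemented. It only sketches the traversal rule: starting from the branch head, follow the first parent of each commit and ignore the rest.

```python
# Toy commit graph: each commit id maps to its list of parents.
# For a merge commit, the first parent is the branch that was merged INTO.
parents = {
    "A": [],
    "B": ["A"],
    "f1": ["B"],        # first commit on a feature branch
    "f2": ["f1"],       # second commit on the feature branch
    "M1": ["B", "f2"],  # non-fast-forward merge of the feature into main
    "C": ["M1"],        # a squash- or rebase-merged change: a plain commit
}


def first_parent_log(head, parents):
    """Walk from head to the root, following only first parents."""
    log = []
    commit = head
    while commit is not None:
        log.append(commit)
        commit_parents = parents[commit]
        commit = commit_parents[0] if commit_parents else None
    return log


def is_merge(commit, parents):
    """A merge commit is one with more than one parent."""
    return len(parents[commit]) > 1


history = first_parent_log("C", parents)
print(history)  # ['C', 'M1', 'B', 'A']
```

The feature commits `f1` and `f2` stay in the graph but drop out of the linear view. `C` appears in that view without being a merge, which is the mixed case described above when squash or rebase merging is also allowed.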
    
    alxmng 3 years ago
    
    Good points. Ideally, I prefer a workflow where features are branches and merged in, with fast forwarding disabled and rebasing used to keep feature branches updated. History is preserved, a linear history exists as merge commits, and merge commits also map cleanly to PRs.
  - nazgul17 3 years ago
    
    You could do a rebase instead of a merge
- Supermancho 3 years ago
  
  Branch merge commit message is where you want the meaningful message. The individual commits are more or less noise for the vast majority of developers (half of which are below average). A useful system accounts for the most common case and it's not on the individual commit level.
  - Izkata 3 years ago
    
    You do realize you're proposing "do extra work to prevent the history from being usable for the small percent who use it", right? The squashed history is lost, not hidden.
    And those below-average developers probably aren't looking at the history at all anyway.
    
    Supermancho 3 years ago
    
    To be clear, I'm implying that a meaningful branch merge commit message is important. That's the change that matters for project history...which may or may not include a squashed history of the branch. The individual commit messages before that are for the developer(s) to manage as they see fit.
    At any time, a developer might make another branch, then merge branches or squash the whole history of their branch or create a new branch and add the changes as if it was a fresh branch. Meaningful history is lost in those cases as well. Adding micro-managing process might get (more or less) predictable results, but it almost always pushes developers toward anti-patterns ensuring those predictable results are not what was intended.
- andrew_ 3 years ago
  
  GitHub flow is a natural companion to squash merges as a rule.
jrockway 3 years ago

I'm a big anti-fan of this convention, but I'll quibble on some of the points. "what" is a good item for the first line of the commit. Not everyone has the luxury of shipping to prod from master; many people need to maintain backport branches, prepare releases, etc. Therefore, being able to quickly identify the commit you're looking to backport can be important. If one line of text can do that, then that's faster than reading the diff.
This doesn't mean a "why" shouldn't also be required.
Finally, I really dislike the specifics of conventional commits. "feat", "fix", and "docs" are not particularly interesting distinctions. Just put whatever you were going to put in parens and save yourself 6 bytes.
masklinn 3 years ago

> Policy at my company now squashes all my carefully-prepared commit messages into the one-liner "Merged $BRANCHNAME into main".
WTH? That’s the dumbest policy I’ve ever heard of. Who made this up?
On the other hand, a branch name can be 255 characters, time to get the entire thing in there.
larusso 3 years ago

I also preach this to every engineer who works in my team. The why is so important, along with the context of one or multiple tickets depending on the project. But I'm also a strong believer in linear history. When we squash commits from topic branches into the mainline, we take the "why" commit message as the merge/squash commit message. I don't like the default here, which just lists all messages in a row.

My personal style has changed here: from crafting explicit, clean commits on the topic branches, which could just be merged, to a single commit. The reason is that I actually don't care how often a file has changed for a given feature. If the change is too complicated it should be split up anyway. We work with Pull Requests in GitHub, which means that the PR message becomes the place to describe the why, and it gets picked as the merge commit message.

I only saw benefits in my projects with this setup. The mainline's history is easy to understand and consists, in the best case, only of PR squash messages. Change notes can be generated quite easily and single changes are easier to revert (that really depends on the size of the change; I normally keep an eye on that). I think it is then also important that not too many things are done at once in a change request. Want to fix a bug? Do that. Do not introduce a new feature along with fixes for some random other part of the system. Fix that upfront or in a different PR/commit.

So all in all I think long, detailed commit messages and squash commits work together. You just need to pick when you do the work of writing these super long messages. I do it at the end, when getting ready to publish it as a PR, and no longer 20 times during the development of said feature.
- larusso 3 years ago
  
  I wanted to add that I think writing a longer commit message, with a description of why a change is needed, before opening a change request can also help the author check one more time whether the solution or reasoning is sound. Like a rubber ducky. It really helps to write the message down in a way that explains to some other person why the change needs to be made, maybe with some more context on why the specific solution was chosen. More often than not one realizes that the solution at hand might not be the best, or one suddenly thinks of an easier/simpler solution. Because of this I tend to think about this message early. And it has happened that I dropped some requests and took an alternative route because I realized the change would bring too much complication etc. Because if it is hard to explain why a change is needed, then something is slightly wrong. That obviously differs case by case and one shouldn't form a dogma around it.
bryanrasmussen 3 years ago

>These are low-value commit messages
I think they are more medium quality commit messages, low value would be significantly less informative than that.
Aside from that, while I agree you should have the ticket number etc., integration between the ticketing system (generally Jira, let's be honest) and your git provider is probably non-existent, so it is of less real value in finding what you want to find when looking through that git provider's interface.
In fact if your commit message does not say what was done you will have to go into the commit to read the code and figure it out, which obviously is wasteful if you are trying to find the most likely commit in a dozen that caused a problem.
superbaconman 3 years ago

> What does that user need to hear? Not which file or subsystem was changed. But the reason there is a change in the first place.
Doesn't that belong in the code itself? Do we really want the intention of a change to sit under multiple levels of blame?
- masklinn 3 years ago
  
  > Doesn't that belong in the code itself? Do we really want the intention of a change to sit under multiple levels of blame?
  Yes? How would you encode the reasons for a change, and for the way the change was implemented, in code? With dozens of lines of comments, ending up with files that are 90% comments floating in the void, long detached from any code they were relevant to (and which may not even exist anymore)?
- ivan_gammel 3 years ago
  
  It’s called traceability of requirements and is an important part of post-release QA. At any given moment it must be possible to understand the reason for a change down to a single line of code.
  - zaroth 3 years ago
    
    That’s lovely and all, but can’t be a commit message, which is primarily about succinct human understanding and setting context.
    A commit can certainly have computer readable metadata I presume!
    
    masklinn 3 years ago
    
    > That’s lovely and all, but can’t be a commit message
    Of course it can.
    > which is primarily about succinct human understanding and setting context.
    You do know there is essentially no limit to how long a commit message can be, right?
andrew_ 3 years ago

all of that sounds awful, I'm sorry for your experiences.
many repos are successfully using conventional commits in a way that's not low value, clear and concise in message, and useful in blame and walking history.

ivan_gammel 3 years ago

There’s a common convention of starting the commit message with ticket numbers. Why are tickets not mentioned in the spec? This context is more important.

Regarding the choice of types: why is "feature" shortened to "feat"? If there’s a type for feature, why is the other type "fix" and not "bug"? Semantically, naming should be consistent.

The automatic relationship with semver is questionable. A fix can be a change in architecture that deserves a major version. Implementation of non-functional requirements is not a fix, yet it does not introduce new features and thus not a minor version increment. These are just two examples where the inferred version is not what it could be. Allowing an explicit expression of intent would help, e.g. by adding a tag like [minor]. Example:

   "APP-143:fix:major - migrated from mongodb to postgres"

   "123456:new:patch - added logging of requests"
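For what it's worth, a `ticket:type:bump - summary` header like the two examples above is easy to parse mechanically. The Python sketch below is only a guess at such a format: the field names, the `TaggedCommit` type, and the exact regular expression are assumptions for illustration and are not part of Conventional Commits or any other spec.

```python
import re
from typing import NamedTuple, Optional


class TaggedCommit(NamedTuple):
    ticket: str   # e.g. "APP-143" or "123456"
    kind: str     # e.g. "fix" or "new"
    bump: str     # explicit intent: "major", "minor", or "patch"
    summary: str  # free-text description


# ticket:type:bump, then a dash (hyphen or em dash), then the summary.
HEADER = re.compile(
    r"^(?P<ticket>[A-Za-z0-9-]+):(?P<kind>[a-z]+):(?P<bump>major|minor|patch)"
    r"\s*[-\u2014]\s*(?P<summary>.+)$"
)


def parse_header(line: str) -> Optional[TaggedCommit]:
    """Return the parsed header, or None if the line does not fit the format."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    return TaggedCommit(**match.groupdict())


print(parse_header("APP-143:fix:major - migrated from mongodb to postgres"))
print(parse_header("fix: typo"))  # None: no ticket and no explicit bump
```

Because the bump is stated explicitly, a release tool would not have to infer it from the commit type.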

shoo 3 years ago

I completely agree that linking to additional supporting docs that explain requirements, design, etc is very helpful. Especially for the crew of poor bloody maintenance contractors trying to reverse engineer the system requirements and constraints from the commit history a decade or two into the future, after everyone else originally involved with hauling the system into production has escaped/retired/died/fled changing their names and CVs.
I've had some coworkers argue that the first line of the commit ("subject line" in git) is very important real estate, not to be wasted on a ticket reference, where it could instead hold a human readable summary. There's some merit to that. But they'd still include a reference to the ticket inside the body of the commit.
Depending on how excitable one's org is about creating and migrating between issue trackers, sometimes a ticket reference can still be very ambiguous. I've seen one enterprise project migrate between different JIRA instances within the space of a couple of years, where depending on which instance you plugged the same ticket reference into, you'd get a completely different ticket!
- regularfry 3 years ago
  
  Jira integration is one place I've seen some real horrors: a feature gets marked as "done" with a link to a git hash, then that patch gets bundled into a massive squash merge with a "changelog" that just says "merge 1/8/2022", and the feature branch is deleted so the git hash gets lost. No traceability whatsoever.
diarrhea 3 years ago

You would have a painful time if large but innocuous architectural changes introduced major version bumps. Semantic versioning is how many packaging ecosystems nowadays work and decide compatibility. There is only one question to answer: Did the architecture change change the public API?
- No: even if this was an entire application rewrite, users don’t care. This is not a new major. Perhaps not even a minor (this is subjective I guess).
- Yes: it’s a breaking change and warrants a major version bump.

So the scale of changes is sometimes at odds with the scale of the resulting version bump. Huge changes might not even be fixes, single-line changes might result in major version bumps. All this is highly important for libraries to adhere to. Binaries/programs possibly less so.
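The decision rule above can be written down in a few lines. This Python sketch assumes a plain `MAJOR.MINOR.PATCH` version at 1.0.0 or later with no pre-release suffix. The function name and both flags are made up for the example, and `adds_public_feature` stands in for the usual semver minor-bump case.

```python
def next_version(version: str, breaks_public_api: bool,
                 adds_public_feature: bool = False) -> str:
    """Pick the next version from the effect on the public API,
    not from how large the change is."""
    major, minor, patch = (int(part) for part in version.split("."))
    if breaks_public_api:
        return f"{major + 1}.0.0"
    if adds_public_feature:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# A full internal rewrite that leaves the public API untouched.
print(next_version("2.4.1", breaks_public_api=False))  # 2.4.2
# A one-line change that renames a public function.
print(next_version("2.4.1", breaks_public_api=True))   # 3.0.0
```

Nothing about the size of the diff enters the function, which is the point: only the effect on users of the public API does.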
regularfry 3 years ago

> Implementation of non-functional requirements is not a fix, yet it does not introduce new features and thus not a minor version increment.
Bumping the minor version for this is fine in semver. It's a MAY in the spec.

dang 3 years ago

Conventional Commits - https://news.ycombinator.com/item?id=30950377 - April 2022 (1 comment)

Conventional Commits - https://news.ycombinator.com/item?id=24208815 - Aug 2020 (23 comments)

Conventional Commits: A specification for structured commit messages - https://news.ycombinator.com/item?id=21125669 - Oct 2019 (95 comments)

nikolasavic (OP) 3 years ago

I thought this was interesting, anyone using this? Is the juice worth the squeeze?

Full title: Conventional Commits A specification for adding human and machine readable meaning to commit messages

shagie 3 years ago

I do. I've got a plugin to help with remembering to do it and formatting - https://plugins.jetbrains.com/plugin/13389-conventional-comm...
The thing that it really helps with (when you're using it) is avoiding doing multiple things in one commit. Features and refactors and fixes belong in different commits.
With this I can also look at my git log and quickly see the places where I changed things (rather than style or refactor or docs or tests). This commit, with a few lines, did this; not "this change was part of this much bigger commit."
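As a rough illustration of that kind of log filtering, the Python sketch below parses subject lines of the `type(scope): description` shape and keeps only `feat` and `fix` commits. The regular expression is a simplified reading of the Conventional Commits header, and the sample subjects and the `behaviour_changes` helper are invented for the example.

```python
import re

# Simplified header: type, optional (scope), optional "!", then ": description".
CC_HEADER = re.compile(
    r"^(?P<type>[a-zA-Z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)


def parse_subject(subject):
    """Return the header fields as a dict, or None for free-form subjects."""
    match = CC_HEADER.match(subject)
    return match.groupdict() if match else None


def behaviour_changes(subjects):
    """Keep only subjects whose type is feat or fix."""
    kept = []
    for subject in subjects:
        parsed = parse_subject(subject)
        if parsed and parsed["type"] in {"feat", "fix"}:
            kept.append(subject)
    return kept


# Hypothetical log subjects, newest first.
subjects = [
    "feat(parser): accept empty scopes",
    "docs: fix typo in README",
    "fix: handle missing config file",
    "style: reformat imports",
    "Merged my-branch into main",
]

for line in behaviour_changes(subjects):
    print(line)
```

Free-form subjects such as the squash-merge one-liner simply fail to parse and are skipped.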
- meling 3 years ago
  
  Thanks for this tip about the plugin; I went looking and there is also an extension for vscode: https://marketplace.visualstudio.com/items?itemName=vivaxy.v...
  Haven't tried it but I will.
curun1r 3 years ago

I’ve found it’s too error prone to rely on developers remembering to use conventional commits. But when you use something like cocogitto [0], it makes writing compliant commit messages the path of least resistance. I’ve always liked the idea of conventional commits, but it never felt valuable in practice until I discovered the tooling to make it easy.
[0] https://github.com/cocogitto/cocogitto
morgante 3 years ago

I used it extensively on https://github.com/terraform-google-modules. Maintaining up-to-date release notes on 50+ repos would be extremely time-consuming and error-prone without conventional commits.
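For readers wondering what release-note generation from such commits could look like, here is a minimal Python sketch. It is not the tooling used on the repositories mentioned above. The section titles, the choice to list only `feat` and `fix`, and the sample subjects are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed mapping from commit type to release-note section title.
SECTION_TITLES = {"feat": "Features", "fix": "Bug Fixes"}


def release_notes(subjects):
    """Group 'type(scope): description' subjects into titled sections."""
    sections = defaultdict(list)
    for subject in subjects:
        prefix, separator, description = subject.partition(": ")
        if not separator:
            continue  # not in "type: description" form, e.g. "WIP"
        commit_type = prefix.split("(")[0].rstrip("!")
        if commit_type in SECTION_TITLES:
            sections[SECTION_TITLES[commit_type]].append(description)
    lines = []
    for title in SECTION_TITLES.values():
        if sections[title]:
            lines.append(f"## {title}")
            lines.extend(f"- {entry}" for entry in sections[title])
    return "\n".join(lines)


notes = release_notes([
    "feat(network): add subnet output",
    "fix: correct default region",
    "docs: update README",
    "WIP",
])
print(notes)
```

Commits that do not follow the convention are silently dropped here. A real tool would more likely warn about them.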
manyxcxi 3 years ago

I follow it pretty closely. I don’t have any automation set up for changelogs or anything at this point, but it’s pretty easy for me, as back in ye old SVN/Trac days there were similar FIX, etc. semantics.
I often have trouble finding enough room for a meaningful subject by the time I include the commit scope and Jira ticket number, but I don’t mind; I normally use the body anyway.
andrew_ 3 years ago

many popular javascript open source projects leverage this; webpack, vite, etc. It's very compatible with semver.
all of my personal repos use it and any professional repos I have a say in use it.

d--b 3 years ago

Giant sigh…

The reason why commit messages are free form is so they can remain free form.

It’s hard enough to make a model of the world in code. Why in hell would you want to impose this on commit messages?

kkoncevicius 3 years ago

Interesting, I am using a similar convention, but for GitHub issue labels, not commit messages [1]. Then, the commit messages often just refer to the issue number as a reference.

[1]: http://karolis.koncevicius.lt/posts/improving_github_issue_l...

nicolaslem 3 years ago

> the commit messages often just refer to the issue number as a reference.
Please do not skip writing a meaningful commit message (explaining the why) because an issue number is referenced.
At a previous job all commits referenced issue numbers of a dead issue tracker no one had access to anymore, rendering git blame useless.
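A check like the following could flag messages that are nothing more than an issue reference. It is a heuristic Python sketch, not an existing hook or linter: the reference patterns (`#123`, `APP-143`), the filler-word list, and the function name are all assumptions.

```python
import re

# Assumed reference styles: "#123" or a Jira-like key such as "APP-143".
ISSUE_REF = re.compile(r"(?:[A-Z][A-Z0-9]*-\d+|#\d+)")

# Words that add no explanation on their own (assumed list).
FILLER = {"see", "fix", "fixes", "fixed", "closes", "refs", "issue", "ticket"}


def only_references_issue(message: str) -> bool:
    """True if nothing meaningful is left once issue references are removed."""
    remainder = ISSUE_REF.sub("", message)
    words = re.findall(r"[A-Za-z]{2,}", remainder)
    return all(word.lower() in FILLER for word in words)


print(only_references_issue("Fixes #123"))  # True
print(only_references_issue(
    "Raise worker count to 5 to absorb nightly batch load (APP-143)"
))  # False
```

The second message still makes sense if the tracker behind `APP-143` disappears, which is the failure mode described above.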

rektide 3 years ago

A coworker started a personal project and has, at least after a day or so of wild hacking, nice clean commits pretty close to this. I haven't yet asked or found out what they're doing, but it looks great.

Recently I've started using a "subsystem: change" style. Knowing the area seems like the most important starting cue.
