My favourite Git commit (2019)

229 points by neo2006 6 years ago · 69 comments

Reader

The author spent all the time documenting his journey but totally fails to explain the actual reason of the problem: If the file contains a UTF-8 byte sequence, why is there an attempt to parse it as US-ASCII? I get that it's just a wrong space, but if some day somebody actually want to put a UTF-8 char there for some reason, this commit essay wouldn't help at all whereas googling stackoverflow seems to deliver (unverified): https://stackoverflow.com/questions/15947425/rake-tasks-fail...

bassdigit 6 years ago

Following this thought, it could be argued that the author did not even fix the problem, only eliminated a symptom of the underlying, serious configuration mistake that restricts file contents to an outdated character encoding.
- mar77i 6 years ago
  
  I like your approach. And the blogpost's author does not even hint that the story told could mask an underlying issue at all. Which I find mind boggling somehow, but can we call this symptom whack-a-mole and what can be interpreted as anxiety-driven reasoning a cultural issue? Or is there a different, broader context that can wrap this up better that I'm not seeing?
  - bassdigit 6 years ago
    
    The commit message succeeds in justifying the effort but lacks an in-depth understanding of the related mechanics. I can imagine the developer felt bad about 'only' committing a one character change at the end of the day and wrote the message to make up for it. Because of this un-economic outcome, he hesitated to dig deeper.
    There is no reason to feel bad.
    The cultural issue is bad management that still evaluates developers by lines of code / commit activity.
bassdigit 6 years ago

I'm not a ruby developer but the actual problem seems to be the default file encoding.
There is a detailed blog post[1] about it, adding a lot more to understanding what's happening. More than a blog post about how a diary entry as a commit message made somebody feel.
[1]: http://graysoftinc.com/character-encodings/ruby-19s-three-de...

jspash 6 years ago

I'm just curious but... did this actually fix anything other than the spec?

The reason I ask is that I find myself (with Rails particularly) spending an awful lot of time on maintaining tests that were written by a much younger and much less experienced developer (me, a few years ago).

And I've come to the slow realisation that a lot of what I was testing wasn't really important.

I'm not saying the author shouldn't have written the spec, or that the spec wasn't valuable. But maybe the true bug here was that rspec matchers don't honour utf-8 spaces and perhaps they should? I dunno. Just thinking out loud here.

hinkley 6 years ago

Getting people to write tests is hard. Getting them to write matchers... it’s like you’re asking them to fly. Which is a shame because you can avoid so, so much test boilerplate by writing a high quality custom matcher.
- jfkebwjsbx 6 years ago
  
  What do you mean by matcher?
  - allover 6 years ago
    
    > "matchers" [...] let you test values in different ways
    Example docs for Jest's matchers: https://jestjs.io/docs/en/using-matchers
    Generally speaking, if you use a specific matcher, you get a better (more informative) error message when the test fails.
    
    jfkebwjsbx 6 years ago
    
    Ah, the common testing functions like `assert_equal`. Thanks!
    
    hinkley 6 years ago
    
    Right, and some APIs let you write your own. Getting a custom one to hand out actionable error messages is a bit tricky but quite worth it.
- hashtagMERKY 6 years ago
  
  Not a software engineer, sorry. What’s a matcher? Google didn’t help.
  - hinkley 6 years ago
    
    Not to worry. Some test frameworks do not expose them, so people often don’t even to know to want for them. User allover did a respectable job of explaining.
    
    antsar 6 years ago
    
    IME (python webdev) most test frameworks do expose some form of this, but I’ve never heard them called “matchers”. So perhaps people know & use them, but not by that name.
    
    hinkley 6 years ago
    
    I mean expose the ability to write your own. When I first used them, I thought all frameworks allowed this, then bumped into some that didn't, or at least not in the popular version at the time.

pvorb 6 years ago

I, too, use git commit messages to document the reasoning behind my changes. But I'm unsure if my colleagues even notice, since they regularly write bad commit messages like "Fix bug" and things like that. Also I regularly find my commit messages got lost, since the reviewer squashed several commits into one.

Do you have any tips for establishing a culture of good commit messages?

optimuspaul 6 years ago

I usually start with being aggressive about people not squashing commits. I think it's bad form. I also find shame works, as well as blocking merges when reviewing when commit messages provide no value. Could also start a swear jar of sorts, the offender puts a dollar in whenever they push a commit with a useless message... and use the proceeds for charity rather than something that benefits the team.
- l0b0 6 years ago
  
  This requires an existing buy-in from the team. How do you establish that buy-in in the first place? I know trying to be a good citizen isn't enough, because the worst offenders simply aren't interested in the revision history.
  - optimuspaul 6 years ago
    
    Great question I don't have a solid answer to. Due to my seniority I tend to either be the lead or act as the lead for teams. It is easier for me to reason with the team to buy into these kind of things. The worst offenders will tend to be weeded out over time either by changing their ways or leaving. I don't have much sympathy for them if they can't see how their actions affect others. A team has no place for a rogue agent.
dionidium 6 years ago

> Also I regularly find my commit messages got lost, since the reviewer squashed several commits into one.
This tendency toward nice, neat, clean, architecturally pure, and ultimately worse solutions runs deep in software engineering (and in humanity, in general (see e.g. the failure of pristine housing projects that replaced messy-but-functional slums)) and it's one I have to fight within myself, too.
I don't think there's any great way to combat it, though. You need mature, experienced developers on your team who aren't too tired and burned out to fight it.
jamietanna 6 years ago

Within my team, we use Git flow with 2 person enforced code review before a merge (within a 10 person team). We use the code review to enforce this, alongside any other changes required.
I'm a strong proponent of good commit messages, so one thing I've been doing since joining the team is encouraging this to be better.

judge2020 6 years ago

Previously: https://news.ycombinator.com/item?id=21289827 (domain has changed since the last post)

saagarjha 6 years ago

I knew I had seen it earlier and was wondering why I couldn't find it…

ahh 6 years ago

I do love the social history of good commit messages, and what they teach us about the process of what we do.

I will offer one nit here, which is that I always start my commit messages with one-line summaries; this makes it really easy to scan commits (and git is tooled to do this!)

sixstringtheory 6 years ago

I don't understand your nit: is it that you don't see a one-line summary (which exists: "Convert template to US-ASCII to fix error") or that you don't feel it correctly summarizes the change?
- BoorishBears 6 years ago
  
  It doesn’t summarize the intention of the changes, but it does summarize a bunch of stuff already in front of you when you’re looking at a diff...
  -
  Remove non-ascii character to fix error from `bundle exec rake`
  -
  That commit message would add external context (what was broken/where the error was coming from) and intent
more-coffee 6 years ago

That's a good one. I do the same in code documentation. One line to very briefly summarize what that function/module does, <enter><enter>, followed by a more detailed description.

rachelbythebay 6 years ago

I wonder: what is the bad character? It's apparently not 0x20, but what is it? How did it get there? What keeps this from happening again to someone else?

bassdigit 6 years ago

It is U+00A0 [1], a non-breaking space [2], can be typed in some office applications using ⌃+⇧+␣.
[1]: http://fileformat.info/info/unicode/char/00a0/
[2]: https://en.wikipedia.org/wiki/Non-breaking_space
- rachelbythebay 6 years ago
  
  Ah, my old nemesis, the shifted space. Haven't been bitten by one of those in a while. Thanks for digging that up!
  - bassdigit 6 years ago
    
    No-break spaces aka   can be quite useful for preventing line breaks for example between a value and its unit, or between label and value. Or when writing for example 'e.g.', although you could argue that in this case a NARROW non-breaking space U+202F would be preferable.
robert-boehnke 6 years ago

If I had to guess, the no break space that macOS has on [alt]-[space]: " ".
EDIT: Hmm, that doesn't seem to be it, here's the commit in question https://github.com/alphagov/govuk-puppet/commit/63b36f93bf75...
- bassdigit 6 years ago
  
  It IS the no-break space. The hex UTF-8 representation of it is C2A0; you can find that sequence in original version of the file.

c3534l 6 years ago

What's the latest on how long a commit message should be? Used to be everyone said a message should never be longer than two sentences and that they should be as short as possible so that people actually read them. This thing posted is an essay. Not the haikus I thought was considered best practice.

chucksmash 6 years ago

I've always just gone with
52 ~line~ [edit: character] summary
Then write a book, just being sure to wrap at 72 chars. Never really considered what an optimal commit message length was.
Maybe it's something along the lines of "if you are playing poker and you can't figure out who the easy money is, the easy money is you." I've never seen a commit message, PR description, etc and thought "well that was excessive," so maybe I'm the one...
This one stood out to me: 157 lines of rationale, discussion of alternatives, etc for 22 lines of uncontroversial changes[1]. It's much more likely to be useful in and of itself as a piece of documentation than a one-liner, "Changes to prevent strcpy" though.
[1]: https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
- temac 6 years ago
  
  While I'm perfectly fine with this commit-essay, and would probably be with any other, in this particular case I would also be perfectly fine with a terse "ban strcpy" + actually a few more comments in the source code.
  Commit messages are not going to serve as a knowledge base for state of the art and best practices, and strcpy is universally recognized as problematic. Plus, about the how, while it is fine to see the dev thought a lot about it, did some testing, etc., the way he did the ban is ultra classic in the end. Also, it might have been even better to put some short explanations in comments instead of in just the commit message, in particular the usefulness of the BANNED macro to preserve some line numbers with gcc. So in this particular case I'm not really convinced that the commit message serves any strong purpose.
  But it does not hurt. And it participates to a culture where useful commit messages are important. So I still (weakly) prefer it to my hypothetical "ban strcpy".
- saagarjha 6 years ago
  #undef strcpy
  Yikes…
jfkebwjsbx 6 years ago

That's definitely not common practice.
Commit messages need to be as long as needed, period.
progval 6 years ago

> This thing posted is an essay.
Somewhat related, the largest commit message I know is 857d286123acf87ae4a08528a3eef4ce2fbf8db2 from this repo: https://github.com/nanobox-io/nanobox-pkgsrc-lite (loading it crashes github)
It's 100MB big, and contains... an entire git history by itself.

mroche 6 years ago

Linking to the mpv commit[0] message brought up in the last discussion[1]. It’s absolutely hilarious, and a great example of someone rage typing after dealing with an issue.

[0] https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

[1] https://news.ycombinator.com/item?id=21290517

s3graham 6 years ago

My personal favourite commit (as opposed to commit message) https://chromium.googlesource.com/crashpad/crashpad/+/badfac...

(Sadly, the commit message isn't great, but clearly it's worth that hash.)

sytelus 6 years ago

Please do not do this. Long essay style commit messages are not really searchable, have no real way of discussions, do not support formatting, images etc and are hard to even read in many tools. A better way is to create an issue first and attach it to your commit.

kimusan 6 years ago

This is my favorite git commit: https://android-review.googlesource.com/c/platform/system/bt...

result:

https://android.googlesource.com/platform/system/bt/+/refs/h...

dirtydroog 6 years ago

That's just way too verbose though. There should have been a ticket with a description of the problem. Name the branch after the ticket and jira will automatically link the ticket to the commits that fixed it.

I've encountered bad utf-8 when data is copied from excel or outlook.

gwright 6 years ago

I prefer the engineering details to be in the commit message and not an external issue tracking system. That system might not be accessible 3 or 5 or 10 years later.
- dirtydroog 6 years ago
  
  That is true, and it might not work for Open Source development, but I don't see managers trawling through git commits to see what was done / is in progress.
  - couchand 6 years ago
    
    Then find better managers.

jariel 6 years ago

Can someone comment on a best practice whereby the commit might merely refer to the problem ticket (in whatever system), which should have all of the relevant problem histories, and the articulation of the solution? As opposed to all of this documentation in git?

saagarjha 6 years ago

If you company has a bug tracker like this internally and you maintain the project "in the open", please don't do this. I'm looking at you, Google and Apple…

Jefro118 6 years ago

Does anyone here try to enforce good commit message practices within their engineering teams?

juped 6 years ago

All that text and it doesn't mention that the offending character is a U+00A0 NO-BREAK SPACE with UTF-8 bytes C2 A0.

codegeek 6 years ago

Tl;Dr: Sometimes, Why is more important than What.

cborenstein 6 years ago

+1 that Why matters most.
My teammate recently wrote a blog post on how we document Why in git commits and PR descriptions.
https://medium.com/better-programming/daily-habits-to-turn-y...
Would also recommend checking out a related talk by a tech lead at StitchFix: https://confreaks.tv/videos/rubyconf2018-documentation-trade...
pvorb 6 years ago

"What" is also equally important. But it's covered in the diff itself. Often, there's no need to repeat the diff in text form.

neo2006OP 6 years ago

It's not only about the fact that the commit message for a one char change is about a page long. It's about that all the information in that page long commit message is a useful information

BoorishBears 6 years ago

Strongly disagree.
There is so much superfluous cruft in the message.
Listing the error, and the fix, would give exactly as much context for the next person to run into this issue.
If your goal to explain the process to find similar cases, just a couple more lines would have done just as well:
-
Fixes case where `bundle exec rake` would fail with error message:
ArgumentError: invalid byte sequence in US-ASCII
This is caused by the presence of non-ASCII characters in files.
The following command was used to find files with non-ASCII characters:
`find modules -type f -exec file --mime {} \+ | grep utf`
Running `iconv -f UTF8 -t US-ASCII` on listed files will show the exact location of offending characters.
-
No long winded exposition, no extremely case specific program output (including current machine's name...)
I’m not saying this is the worst commit message ever, and I’d appreciate the intention if I came across it, but if you’re caring enough to give this much context, you can save both yourself, and the next person to look for your message, some cognitive load by sticking to what’s needed.
-
And if you’re wondering how to decide “what is needed”, it varies, but I think a good rule of thumb is: ask yourself how one would find this commit message
What would someone grep for in git's history if they came across this? And if they found it, what information answers their query?
They’re likely to search the command that is failing, and the error.
The answer to that query is, what causes the command to error out, and how to find cases of that cause.
They’re not going to search for the program output of find, or iconv, not going to search for the exact file you were modifying when this happened, or the fixes you tried that didn’t work.
- skrebbel 6 years ago
  
  I disagree.
  I mean, you're right, but I don't think that's time well spent. This is a commit message, not a blog post. Writing concisely is hard and time consuming.
  This was probably written in stream-of-consciousness fashion which strikes a nice balance between time invested and possible future usefulness (if any). I mean, the commit might be useful in the future, but there's also a reasonable chance that nobody except the reviewer of the PR is ever going to read that commit message. (If the OP hadn't blogged about it)
  - BoorishBears 6 years ago
    
    How is writing concisely more time consuming than this long winded stream of consciousness with multiple quoted command outputs and explanations of non-fixes?
    If I come across an error and it takes two commands to fix it, my immediate intuition is a commit message with... the error and the two commands. And that’s extremely quick to write...
    It’s definitely not “let me write a detailed account of the last hour of my life”... that immediately feels like it will take much longer
    Also to be clear, the rule of thumb I mention is not about “what should you think about, then write down”
    It’s a filter on what you were already about to invest effort in writing down.
    If you feel like an even shorter message is good enough, that’s great.
    Just realize sometimes less is more...
    
    JoshTriplett 6 years ago
    
    It's the exact opposite of "If it was hard to write, it should be hard to understand". This is "This was hard to debug, so let me make it easy to understand and give you (or future me) the hard-won information I just discovered so that you don't have to go through what I did".
    If the information took you a great deal of effort to get, that suggests it may be valuable to share. That may not be the right time to attempt to cut out some of that information for brevity, especially if you don't know what the future reader of the message might need to know.
    
    BoorishBears 6 years ago
    
    What information in the concise version I gave is missing, that causes you to go through additional effort?
    I think you edited this in after my reply:
    > If the information took you a great deal of effort to get, that suggests it may be valuable to share.
    I think this is the core of where we disagree. I often spend insane amounts of time debugging something, only to find the solution was quite simple and unrelated to what I tried.
    Often the amount of time you spent debugging is because the search space for your solution was unbounded, not because each step you took was particularly relevant to the actual solution.
    To me a commit message should be able hopefully narrowing that search space, not documenting the original unbounded one.
    
    JoshTriplett 6 years ago
    
    I'm not suggesting that the version you gave would be a problem in practice; I'm also not suggesting that the commit message in the article is the best possible commit message for that change. I'm suggesting that after a long debugging adventure, I'd prefer to err on the side of including any information that feels hard-won, rather than trying to make the message as concise as possible at the possible expense of missing an important detail.
    I've read many commit messages with too little detail, and zero with too much detail. (I have read code with too many comments, but it's rare; I've never once read a commit message that made me think "too much detail".) I'm sure it's possible, but on balance I prefer to cultivate instincts of "document anything that was hard to learn" and "in the moment, prefer to over-document rather than under-document".
    
    BoorishBears 6 years ago
    
    Isn’t this exactly what I said in my original comment...
    I appreciate the intention and wouldn’t complain, it’s just if you care enough to write so much, save yourself sometime and the next person to read it some effort by being a little more concise.
    Commit messages are like naming things, there’s no right way, but the more you try to do it “right” the more “right” you get.
    If you just throw your hands up and say “I’m goin to vomit our every thought I had working on this”, sure it won’t kill anyone and it’s preferable to always writing nothing (that’s kind of obvious...), but you’ll never get to the point where writing even somewhat balanced messages is comfortable
  - hipnoizz 6 years ago
    
    "I didn't have time to write a short letter, so I wrote a long one instead."
    I definitely agree with BoorishBears - too much unnecessary details in this commit message. Of course it is much more preferable than "Fix template", "Fix utf8" or "fix" which is all too common. But if someone spent 1 hour fixing a problem and then probably 5-10 minutes working on the message then adding 5 minutes to make the message concise should not be a big deal, right? (Who am I kidding ;-))
    We use Gerrit at my company, so pushing the change is just a first step. I routinely read my commits after they are pushed for review and many times there are one or more additional changesets because I thought I found something worthy to add to the commit message.
- neo2006OP 6 years ago
  
  I see your point, but as a person that get bored easily when I read, I like how it give context and narrate a story instead of writing only the technical facts. Even if it's lengthy, it's a commit message I will finish reading.

bassdigit 6 years ago

Paraphrasing the first comment of the commit[1]:

A commit message with such a long explanation helps nobody, because it is not searchable by Google, not even by GitHub[2]. [...] If the commit message requires a blog post to be good (searchable) – it is a poor commit message.

[1]: https://github.com/alphagov/govuk-puppet/commit/63b36f93bf75...

[2]: https://github.com/search?q=invalid+byte+sequence+in+US-ASCI...

emj 6 years ago

I must say our private git repos are a treasure trove of information, not using git for help is immensely stupid. In the same say it's sad that Jira and all do not help you find commit messages in a better way.
bassdigit 6 years ago

Not sure how to feel about being author of the most upvoted and also the least upvoted comment.

Settings

My favourite Git commit (2019)

Keyboard Shortcuts