Git techniques

86 points by jamescun 4 years ago · 77 comments


I feel that having a default merge strategy to squash and merge all commits in a branch is a version control anti-pattern. This discourages thoughtful and frequent commits that express the intent of a change because all the commits are just smashed together anyway so why bother. I think context and intent are lost when looking through a git history of large smashed commits.

I prefer using a precommit hook to automatically prepend a Jira ticket number to each commit message, so when you look at the history you'll see multiple commits grouped together under the same ticket prefix, but each commit still retains its intent. Knowing that commits will not be squashed encourages devs to make meaningful commits. I still advocate for cleaning up and squashing your own commits as you see fit with an interactive rebase before your branch is merged. Having discrete commits can also help when running git bisect to find when a bug was introduced, so you can identify the specific commit instead of the merge of a whole feature.
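A minimal sketch of such a hook, in Python. The hook git runs to edit the message before the editor opens is `prepare-commit-msg` (a plain `pre-commit` hook can't change the message). This assumes branch names start with a Jira-style key, like `ISSUE-123-issue-description`; the regex and the name `prepend_ticket` are illustrative, not taken from any tool.

```python
#!/usr/bin/env python3
"""Sketch of a prepare-commit-msg hook that prefixes the branch's ticket key.

Assumption: branch names start with a key such as ISSUE-123
(e.g. ISSUE-123-issue-description). Other branches are left alone.
"""
import re
import subprocess
import sys
from typing import List

# Hypothetical convention: PROJECT-123 at the very start of the branch name.
TICKET_RE = re.compile(r"^([A-Z][A-Z0-9]+-\d+)")


def prepend_ticket(branch: str, message: str) -> str:
    """Return the message with the branch's ticket key prepended, if any."""
    match = TICKET_RE.match(branch)
    if not match:
        return message  # no ticket in the branch name: leave the message alone
    ticket = match.group(1)
    if message.startswith(ticket):
        return message  # already prefixed (e.g. an amend): don't double it
    return f"{ticket} {message}"


def main(args: List[str]) -> None:
    msg_file = args[0]
    source = args[1] if len(args) > 1 else ""
    if source in ("merge", "squash"):
        return  # keep git's generated merge/squash messages untouched
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open(msg_file, encoding="utf-8") as f:
        message = f.read()
    with open(msg_file, "w", encoding="utf-8") as f:
        f.write(prepend_ticket(branch, message))


if __name__ == "__main__" and len(sys.argv) > 1:
    # git passes the message file path, then optionally the source and a SHA.
    main(sys.argv[1:])
```

Saved as `.git/hooks/prepare-commit-msg` and made executable, it would turn `git commit -m "Fix null check"` on `ISSUE-123-issue-description` into `ISSUE-123 Fix null check`, while a detached HEAD or a branch without a key passes through unchanged.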

lysp 4 years ago

That's why I prefer feature branches with merge commits.
Your dev branch is clean because each merge commit is a single commit per task.
So you can see which tasks were merged, in which order, and which files as a whole were changed in each task.
If, for debugging, code review, or any other reason, you need to look at specifics, you can go through the feature branch commit by commit to see what was changed and why.
It's the best of both worlds.
Similarly when merging dev into master/main: you get a release-by-release view of which files were changed, in a single merge commit.
- rvdginste 4 years ago
  
  Completely agree with parent and grand-parent. For me, a nice clean commit history is an investment in future maintenance.
  Following the rules described above means that you get a lot of context on why code was changed:
  * the individual commit, which should contain a description of the change if the change is not self-explanatory or 'intuitive'; each commit should consist of only 1 functional change
  * the commits surrounding it, the feature branch (clearly visible because of the merge commits)
  * the issue number on the commit itself and possibly in the merge commit
  I've found all this context very informative for projects that are in maintenance mode and still need changes from time to time. Obviously, the higher the quality of the commit history, the higher the quality of the information you get out of it.
  Meaning: if you put a rename with an impact over the whole code base (because the original name just happened to bother you that day) together with a bugfix in the same commit and the commit message has the very informative text 'fix', and the referenced issue mentions 'add support for blah' (but the commit obviously does not implement anything related to 'blah'), then... well, yeah, then how you organise your commit history does not really matter.
  - lysp 4 years ago
    
    * the issue number on the commit itself and possibly in the merge commit
    Absolutely this too!
    My process is to create a feature branch named "ISSUE-123-issue-description"
    The benefit of this is that all changes are tracked (and tested) against a specific issue in the bug management software.
    It also prevents people from making small or unrelated changes or fixes under another task. If such changes are bundled into a single, unrelated task, they won't be trackable or testable.
- infogulch 4 years ago
  
  The merge commit always refers back to the PR id, so even if the branch was squashed you can still look up the play-by-play via the PR. I wonder if this could be represented in git by tagging the branch that was squashed with the PR id, with a tag description that contains the merge commit id, and having the merge commit description reference the tag. It would be nice if something like this could be standardized so it works across systems.
globular-toast 4 years ago

Something people don't get about commits is they have multiple purposes. A lot of this is due to still entrenched assumptions and practices from older, inferior version control systems.
There are at least two types of commit in git: a savepoint and a version.
A savepoint is what happens during development on a branch. Git makes it super easy to make many, many savepoints throughout the day. These help you as a developer because it gives you something to fall back on if you make a mistake. But most of them should never be exposed to anyone not directly working on the branch.
A version is what you share with others. A version is a fully working version of the software that can be reasonably checked out and put through a release process at any time. Usually a version will be unit tested but not subject to the same rigorous tests as a release.
There is a direct analogy here with database transactions. Just replace version with transaction.
Often while working you will find it's possible to write the version commit right away. This is usually for more trivial fixes or in some cases when a commit is required for something like a database migration (when things need to be deployed in stages). Other times you will need to make several savepoints before you get to a new version. This is what rebase is for. Many of those savepoints don't belong on the master branch as they are often fixing stuff you haven't even committed to master yet.
Git has a few tools to help you defer rebasing until later. In particular, you can make fixup and squash commits (git commit --fixup and git commit --squash). These are normal savepoint commits, but they are labelled in a way that lets you later run an interactive rebase with --autosquash to fold them into version commits automatically.
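To make the autosquash step concrete, here is a simplified Python model of how the rebase todo list gets rearranged: each `fixup! X` / `squash! X` commit is moved right after the commit whose subject is X and its action is changed. It is a sketch under stated assumptions (exact subject matching only; real git also matches subject prefixes and commit hashes), not git's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Commit:
    sha: str
    subject: str


# Subject prefixes written by `git commit --fixup=<commit>` / `--squash=<commit>`.
MARKERS = {"fixup! ": "fixup", "squash! ": "squash"}

Step = Tuple[str, Commit]


def autosquash_todo(commits: List[Commit]) -> List[Step]:
    """Simplified model of the todo list built by `git rebase -i --autosquash`.

    `commits` is oldest-first. A "fixup! X" or "squash! X" commit is moved
    right after the commit whose subject is exactly X. If no such commit is
    in the range, this model leaves it in place as an ordinary "pick".
    """
    groups: List[List[Step]] = []
    by_subject: Dict[str, List[Step]] = {}
    for commit in commits:
        for marker, action in MARKERS.items():
            target = commit.subject[len(marker):]
            if commit.subject.startswith(marker) and target in by_subject:
                by_subject[target].append((action, commit))
                break
        else:  # not a fixup/squash of anything in range: a normal savepoint
            group = [("pick", commit)]
            groups.append(group)
            by_subject.setdefault(commit.subject, group)
    return [step for group in groups for step in group]
```

For example, the savepoints `Add parser`, `Add CLI`, `fixup! Add parser` (oldest first) come out as pick, fixup, pick, with the fixup folded into `Add parser`.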
- WorldMaker 4 years ago
  
  There's also nothing wrong with leaving "savepoints" type commits in a branch. Sometimes "I stopped here and took a break" is still useful information to have later on.
  Git provides a DAG and you can use a --no-ff merge to build your "version commit" from the sub history of its "savepoints". You can follow one parent of the merge to the next "version commit" or you can follow the other parent through the intermediate "savepoints" that built it step by step.
  You can use --first-parent today for most git operations to get "clean views" no matter how complex the DAG web is beyond it. I think a lot of these debates would "go away" if more people and user interfaces defaulted to --first-parent and "drill down" navigation rather than firehose of the complete graph and confusing (but pretty) "subway diagrams".
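The "follow one parent, drill down through the other" idea can be shown with a toy Python model: a history as a dict from commit id to parent ids (first parent first), a walk in the spirit of `git log --first-parent`, and one level of drill-down into what a merge brought in, roughly `git log --first-parent M^1..M^2`. The commit ids are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical history: commit id -> parent ids, first parent first.
# M1 and M2 are --no-ff merges of two feature branches into the mainline.
DAG: Dict[str, List[str]] = {
    "M2": ["M1", "f2b"],
    "f2b": ["f2a"],
    "f2a": ["M1"],
    "M1": ["root", "f1b"],
    "f1b": ["f1a"],
    "f1a": ["root"],
    "root": [],
}


def first_parent_log(dag: Dict[str, List[str]], head: str) -> List[str]:
    """The 'version' view: follow only first parents, newest first."""
    log = []
    sha = head
    while True:
        log.append(sha)
        parents = dag[sha]
        if not parents:
            return log
        sha = parents[0]


def ancestors(dag: Dict[str, List[str]], sha: str) -> Set[str]:
    """Every commit reachable from sha, including sha itself."""
    seen: Set[str] = set()
    stack = [sha]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag[current])
    return seen


def drill_down(dag: Dict[str, List[str]], merge: str) -> List[str]:
    """One level of drill-down: the 'savepoints' a merge brought in."""
    first, second = dag[merge][0], dag[merge][1]
    brought_in = ancestors(dag, second) - ancestors(dag, first)
    return [sha for sha in first_parent_log(dag, second) if sha in brought_in]
```

`first_parent_log(DAG, "M2")` shows only the two merges and the root; `drill_down(DAG, "M2")` lists `f2b`, `f2a`. Merges nested inside a feature branch would need another level of drill-down.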
  - globular-toast 4 years ago
    
    > There's also nothing wrong with leaving "savepoints" type commits in a branch. Sometimes "I stopped here and took a break" is still useful information to have later on.
    I don't think they should be on an eternal branch. I seriously doubt it is ever useful to know that some developer took a break at 11:21 three months ago. This is just noise and makes bisecting impossible, which is the entire point of keeping any history.
    > Git provides a DAG and you can use a --no-ff merge to build your "version commit" from the sub history of its "savepoints". You can follow one parent of the merge to the next "version commit" or you can follow the other parent through the intermediate "savepoints" that built it step by step.
    You could, but it's thoroughly nonstandard and requires knowledge and careful use of tools to filter out all the noise. Be nice and filter out the noise in a rebase.
    > You can use --first-parent today for most git operations to get "clean views" no matter how complex the DAG web is beyond it. I think a lot of these debates would "go away" if more people and user interfaces defaulted to --first-parent and "drill down" navigation rather than firehose of the complete graph and confusing (but pretty) "subway diagrams".
    Maybe, but this goes down the route of a blessed workflow. It would require all the tooling to agree on the workflow as important information could easily (even maliciously) be hidden if the workflow wasn't followed. It reminds me of the joke about emacs: an OS that lacks a good editor. Git is the SCM that lacks a good version control system.
    
    WorldMaker 4 years ago
    
    > I don't think they should be on an eternal branch. I seriously doubt it is ever useful to know that some developer took a break at 11:21 three months ago. This is just noise and makes bisecting impossible, which is the entire point of keeping any history.
    As someone who has had to do deep code archeology, "this was finished before coffee kicked in and is suspect" or "this was written before lunch and the coder may have been hangry" can be really interesting information to have.
    git bisect supports --first-parent and bisecting a noisy history is not just possible, but often faster with --first-parent. When you find the merge commit that introduced the regression you branch that commit and run git bisect --first-parent in that branch for additional drilldown into which "sub-commit" of the merge introduced the problem. (And you can do that into additional layers if you've got deep merge commits.)
    > You could, but it's thoroughly nonstandard and requires knowledge and careful use of tools to filter out all the noise.
    It doesn't require that much "care" to use --first-parent as a default in your git commands. You can even set it as a default in your git config for relevant commands (like git log, git praise, git bisect), or just add simple aliases for them. Pretty standard, not that much knowledge, and you can pass it around with a couple of quick git config commands. That's assuming, of course, that remembering pretty much one option, --first-parent, is too hard.
    Also, I don't know what "care" has to do with it: forget to do it and you see a lot of "noise". Noise isn't dangerous. Annoying maybe, but it's definitely not dangerous to see extra noise when you wanted a cleaner view.
    Outside of the command line, sure, there aren't a lot of great UI tools that take a --first-parent-centric approach to git. But it doesn't need all of them to "agree" (because again, if a tool shows you too much noise, that's not dangerous, just annoying); just one good --first-parent-based drill-down UI would do a lot to make people more comfortable thinking about the git log in two dimensions instead of trying so hard to squeeze git into the one dimension of CVS or SVN. I think it's mostly a matter of aesthetics and what "sells": the subway diagrams of the DAG look pretty in screenshots but rarely make for a great user experience in practice. (So much so that everyone keeps wanting to smash git into a single dimension of code history because they find it too "noisy".) Rather than "declutter" with rebases, a --first-parent / drill-down-oriented UX would do wonders for the git ecosystem, especially for junior developers uncomfortable at the command line, who likely shouldn't be trusted with rebases and would have a much better time all around if told "don't sweat your individual commits, they'll roll up into a cleaner merge commit at PR time".
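The two-level bisect described above can be modelled in a few lines of Python: binary search for the first bad commit along the mainline's first-parent chain, then again inside the branch that merge brought in. `is_bad` stands in for whatever test you would run at each step; the commit ids and the `BAD` set are hypothetical.

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit in an oldest-first list.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later one is too (the same assumption git bisect makes).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history: the regression came in with feature commit f2b,
# which reached the mainline through merge commit M2.
MAINLINE = ["root", "M1", "M2", "M3"]   # first-parent chain, oldest first
BRANCH_OF_M2 = ["M1", "f2a", "f2b"]     # known-good base, then the branch
BAD = {"f2b", "M2", "M3"}
```

The first pass over `MAINLINE` only ever tests merge commits and lands on `M2`; the second pass over `BRANCH_OF_M2` narrows it to `f2b`.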
    
    globular-toast 4 years ago
    
    Care has to be taken that the trunk branch actually is the first parent in each merge commit. Maybe it's unlikely to go wrong in practice, but it's certainly possible to have a merge where the parents are the "wrong way around" thus messing up your first parent strategy.
    Also, "git praise"? Is that really a thing now? Talk about not understanding programmer humour.
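The "wrong way around" merge can be caught mechanically rather than by care alone, for example in a server-side check: when a trunk branch moves from an old tip to a new one, the old tip should still be on the new tip's first-parent chain. A toy Python model of that check, with hypothetical commit ids; a real hook would read parents from git rather than a dict.

```python
from typing import Dict, List

ParentMap = Dict[str, List[str]]


def first_parent_chain(parents: ParentMap, tip: str) -> List[str]:
    """Commits reached by following only first parents from tip."""
    chain = []
    sha = tip
    while True:
        chain.append(sha)
        if not parents[sha]:
            return chain
        sha = parents[sha][0]


def keeps_first_parent_history(parents: ParentMap, old_tip: str, new_tip: str) -> bool:
    """True if the trunk's previous tip is still on its first-parent chain."""
    return old_tip in first_parent_chain(parents, new_tip)


PARENTS: ParentMap = {
    "A": [],
    "X": ["A"],              # current tip of the trunk
    "f1": ["A"],             # feature commit
    "good": ["X", "f1"],     # merge with the trunk as first parent
    "foxtrot": ["f1", "X"],  # same merge with the parents swapped
}
```

Accepting `good` and rejecting `foxtrot` keeps every merge's first parent on the trunk, which is exactly what a --first-parent view relies on.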
    
    WorldMaker 4 years ago
    
    In most cases where people are using a PR system as the primary integration point, I've not seen a single PR system that has a problem with sometimes getting the parents in merges backwards. The only time I've seen that is junior developers making merges they shouldn't have been (and are the same developers I would never trust with rebase, even just in their own branches) and there is a way to rebase merge parents if you really want to pull out your rebase fu for something.
    > Also, "git praise"? Is that really a thing now? Talk about not understanding programmer humour.
    git praise has been a standard git alias for git blame for several years now. I'd prefer if git had followed most other VCSes and named it git annotate rather than making it a micro-aggression out of the box, but yeah one person's micro-aggression that makes a papercut in daily workflows is another person's punch down "humour", I guess. I'm glad you seem to enjoy it, I don't appreciate it.
jonkoops 4 years ago

I have to disagree with this as it relies on the assumption that every commit on a branch is logical and descriptive. In my experience a lot of PRs will have small commits that have poor names as they go through a review process. If you merge this using a regular merge commit or by rebasing the commits on the target branch this creates a lot of noise for those who look at the commit history.
In my opinion it is best to squash all commits into one before rebasing it on top of the target branch. During this process any information that is considered important for the history can be preserved by leaving it in the commit body.
- omegalulw 4 years ago
  
  > I have to disagree with this as it relies on the assumption that every commit on a branch is logical and descriptive. In my experience a lot of PRs will have small commits that have poor names as they go through a review process.
  There's your problem. Code reviews should not allow such commits to pass through.
  - t3h2mas 4 years ago
    
    > Code reviews should not allow such commits to pass through.
    Are you suggesting that your code review process has a stage for combing through commit messages? What does this look like? If the third commit message of 15 isn't up to par what happens?
    
    derekperkins 4 years ago
    
    You ask them to fix the commit message. Every git GUI should support that by now, so it should be a 1 minute fix, even for junior devs.
- rzwitserloot 4 years ago
  
  > I have to disagree with this as it relies
  No, you didn't read the comment fully, or you only disagree with part of it. Because, you clearly missed this part:
  > I still advocate for cleaning up and squashing your own commits as you see fit with an interactive rebase before your branch is merged.
  If you do that, you don't end up with 'small, poorly named commits'. Or if you do, you have a lazy programmer / an idiot programmer in the team.
  Which certainly happens, but, they ruin everything. You can't start shooting down processes, languages, tools, or anything else in the programmer space __just__ because some moron who abuses it ends up in a bad place. You need to show that a tool / feature / process / hook / etc turns otherwise fine, capable programmers into idiots in order to advocate for its abolishment. Not the other way around, or you end up with a blunt rock and a club and are then debating that they're holding the club at the wrong end.
- rectang 4 years ago
  
  As someone who carefully crafts my git history, I hate it when somebody smashes my work.
daitangio 4 years ago

I agree. I do not use squash; I prefer to have a feature branch and leave it alone after a merge.
Anyway, some workmates use squash when accepting pull requests on GitLab/GitHub, as a general workflow suggested by those tools, in contexts where trunk-based development is not feasible.
dahart 4 years ago

> I still advocate for cleaning up and squashing your own commit
Completely agree. And I suspect the increasing frequency of squash-merging is mainly to avoid having to do the work of cleaning up and commenting individual commits in a longer sequence.
I can see this both ways, it really is faster and easier to squash. And you’re right, it really does bury some context and functionally makes large changes harder to read or bisect or revert or modify.
One benefit to squash merging that you might have overlooked is that it can encourage frequent (and messy) committing, knowing that the churn will disappear without having to work hard to clean it up. This does, in a way, make the git workflow more appealing and easier to manage for more people.
- mrinterweb 4 years ago
  
  > One benefit to squash merging that you might have overlooked is that it can encourage frequent (and messy) committing, knowing that the churn will disappear without having to work hard to clean it up.
  I've noticed the opposite. Developers who know all of their work will be smashed into one commit at the end tend to not commit as frequently, and the commits they do make are just checking in all of their work at intervals. It is more of a process of saving state. It doesn't matter how frequently they commit if all the commits will become one.
maximilianroos 4 years ago

IME this depends on whether people make large or small PRs in a repo.
If people make small PRs, committing to mainline as they go, then squashing each PR fits well.
halestock 4 years ago

I've seen this argument come up a few times, and the best suggestion I've heard which could make both camps happy is to add the notion of commit groups. You could view a pr in history as a single commit group, or see each individual commit for the full context.
- hnlmorg 4 years ago
  
  Are commit groups a feature or a wishlist item? I hadn't heard of them before, and a DuckDuckGo search only turns up a blog post discussing the desirability of such a feature in git.
  - periodontal 4 years ago
    
    You can get something somewhat similar with "always create merge commit" workflows (no fast forward) and changing your tooling to look at the first-parent-history by default. This view will have one commit per merge, but you can choose to follow the second+ parent history for a given commit to see what went into it.
    
    globular-toast 4 years ago
    
    Or just include a group name in the commit message. No need for empty merge requests in your history. Since most people use issue tracking systems, just prepend the issue number to each commit message in the group.
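With that convention, reading the history grouped by issue is a small text-processing job. A minimal Python sketch, assuming input in the shape `git log --oneline` prints (abbreviated hash, a space, the subject) and a hypothetical `KEY-123` prefix on subjects:

```python
import re
from typing import Dict, List

# Hypothetical convention: subjects start with an issue key such as ISSUE-123.
TICKET_RE = re.compile(r"^([A-Z][A-Z0-9]+-\d+)\b")


def group_by_ticket(oneline_log: str) -> Dict[str, List[str]]:
    """Map each issue key to its commit hashes, in the order they appear."""
    groups: Dict[str, List[str]] = {}
    for line in oneline_log.splitlines():
        if not line.strip():
            continue
        sha, _, subject = line.partition(" ")
        match = TICKET_RE.match(subject)
        key = match.group(1) if match else "(no ticket)"
        groups.setdefault(key, []).append(sha)
    return groups
```

Commits without a key land in a `(no ticket)` bucket, which is also a cheap way to spot commits that slipped past the convention.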
nuerow 4 years ago

> This discourages thoughtful and frequent commits that express the intent of a change because all the commits are just smashed together anyway so why bother.
This is only the case if said squashing just bundles commits without context or consistent logic. If merges to a mainline branch consist of feature branches whose pull requests were already approved after a couple of iterations, then the end result is a cleaner commit whose history has been thoroughly audited. In practice it's equivalent to a fast-forward merge of a single-commit feature branch that just happened to be nearly lined up with mainline.
- astrobe_ 4 years ago
  
  Agreed. This is when you believe that your program should at the very least compile (or pass tests) at any point in the history. In this case a commit must be a consistent and related set of changes.
  In other words, a commit to us is sort of like an "atomic" change, something that cannot be split or else more or less bad things happen.
  I have trouble conceiving a better way to use Git when you really care about the readability of your history. In some cases I don't care about readability, though. On hobby projects I sometimes use Git more like a file transfer and synchronization tool. In that case I don't give a huck about what the history looks like.
  Just like with code, the more readable this history is (in terms of what features/fixes are in there at some point in time), the better.
  - rectang 4 years ago
    
    > This is when you believe that your program should at the very least compile (or pass tests) at any point in the history.
    I only expect that at merge commits, which I can see with `git log --merges`.
    
    bonzini 4 years ago
    
    Why would you? Linux (and any other C or Rust open source project I have worked on) compile and work at any commit.
    
    rectang 4 years ago
    
    Most of the time, that's what I expect. However, sometimes when proceeding step-by-step through a large refactor or a large feature addition, the codebase may be left temporarily in an incomplete state.
    My preference under such circumstances is to favor clarity of commit history, and leave the step-by-step commits intact — with the requirement that they always be located in a feature branch behind a merge commit.
- Vinnl 4 years ago
  
  I mean, it does? If you rigidly turn every feature branch into a single commit, that means that also applies to feature branches that would be thoughtfully crafted into multiple clean commits. (Note: that is not the same as random fixup commits with code review iterations.)

beermonster 4 years ago

> Using git can be daunting at first. Like my good friend Chris once said, "everyone knows the happy path but the minute it gets hairy we're all screwed".

Sounds familiar. Lots of people just learn a few template use cases but don’t take the time to learn/study the tool. You might argue you shouldn’t need to, but Git isn’t intuitive and does require investment. It comes with its own terminology; there’s no point just guessing what, e.g., a branch is.

Like all powerful *nix CLI tools, if you don’t learn it properly, you’ll shoot yourself in the foot one day. Even if you want to use a GUI, it’s no substitute for learning how git works behind the scenes.

Sometimes when I offer to help a colleague who’s got themselves into knots, the sad thing is that they often can’t even explain the problem they got into.

jareklupinski 4 years ago

> "everyone knows the happy path but the minute it gets hairy we're all screwed". Sounds familiar.
"Everybody has a plan until they get punched in the mouth."
-Mike Tyson
KronisLV 4 years ago
> Like all powerful *nix cli tools, if you don’t you’ll shoot yourself in the foot one day.
Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
For example, see "Why SQLite Does Not Use Git": https://sqlite.org/whynotgit.html
In my eyes it addresses many of the problematic bits of Git and offers alternatives to them. Admittedly, it was sad to see other VCSes like SVN die out, since there were some things that they did more clearly than Git (e.g. revision numbers), despite their other shortcomings.
Currently, for many Git is essentially forced upon them and they have to learn commands that they don't understand to use it, much like you said. If version control were easier and more intuitive, then more people would adopt it and the ones using it would be more efficient with it, rather than people messing up their repositories and others blaming them for it - even though they'd be right, that sort of elitism also doesn't help much, since clearly some mistakes are easier to make than others.
I think that most of the systems out there will eventually be improved or rewritten until finally they are both powerful and usable, even Git probably isn't immune to this. Whether the incentives to do that are there now (since GitHub, GitLab and others are already de facto within the industry) will only affect whether this is done in the next 50 or 100 years.
Here's an issue that we ran into this week:
```
  - at work, we have a "main" branch and a "development" branch
  - a new "feature" must be based off of "main", but eventually moved into "development", since that's what the test environments are configured against
  - thus, we have a "feature-development" branch, into which we merge "feature", to not sacrifice our ability to put "feature" back into "main" without the "development" changes, if ever needed
  - then, we merge the "development" branch into the "feature-development" branch to solve any conflicts or do any refactoring that we need to ensure that those two branches play together nicely, before merging "feature-development" into "development"
  - this week, someone did the opposite, they merged "feature-development" back into "feature", thus moving all of the "development" changes into "feature", which we can't allow
  - the solution? who knows, one option would be to force push the earlier position of a branch, thus erasing the merge commit, but force pushes are considered a bad practice
  - what we opted for in the end was having another commit that reverts the merge (through the GitLab UI), however now the history still shows that "feature" has 100+ commits in it
  - the problem that we might run into down the road now is that when we merge "feature" into "feature-development", we'll probably also carry over the revert commit, which we'll then need to revert or do something so that the "feature-development" branch doesn't have all of the "development" changes essentially removed (which may or may not be true)
```
All of that pain, essentially caused by one bad merge, and now other colleagues are also asking me about rebasing. I don't feel like I know Git well enough to be the "go-to guy" and would rather just write the code I need for the features than worry about branching strategies, what colleagues have done, and weird client requirements for which branches must be used as a base because there are not enough test environments. Furthermore, GitLab not having an easy way to say "i want this commit gone" is equally annoying. If they offer you a way to do reverts, surely they can also offer the equivalent of a force push?
Then again, chances are that it's too early for me to say anything vaguely accurate and even so someone out there probably can solve all of it in a line or two. Nonetheless, it feels to me that an easy to use tool would expose solutions for the most common problems in an approachable way.
- masklinn 4 years ago
  
  > Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
  Of course we can but they all lost the vcs wars so we’re left with git, a chainsaw you can’t shut down with hooked spikes bolted to the handle and an old “handle with care” sticker long gone unstuck and slipped under one of the shop’s cabinets.
  - leokennis 4 years ago
    
    Fantastic analogy.
    Personally I am not interested in learning "proper git". It's plumbing and not an end goal.
    When I start working on something, I do a pull and then create a branch. Then I make edits to the files I need and save a copy of them locally.
    If then something goes awry with my commit/push/MR/whatever, I delete the repo, clone it, create a new branch, using human intelligence re-apply my changes from the local copies and try again.
- xorcist 4 years ago
  
  > GitLab not having an easy way to say "i want this commit gone"
  To be fair to GitLab, that would just be another way to say "force push".
  Either you want the change gone from history, which makes it necessary to coordinate with everyone who has taken out a branch during that time (which is a force push) or you want the merge-and-backout to be visible in eternity (which is a revert commit). There's no way to have both, by definition.
  The way to avoid screwups, in any version control system, is to have everyone actually read what they are about to persist to our shared history.
  The by far most popular process to enforce that is code review. Any pull request which pulls in hundreds of unrelated commits hopefully won't get accepted. And if it does, there'll be plenty of time to evaluate why while coordinating the cleanup.
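The either/or can be sketched with a toy Python model: a push is a fast-forward only if the old tip is an ancestor of the new one (the question `git merge-base --is-ancestor` answers). A revert commit builds on top of the bad merge, so everyone can pull it normally, but the merged commits stay in history; resetting the branch to before the merge drops them, which is why it needs a force push and coordination with everyone who already fetched the merge. Commit ids are hypothetical.

```python
from typing import Dict, List

ParentMap = Dict[str, List[str]]


def is_ancestor(parents: ParentMap, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` (or is the same commit)."""
    seen = set()
    stack = [descendant]
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents[sha])
    return False


def is_fast_forward(parents: ParentMap, old_tip: str, new_tip: str) -> bool:
    """A branch update is a fast-forward iff the old tip is an ancestor of the new."""
    return is_ancestor(parents, old_tip, new_tip)


# "feature" was at B; someone merged another branch (FD) into it as M.
PARENTS: ParentMap = {
    "A": [],
    "B": ["A"],
    "FD": ["A"],
    "M": ["B", "FD"],  # the unwanted merge, possibly already fetched by others
    "R": ["M"],        # option 1: a commit reverting M
}
```

Moving the branch from `M` to `R` is a fast-forward, yet `FD` is still reachable from `R` (the "visible in eternity" case); moving it back from `M` to `B` is not a fast-forward, so it only works as a coordinated force push.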
- zibzab 4 years ago
  
  > the solution? who knows, one option would be to force push the earlier position of a branch, thus erasing the merge commit, but force pushes are considered a bad practice
  This is definitely a better solution and the _only_ time force push is okay.
  (Well, unless this is a big public repo that has since moved on.)
- beermonster 4 years ago
  
  > Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
  I agree. In fact I’d even say they exist but yet they didn’t win.
  I have often wondered why a tool that sits atop of git doesn’t exist. One that exposes some basic opinionated operations.
  - KronisLV 4 years ago
    
    > I have often wondered why a tool that sits atop of git doesn’t exist.
    Well, some software like SourceTree allows making certain workflows, like Gitflow easier: https://weaintplastic.github.io/web-development-field-guide/... and https://www.atlassian.com/git/tutorials/comparing-workflows/...
    Of course, that workflow is now considered a bit dated at this point and is only good for some projects, as a way to manage their complexities.
    Now, my memory might be failing me, but I actually recall GitKraken having a lot of nice functionality, like being able to get rid of bad branch merges with Ctrl+Z, much like one would undo a bad text edit in other software, which seemed really intuitive and functional: https://www.gitkraken.com/
    Personally, I'm only aware of such small yet nice improvements in the GUI space, like how GitHub Desktop also attempted to simplify working with Git (within their platform) somewhat: https://desktop.github.com/
    I guess that perhaps a larger amount of people who use the Git CLI are okay with it or just tolerate it, rather than the ones in the Git UI app space, where new apps pop up every now and then?
  - dude187 4 years ago
    
    Phabricator is like that, kind of a git repo host that comes with an opinionated command line tool that crafts your commits for you.
    Personally, as someone familiar with git I didn't like the abstraction layer on top. However, it's a nice self hosted alternative to something like GitHub
- rvdginste 4 years ago
  
  The way you setup branches and how you work with them, is your own choice and is not directly tied to a specific version control system. I used to work with feature branches on subversion before I moved to git. Git does support more types of branch configurations and more ways of bringing code from one branch into another branch.
  I believe that the issue you ran into would have been best solved with a reset and a force push, as long as you see it right away, before other commits are made, and as long as you can warn the whole team that an error happened and will be fixed with a reset + force push. At least that's the way I would do it. We use Bitbucket and disable reset + force push on the main branch, but allow it on feature branches. When it is really needed, we temporarily enable it on the main branch to do a fix and immediately disable it again.
  Also, IMHO, if you have a main and development branch, I think that the feature branches should be based on the development branch and not on the main branch. Normally you want the feature branch to take into account the most recently merged code. To me it seems that if you want to make a new release of 'main' before the release of 'development', you are talking about a hotfix and that should be exceptional. For a hotfix, you'd base the fix off the version with the bug, and then check whether the fix is also needed on development. Moving commits around like that with git can be done fairly easily using 'cherry-pick'. But mainly my point is: all the different branches that you define and how you work with them is really your own choice, you can make it as hard or as simple as you want. It seems a lot of people are making that setup very complex, but remember that not everyone is working on a project used by 10 customers with 3 active major versions, each of which requiring features and bugfixes. In a lot of cases, you can get away with one long-living main branch and short-living feature branches. As long as each release is properly tagged, you can make a bugfix or hotfix release for anything that you released before. And, again in my opinion, the default should be that when a feature is merged, it means that it was ready for release and will be in the next 'regular' release.
  > Surely we can do better than this? A tool being powerful should be no excuse for it having footguns!
  I don't really understand what you mean here. Git is very powerful and allows you to manipulate commits in a lot of different ways, some of which are destructive... but that is part of its power. What are in your opinion the footguns and how would you remove them?
  What I did notice is that a lot of people just seem to refuse to take the time to actually learn how git works and how they can work with it. Many seem convinced that their time is better spent searching for a git UI that hides all the complexity and shows a simplified view, likely closer to how they are used to working with other source control systems, than spending that same time learning git. Personally, I use the tools that come with git: the git CLI, gitk to visualize history and git gui to commit work. (I do like the UI built into JetBrains IDEs because it is pretty good and doesn't try to hide anything or invent new names for git terms.)
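A minimal sketch of the "reset + force push" fix on a feature branch. It assumes a throwaway local bare repository standing in for the shared Bitbucket remote, a hypothetical branch name `feature/foo`, and a placeholder committer identity. It uses `--force-with-lease` instead of a plain `--force`, a swapped-in, safer variant that refuses to overwrite the remote branch if someone else pushed to it in the meantime.

```shell
set -e
cd "$(mktemp -d)"
# A local bare repository stands in for the shared remote (assumption).
git init -q --bare remote.git
git init -q -b feature/foo work && cd work
git config user.name demo
git config user.email demo@example.com
git remote add origin ../remote.git

echo good > file.txt && git add file.txt && git commit -q -m "Good commit"
echo oops >> file.txt && git commit -q -am "Bad commit, pushed by mistake"
git push -q -u origin feature/foo

# Noticed right away: drop the bad commit locally...
git reset -q --hard HEAD~1
# ...then overwrite the remote branch. --force-with-lease only does so if the
# remote branch still matches our remote-tracking ref origin/feature/foo.
git push -q --force-with-lease origin feature/foo
```

Everyone else who already fetched the bad commit still has to reset their own copy, which is why warning the team first matters.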

rom1v 4 years ago

> We therefore squash and merge our feature branches onto dev when a PR is opened and merged. Regardless of how many commits a feature branch has, once it is merged, all the commits are squashed into a single commit

So several steps to implement a feature are mixed up into a single change :/ That looks awful to me, when you need to read the history to understand how and why.

josephg 4 years ago

The problem in my mind is that we’re overloading what a “commit” means. There are two meanings: either it’s a description of what happened, by some author, at some point in history, or it’s an atomic feature being added to the codebase.
Commits are the first thing by default. If you squash and rebase them, they become the second thing (feature changes). But in doing so you throw out information about the original history of those changes.
This has always seemed silly to me. Git should just be changed to support both work flows. Keep commits as-is but let me mark a range of commits as being part of feature X, and let me browse the repo from the perspective of those larger feature objects. Or make it so merge / squash commits still reference the individual commit sequence, so I can see what happened in more detail if I want to.
- bonzini 4 years ago
  
  It already does, using "git merge --no-ff" and "git log --first-parent".
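A small sketch of that combination, assuming a throwaway repository with placeholder identity and branch names (`main`, `feature-x`). `--no-ff` forces a merge commit even when a fast-forward would be possible, so the feature's individual commits stay reachable through the merge's second parent, while `--first-parent` gives the one-entry-per-feature view.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main repo && cd repo
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "Initial commit"

# Feature X is developed as several small commits on its own branch.
git checkout -q -b feature-x
git commit -q --allow-empty -m "Add parser"
git commit -q --allow-empty -m "Add tests for parser"
git commit -q --allow-empty -m "Fix lint error"

# --no-ff always records a merge commit, even though main could fast-forward.
git checkout -q main
git merge -q --no-ff -m "Merge feature X" feature-x

# Feature-level view: only the commits made directly on main.
git log --oneline --first-parent
# Detailed view: every commit, including those inside the feature branch.
git log --oneline
```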
reilly3000 4 years ago

I’m an imperfect human, but I don’t always see the value in having 5 commits that say “fixes lint error” and another few “fixes typo” in the git log for every feature branch. Perhaps there is a middle way.
- ByThyGrace 4 years ago
  
  > I don’t always see the value in having 5 commits that say “fixes lint error”
  That's a problem I avoid by spicing the log with commit messages from http://whatthecommit.com/ ! :)
- xorcist 4 years ago
  
  There are also technical reasons to separate commits. Moving files is best done in a separate commit, otherwise --follow may not work as you expect. Squashing the move together with other changes might very well break that logic.
  Always rebase before pushing for review, just don't squash everything. One would hope that much would be self evident.
- rom1v 4 years ago
  
  A PR may contain several distinct meaningful related commits, this has nothing to do with typo fixes.
  But even then, when you want to revert or cherry-pick a commit on another branch, you don't want random unrelated changes (which increase the risk of conflicts).
  Also, a single commit containing 10 squashed commits doesn't help for git bisect.
- ivalm 4 years ago
  
  Would you be ok with history rewriting in feature branch prior to merge? Seems like potentially best of both worlds.
  - rectang 4 years ago
    
    I do minor history rewrites on my feature branches all the time.
    1. Fix typo or other small nit.
    2. `git add -p` to add only the small change.
    3. `git commit -m fixer`
    4. `git stash`
    5. `git log --oneline main..` and copy the SHA of the commit I want to fix.
    6. `git rebase -i SHA~`
    7. In the text editor launched by `rebase`, move the "fixer" commit after the one I want to modify.
    8. For the `fixer` commit, change "pick" to "fixup". Save and quit, allowing the rebase to complete.
    9. `git stash pop`
    This works most of the time, so long as the nit I want to fix isn't too far back in history. For those nits that aren't easy to fix with the workflow above, I just create an ordinary commit and leave it. It's annoying to have 5 "fixes lint error" commits in every feature branch, but it's fine to have them every once in a while.
    
    vtbassmatt 4 years ago
    
    `git commit --fixup` and `git rebase --autosquash` automate a bit of the drudgery for this workflow. Sharing because I just got introduced to them recently.
- dakom 4 years ago
  
  The middle way would be if git tracked branches historically.
  That would fix a lot of problems imho
- KronisLV 4 years ago
  I guess that depends on how detailed the commit messages are and how many changes they have, which will be affected both by the specifics of the project and the people in it.
  Consider the following:
  [8 files changed] ISSUE-541 added DB migrations and repositories for the new functionality of managing Foos
  [2 files changed] ISSUE-541 refactor the DB migrations to do Bar before creating one of the views because of it breaking otherwise due to Baz
  [11 files changed] ISSUE-541 refactor old services to use the new Foo functionality, because it should be more consistent than using native queries with complex SQL
  [3 files changed] ISSUE-541 fix problems with fetching data in some of the old services, because edge cases were not covered, add unit tests to cover this functionality
  [7 files changed] ISSUE-541 add services for managing Foos, though they will only be used in ISSUE-544, add comments about this
  [20 files changed] optimize the imports in the Foos package and remove unused code (should automate this later)
  And then the following:
  [3 files changed] DB migrations
  [1 file changed] refactor
  [4 files changed] refactor
  [1 file changed] fix code, add tests
  [2 files changed] services
  [6 files changed] formatting
  In one of the cases, the changes are larger and therefore more meaningful descriptions of what each commit does are probably a good thing - if the refactoring of code would break anything that wouldn't be immediately apparent but would later be detected, being able to click in the commit in the IDE history and see what exactly was changed as a part of it could be pretty useful!
  Whereas in the other case, there are fewer files changed, and the pattern I've seen emerge is that most people won't care much about detailed commit messages, which makes the history far less useful.
  Of course, I've also seen people (the majority of my coworkers, though it depends on the company) not really care about commit messages, or even about making smaller atomic commits at all, so a feature branch could look like the following:
  [28 files changed] ISSUE-541
  [11 files changed] refactoring
  Then again, the majority of people in my current company also always leave merge request descriptions completely empty and expect you to figure out what the code does from the diff alone, without any context or further considerations. I'm against this, and while I don't want to sour our relationships by nagging about it, I always write out a bit of information about what each feature branch accomplishes, and even include a few images or GIFs.
  That has made my own life much easier when something seemingly breaks 9 months down the line and no one has any idea why, whereas I can just look at the merge request for the offending code and see all of the charts and explanations I need.
  Personally, I think that the closer your documentation (of any sort) is to your code, the better the end results.
  - Izkata 4 years ago
    
    > In one of the cases, the changes are larger and therefore more meaningful descriptions of what each commit does are probably a good thing - if the refactoring of code would break anything that wouldn't be immediately apparent but would later be detected, being able to click in the commit in the IDE history and see what exactly was changed as a part of it could be pretty useful!
    This has happened to me a whole bunch of times when tracking down bugs in our various svn codebases, where commits can't be squashed like in git. Once it even happened in a linting commit, when someone accidentally messed up indentation in Python.
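The `git commit --fixup` and `git rebase --autosquash` combination mentioned in this thread automates the reordering and the "pick" to "fixup" edit (steps 7 and 8) of rectang's manual workflow. A minimal sketch, assuming a throwaway repository with hypothetical file, branch and commit names; `GIT_SEQUENCE_EDITOR=:` is only there to accept the already-reordered todo list without opening an editor.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main repo && cd repo
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b feature
echo helo > greeting.txt && git add greeting.txt && git commit -q -m "Add greeting"
echo bye > farewell.txt && git add farewell.txt && git commit -q -m "Add farewell"

# Fix the typo and record it as a "fixup! Add greeting" commit targeting HEAD~1.
echo hello > greeting.txt
git commit -q -a --fixup=HEAD~1

# --autosquash moves the fixup commit right after its target and marks it
# "fixup", so it is folded in and its message discarded.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main
git log --oneline main..
```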

globular-toast 4 years ago

I would strongly encourage getting rid of the dev branch. It is redundant in almost all cases.

Instead think of the master branch as the integration branch. This is where everything gets merged ready for a new release. But the master branch itself is not a release. You can automatically deploy the master branch to a staging or "next" environment if you wish.

For releases, use tags. That's what they are for. If necessary you can make one or more "maint" branches where you backport important fixes from the master branch on to release branches to create patch level release versions.
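A sketch of that release scheme, assuming a throwaway repository, placeholder identity, and a hypothetical maintenance branch name `maint-1.0`. Releases are annotated tags on master; a maintenance branch starts from a release tag and receives only the backported fix via `cherry-pick` (`-x` records the original commit ID in the message).

```shell
set -e
cd "$(mktemp -d)"
git init -q -b master repo && cd repo
git config user.name demo
git config user.email demo@example.com

echo 1 > version.txt && git add version.txt && git commit -q -m "Initial release"
git tag -a v1.0 -m "Release 1.0"

# Development continues on master, the integration branch.
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add new feature"
echo fix > fix.txt && git add fix.txt && git commit -q -m "Fix important bug"

# Maintenance branch from the release tag; backport only the fix.
git checkout -q -b maint-1.0 v1.0
git cherry-pick -x master
git tag -a v1.0.1 -m "Release 1.0.1"
```

The new feature stays out of v1.0.1, while master carries both the feature and the fix toward the next regular release.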

cjpearson 4 years ago

Just to add on to this, I've found using dev/master or any other two-trunk setup also tends to encourage bad habits. One branch is seen as the 'production-ready' trunk while the other becomes 'broken-or-incomplete-code-ok' trunk.
When it comes to releases, maintenance branches like you suggest are a lot more flexible and allow you to support old versions. Releasing v1.1 after v2 doesn't really work if all your releases have to be a merge to master.

cryptonector 4 years ago

Sorry, but no, a branching workflow won't scale if you have thousands of developers pushing to the same repo. Those pretty `git log --all --decorate --oneline --graph` images won't work when you've got thousands of branches, and the history on the mainline will be poor and of low utility.

What I was taught almost 20 years ago at Sun Microsystems (RIP) is a rebase workflow. That was back before Git. We were using Teamware of all things, and we were using it with a clean, linear upstream history workflow.

Our particular rules were:

  - linear history upstream
  - absolutely no merge commits (at Sun
    they were called "merge turds")
  - one commit per-bugfix, though one
    commit could fix more than one bug,
    with a separate commit for test
    changes
  - one commit per-project
  - but otherwise one push could push
    many commits

We also had rules about commit titling, naturally.

Sun had been using that workflow since 1992 as I recall, so they used that workflow for 16 years, with several thousand developers pushing to OS/Net (core of Solaris).

We even had rebase --onto.

The workflow for projects went like this:

  - devs push to a project clone of the upstream
  - gatekeeper takes care of build issues and
    preps to push to upstream when project is ready
  - every so often the project repo ("gate") would
    rebase onto the upstream head, then devs would
    rebase their clones onto the new project head
  - eventually the project would rebase onto and
    push to the upstream
  - project repos ("gates") got archived when the
    projects completed

That workflow scales very very well. The resulting upstream's history is very clean, with just: commits for bug fixes, commits for tests for bug fixes, commits for projects, commits for release-making, and the occasional follow-up to a commit that fixed minor issues with that commit (e.g., `12345 Crash in blah blah (fix style)`).

I strongly recommend it.
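A rough translation of that gate workflow into git, as a sketch only: the original used Teamware, and here branches in a single throwaway repository (`main` for upstream, `gate` for the project gate, `dev` for a developer clone, all hypothetical names) stand in for separate repositories. The key step is `git rebase --onto`, which replays only the developer's own commits after the gate itself was rebased, and every integration is a fast-forward, so no merge commits appear.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main repo && cd repo
git config user.name demo
git config user.email demo@example.com
# Helper: one commit adding a file named after the message.
mkcommit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mkcommit base                  # upstream history
git checkout -q -b gate        # project gate, branched from upstream
mkcommit project-work
git checkout -q -b dev         # a developer's clone of the gate
mkcommit dev-work

git checkout -q main
mkcommit upstream-fix          # upstream moves on meanwhile

# 1. The gate rebases onto the new upstream head (remember where it was).
old_gate=$(git rev-parse gate)
git rebase -q main gate
# 2. The developer replays only their own commits onto the new gate head.
git rebase -q --onto gate "$old_gate" dev
# 3. Developer work lands in the gate, then the gate lands upstream,
#    fast-forward only, so history stays linear.
git checkout -q gate && git merge -q --ff-only dev
git checkout -q main && git merge -q --ff-only gate
git log --oneline
```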

hnrodey 4 years ago

Man, I really feel for people who struggle with Git. I should know - I was one of them many years ago.

Many day-to-day problems with Git have something in common: the user has got their repo into some state they do not understand. And most of the time (as other comments have mentioned) they don't even know what they did to get themselves into their current pickle.

The quickest path to resolution is usually a hard reset to the server version of the branch, then restoring the commit(s) the user had made.

If you have pending changes, handle those first. It doesn't matter much what they are; just commit or stash them. If committing, don't even think about the message, just use `wip commit`. Finally, note every commit ID that holds work you want to keep around. It might be 1, 5 or 20 commits.

Okay, now hard reset.

`git reset --hard <remote_name>/<branch_name>`

Great, now you're at an acceptable state.

Finally, restore your previous commits with cherry-pick, or with `git stash apply` if you stashed, to get your changes back.

To cherry-pick, it's `git cherry-pick <sha_oldest> <sha_next_oldest> ....`
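The recovery steps above as one sketch, assuming a throwaway local bare repository as the "server", the branch `main`, and a placeholder identity. The commit IDs are captured in shell variables here only to keep the script self-contained; interactively you would copy them from `git log --oneline`. A `git fetch` before the reset (an addition, not in the steps above) makes sure `origin/main` really is the current server version.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare remote.git
git init -q -b main work && cd work
git config user.name demo
git config user.email demo@example.com
git remote add origin ../remote.git
echo base > a.txt && git add a.txt && git commit -q -m "Base"
git push -q -u origin main

# A confusing local state: one commit to keep, one to throw away,
# plus uncommitted work.
echo mine > b.txt && git add b.txt && git commit -q -m "My change"
mine=$(git rev-parse HEAD)
echo junk >> a.txt && git commit -q -am "Botched experiment"
echo pending > c.txt

# 1. Commit pending changes; the message doesn't matter.
git add -A && git commit -q -m "wip commit"
wip=$(git rev-parse HEAD)
git log --oneline origin/main..HEAD   # note the IDs you want to keep

# 2. Hard reset to the server version of the branch.
git fetch -q origin
git reset -q --hard origin/main

# 3. Restore the kept commits, oldest first.
git cherry-pick "$mine" "$wip"
```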

On to the next problem...

zibzab 4 years ago

Here is a crazy idea:
1. Commit your local changes, squash if you want.
2. Do a pull --rebase
3. Push
If this doesn't work, it's your own fault. Start over.
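The three steps can be sketched with two clones of a throwaway local bare repository, assuming placeholder users "alice" and "bob" and the branch `main`. Bob pushes first, so Alice's plain push would be rejected; `git pull --rebase` replays her commit on top of Bob's, and the push then succeeds without a merge commit.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare remote.git
git init -q -b main alice && cd alice
git config user.name alice
git config user.email alice@example.com
git remote add origin ../remote.git
echo base > a.txt && git add a.txt && git commit -q -m "Base"
git push -q -u origin main

# Bob clones and pushes a change first.
cd ..
git clone -q -b main remote.git bob && cd bob
git config user.name bob
git config user.email bob@example.com
echo bob > b.txt && git add b.txt && git commit -q -m "Bob's change"
git push -q

# Alice: 1. commit, 2. pull --rebase, 3. push.
cd ../alice
echo alice > c.txt && git add c.txt && git commit -q -m "Alice's change"
git pull -q --rebase
git push -q
git log --oneline
```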

kazinator 4 years ago

> As of git 2.11 one can select a specific stash to be popped instead of just the latest stash using git stash apply n where n is the stash number.

You can pop or just apply a specific stash using

  git stash { pop | apply } stash@{N}

This has worked as far as I remember; I have a 1.6 installation somewhere where I can confirm it, if necessary.
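A quick sketch of popping an older stash by name, assuming a throwaway repository with a placeholder file and identity. The `stash@{1}` argument is quoted so shells such as zsh do not try to interpret the braces.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main repo && cd repo
git config user.name demo
git config user.email demo@example.com
echo 0 > file.txt && git add file.txt && git commit -q -m "Initial commit"

echo first > file.txt && git stash push -q -m "first"
echo second > file.txt && git stash push -q -m "second"
git stash list   # stash@{0} is "second", stash@{1} is "first"

# Pop the older stash; the newer one stays in the stash list.
git stash pop -q 'stash@{1}'
```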

xorcist 4 years ago

Those commit messages leave a lot to be desired.

"Changes print messages again"

How helpful is that? In what way were they changed? What was the intention behind the change? Is there a ticket or a feature request behind it? Why was that particular change chosen instead of all the other ways to achieve the same effect?

It's also considered prudent to keep to a common format. These are even written in different tenses, whereas the imperative mood is by far the most commonly preferred.

bazhova 4 years ago

Using dev and master branches is an anti-pattern and does not scale. Soon your QA team wants a QA branch. Then business wants a branch for demos. Then DevOps... etc. Use tags. You only need one branch called "main". The way open source projects do it is best, look no further!

mdaniel 4 years ago

I would guess that pattern comes from the git-flow diagram: https://datasift.github.io/gitflow/IntroducingGitFlow.html#h...
WorldMaker 4 years ago

Agreed. It hurts environment reproducibility. It's best when the same binaries flow from environment to environment. That way you know that the binary you built, and that QA tested in the QA environment (not branch), is the exact same build moving to Production, not some entirely fresh rebuild possibly affected by unintentional merge conflicts or hidden integration problems.

yboris 4 years ago

My favorite git-related thing is `diff2html` so I set up an alias `diff` which will open the browser and show me all the changes I've made to the branch:

https://diff2html.xyz/

curious22 4 years ago

Could someone tell me which git UI client is shown in the screenshots of the blog post? https://riskledger-website-media-uploads.s3-eu-west-1.amazon...

revskill 4 years ago

I have never once touched git rebase in my 10 years of programming. Was it born to solve a problem few people face? I'm not sure, though.

3np 4 years ago
How do you make your commit history clean and reviewable when submitting a PR? How do you keep your local branch up-to-date with upstream without a mess of merge commits?
I use it continuously throughout any workday, if I'm doing contributions that day.
It's also a good idea to set your pull strategy to rebase, as recommended here: https://sdqweb.ipd.kit.edu/wiki/Git_pull_--rebase_vs._--merg...
```
  git config --global pull.rebase true
```
- KronisLV 4 years ago
  
  > How do you make your commit history clean and reviewable when submitting a PR?
  I've seen cases where people don't care in the slightest about the list of commits in any PR/MR, but instead just look at the diff view to see the code changes.
  At least that's how it's been in every single project that I've ever been a part of. Most also don't use rebasing at all, and don't squash the commits of a PR/MR into a single one either.
  That said, there is probably merit to using rebase and squashing as well, though many fail to see it or gain anything useful from it.
- WorldMaker 4 years ago
  
  Merge commits are quite navigable. We have DAG traversal as basic interview questions all the time in this industry for multiple reasons. They are only a mess if you call it that and generally refuse to use traversal tools such as --first-parent.
blowski 4 years ago

On large monolithic repos with a lot of people committing features independently, rebase helps reduce the noise in the history. If you mostly work on small repos (e.g. microservices) or repos with few contributors, then it makes sense you haven't needed rebase.
- revskill 4 years ago
  
  Not quite. We have a workflow to resolve conflicts, though. All we need is git merge and git pull. You need a workflow, not git rebase.
  - blowski 4 years ago
    
    A lot of very smart people disagree with you.
  - bonzini 4 years ago
    
    Then you must have never used "git bisect"? Or only bisect on the first parent?

cebert 4 years ago

I’m surprised so many organizations use the term “master” for the primary branch instead of the more inclusive “main”.

patrick451 4 years ago

I'm glad so many haven't wasted time pandering to this woke nonsense.
