The second-order-diff Git trick

blog.moertel.com

180 points by malloc47 13 years ago · 43 comments

stasm 13 years ago

It looks like what you're really looking for is a local branch that you never push to the remote, and 'git diff master...' with 'git rebase'.

Create a local branch, apply the sledgehammer, and start reviewing changes. Use 'git diff master...' to review changes. (This is short for 'git diff master...HEAD', which in turn means 'compare the current branch's HEAD with the commit on master from which you branched'.) Then use 'git add -p' and commit the changes that you like.

Iterate until you're happy with the result. You will have ended up with a few commits on your local branch. Use 'git rebase -i master' to squash all the commits into a single one. Finally, check out the master branch and merge your local branch in, preferably with --ff.

tmoertel 13 years ago

In this case, however, I don't want to use something like `git add -p` to pick the sledgehammer's good effects from the bad. That there are bad effects means the original sledgehammer was wrong. I want to fix the sledgehammer.
Yes, I could accept the good effects and then create a new sledgehammer to attempt to fix the bad effects (without affecting any good effects or any original code that just happens to look like a bad effect), but it's easier and more reliable to just roll back all of the effects, fix the original sledgehammer (tweak the regex), and reapply it to the original clean slate.

DoubleCluster 13 years ago

Why not just commit after the first step? You can then use the normal diff, and afterwards you just remove your temporary commit to make the final one.

tmoertel 13 years ago

> Why not just commit after the first step?
Good question. In answer, there are two reasons:
First, a stash is a commit, just one that doesn't get in your way and that you don't have to keep track of yourself because it's automatically pointed to by a known reference called "stash".
Second, by using "git stash" to create this commit instead of "git commit", you save yourself the (small) burden of moving the commit out of your way and resetting the working tree back to the clean slate you started from – and upon which you want to try a new, slightly adjusted sledgehammer from scratch. That's exactly what "git stash" does in one step:
"Use git stash when you want to record the current state of the working directory and the index, but want to go back to a clean working directory." [1]
[1] http://www.kernel.org/pub/software/scm/git/docs/git-stash.ht...
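
A minimal sketch of the stash-based loop described above, run in a throwaway repository. The file name `post.txt`, its contents, the `apply` helper, and both sed expressions are invented for illustration; none of them come from the article.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf 'alpha beta\nalpha gamma\n' > post.txt
git add post.txt
git commit -q -m 'clean slate'

# Helper: apply a sed expression to post.txt (avoids the non-portable sed -i).
apply() { sed "$1" post.txt > post.txt.new && mv post.txt.new post.txt; }

# First sledgehammer attempt: too broad, it also hits the "gamma" line.
apply 's/alpha/ALPHA/'

# Record the attempt as a stash and return to the clean slate in one step.
git stash push --quiet

# Adjusted sledgehammer, reapplied to the clean slate.
apply '/gamma/!s/alpha/ALPHA/'

# Second-order diff: only what changed between the two attempts.
second_order=$(git diff stash@{0})
printf '%s\n' "$second_order"
```

In this toy case the second-order diff should touch only the "gamma" line, which is the one the adjustment was meant to change; the "beta" line, already reviewed in the first attempt, appears only as context.
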
jzwinck 13 years ago

If you do that, you're just one accidental step away from uploading your bad changes to the origin. The stash way seems a bit safer and more natural to me.
- chii 13 years ago
  
  or, do it on a branch, then merge it into the production branch.

pseut 13 years ago

I think 'git add -p .' would have worked better for the specific example he gave, though. (Step through and stage change by change)

tmoertel 13 years ago

The reason I don't use `git add -p` to incrementally approve the "good" changes is that the sledgehammer is not incremental: it expects to be applied to the original clean slate. Using git stash will get me back to that clean slate. Using `git add -p` will not, for both the approved and unapproved changes will remain in the working tree, where the sledgehammer expects neither of them to be.
- pseut 13 years ago
  
  Getting back to the clean slate is pretty trivial whatever you do, which is one of the nice things about git. I'd probably prefer to have a main (sub) branch that I add to incrementally, apply various sledgehammers to the "clean slate" and rebase the results of the sledgehammer liberally.
ibotty 13 years ago

as i said in my other comment. that will only work for idempotent things: if you like to change, say, ' to ", the workflow in the article works, but git add -p won't.
- pseut 13 years ago
  
  I don't follow you. Any time I'm inspecting the diff to see if I made the change I want, staging the changes i want progressively and then discarding the rest has worked well.
  - ibotty 13 years ago
    
    the important part missing in my example is the 'vice versa'. the canonical example is:
    > sledge() { tr ab ba < "$1" | sponge "$1"; }
    then, you might undo a change you already -p added.
    more convincing might be (using gnu sed):
    > sledge() { sed -i s/identifier/long_identifier/ "$1"; }
    a second iteration will eventually generate long_long_identifier.
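
The non-idempotent case can be reproduced on a scratch file. This is only a sketch: the temp file and the variable names are made up, and the helper writes through a temporary copy instead of relying on GNU sed's `-i`.

```shell
# Hypothetical demo: run the same substitution twice on a scratch file.
tmp=$(mktemp)
printf 'identifier\n' > "$tmp"

# Same substitution as the sed example, without the in-place flag.
sledge() { sed 's/identifier/long_identifier/' "$1" > "$1.new" && mv "$1.new" "$1"; }

sledge "$tmp"; first=$(cat "$tmp")
sledge "$tmp"; second=$(cat "$tmp")

printf 'after one pass:  %s\nafter two passes: %s\n' "$first" "$second"
# after one pass:  long_identifier
# after two passes: long_long_identifier
```

Because the second pass rewrites text the first pass already produced, the sledgehammer has to start from the clean slate each time.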

niggler 13 years ago

git stash has other cool features and use cases (for example, when you want to pull from upstream and you have conflicting changes.) http://git-scm.com/book/ch6-3.html is a nice summary and the manpage also gives some sample workflows.

drstewart 13 years ago

I actually saw this as a strike against git usability -- this should happen transparently imo, why should I have to git stash save; git pull --rebase; git stash pop for things to apply cleanly? Just do it automatically and fail if the stash application fails.
- andrewflnr 13 years ago
  
  Because then git would be doing magic stuff you don't (necessarily) understand, and people who like that aren't git's target audience. Those are all separate pieces of functionality that shouldn't be stuck together by default.
  - jzwinck 13 years ago
    
    The same argument could be applied against git pull, which is really the concatenation of fetch and merge (or rebase). I agree with the previous poster that there should be better defaults available to non-tweakers.
    
    andrewflnr 13 years ago
    
    Git pull always seemed weird to me too. I don't use it.
- jahewson 13 years ago
  
  I'd call that "happening opaquely"
- Spiritus 13 years ago
  
  You could just create an alias for that.
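
One possible shape for such an alias, sketched under assumptions: the alias name `up` is made up, and the sketch assumes a Git version whose `git pull` accepts `--autostash` (check `git pull --help` on yours). That option stashes local changes before the rebase and reapplies them afterwards, which also avoids popping an unrelated older stash when there was nothing to save.

```shell
# Define the alias in a scratch repository so the global config is untouched.
# Drop the mktemp/cd/init lines and add --global to use it everywhere.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
git init -q
git config alias.up 'pull --rebase --autostash'

# Show what the alias expands to.
git config --get alias.up
```

After that, `git up` would stand in for the stash, pull --rebase, stash pop sequence.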

tmeasday 13 years ago

Of course the problem here is that the GP can't actually be sure that every link he wanted to fix in the final step was fixed; just that the ones that _were_ fixed were fixed right!

tmoertel 13 years ago

GP here. You're right that the final diff, in this case the second-order diff, cannot by itself prove that my final adjustment fixed all of the broken sentence-end links. But I wasn't merely going on that evidence.
The whole point of using a second-order diff was to allow me to reliably carry forth the knowledge gained by my exhaustive review of the prior diff. That exhaustive review told me that there were a dozen broken sentence-end links. And that's how many showed up as fixed in the final, second-order diff: one dozen.
So the prior and final evidence, together, allowed me to be confident that the adjustment worked as intended.
- tmeasday 13 years ago
  
  Very true, and a good point. I thought it was an interesting little gotcha about the whole technique though: sometimes you will actually need to go ahead and look at the whole diff to be 100% sure.
  - tmoertel 13 years ago
    
    Indeed. Whenever you drop the sledgehammer, you have the obligation to exhaustively review its effects at least once to be sure there wasn't collateral damage. The beauty of the second-order diff is that, once you do an exhaustive review, you need not do another one just to adjust the sledgehammer.

gbog 13 years ago

I used a similar technique for a less frivolous task. I had a script dumping data in text format, and wanted to refactor it. I committed the dumped output, then refactored brutally step by step, running the script at each step and reverting whenever the dumped data showed any diffs.

ibotty 13 years ago

when i have an idempotent sledgehammer (i.e. i can apply it twice and get the same result) i usually just

   git add -p

everything that is ok. git diff will only show you the differences against the index, so this works as well.

habosa 13 years ago

This is a great tip, thanks! I never thought of using git diff to test things out, and I definitely didn't know you could diff against a stash.

malingo 13 years ago
That was news to me as well. I've always done the following:
```
    $ diff -u <(git stash show -p) <(git diff)
```
- kzrdude 13 years ago
  
  never again.. by the way, `git diff --no-index` also works like a generic diff utility on any two files.
  (Wow, that --no-index behavior seems to be default outside repositories now. I learned something, just git diff works now.)

zacharypinter 13 years ago

I find this alias really useful for accomplishing similar goals:

alias gqc="git commit -m 'quick commit'"

The command is usually preceded by "git add ." (or alias "ga."). Making a commit ends up being more reliable for me than stash. Also, the commit stays with the branch, making it easy to switch to master branch, make or check an important change, then go back to what I was working on in the develop branch. Additionally, it makes rebasing a work in progress easier. Just gqc && git pull --rebase.

When it comes time to push, I can just check the log for all the "quick commit" commits. If there's just one, then I make an amend commit. If there's more than one, I rebase interactively.

I suppose I could add a hook to make sure I never accidentally push a quick commit, but it hasn't been an issue yet (over the past year or so I've only made the mistake once).
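
A hook along those lines might look like the sketch below. It is an assumption-laden illustration, not the commenter's setup: the function name is made up, and it relies on the documented pre-push input format, where Git feeds the hook one line per pushed ref on standard input.

```shell
# Body for a hypothetical .git/hooks/pre-push script.
# Each input line has the form:
#   <local ref> <local sha> <remote ref> <remote sha>
block_quick_commits() {
    while read -r local_ref local_sha remote_ref remote_sha; do
        case $local_sha in
            *[!0]*) ;;          # a real commit is being pushed
            *) continue ;;      # all zeros: the remote ref is being deleted
        esac
        case $remote_sha in
            *[!0]*) range="$remote_sha..$local_sha" ;;  # updating an existing ref
            *) range="$local_sha" ;;                    # new ref: check all its history
        esac
        # Compare commit subjects against the exact message the alias uses.
        if git log --format=%s "$range" | grep -qx 'quick commit'; then
            echo "push blocked: 'quick commit' found in $local_ref" >&2
            return 1
        fi
    done
    return 0
}
```

Saved as an executable `.git/hooks/pre-push`, the script would end with `block_quick_commits || exit 1`, since Git aborts the push when the hook exits non-zero.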

twic 13 years ago

Am i missing something, or is this nothing to do with Git? You could do this with any source control tool, or indeed with simple copies of the files. I've used exactly this approach without Git for years.

craigching 13 years ago

> Am i missing something, or is this nothing to do with Git?
Of course it's something to do with git, it describes a workflow in git. Yes, you can do this with other VC tools or even without, but this is how you do it with git and, I'd argue, it's better than other ways, certainly more convenient than without a version control tool at all, IMHO.

emillon 13 years ago

A bit offtopic, but I think that you could have let the default `pandocCompiler` in Hakyll handle your old posts written in Textile format.

tmoertel 13 years ago

I actually tried that at first, but the Typo-flavored Textile in my old posts wasn't reliably interpreted by Pandoc. (That's also why I had to clean up the posts after I used Pandoc to convert them into Markdown.) Since I had manual edits to do in any case, I figured I might as well do them after I converted my posts to Markdown, since it seems to be Pandoc's best-supported markup language.

minhajuddin 13 years ago

Nice tip. My sledgehammer is a git-sub script (http://minhajuddin.com/2011/12/13/script-to-do-a-global-sear...) :)

shurcooL 13 years ago

This is very useful, as the article describes in detail.

I've found it useful to be able do diffs of diffs in my work, hence I'm planning to add the ability to do that to my toolset. Combined with live editing, I think it's going to be quite neat.

shurcooL 13 years ago

Delivered: https://dl.dropbox.com/u/8554242/dmitri/projects/Conception/...

wubbfindel 13 years ago

I tend to 'git commit' and then 'git commit --amend' until I'm happy, then push to others.

I find it easier anyway. But the stash approach is interesting, thanks for sharing!

almost 13 years ago

That's good advice for a lot of situations but it depends if your "sledgehammer" is assuming a certain starting condition.
In the author's example, if he got it wrong and it replaced a load of non-links with links, then running it again on the output of the first run isn't going to do any good. So he's suggesting replacing "git reset" with "git stash": both reset the repo to the way it was pre-sledgehammer, but "git stash" also keeps around the previous results for comparison.
- wubbfindel 13 years ago
  
  Oh I see, because he's using git itself to find the files that need work on them? I had honestly missed that part of the logic.
  That'll teach me to 'scan read' and then comment!

kapuzineralex 13 years ago

Nice one, indeed!

cecilpl 13 years ago

This is the equivalent in Perforce of shelving your change, then redoing your work and diffing against your shelf.
