It occurred to me the other day that Google Docs' revision history has a lot in common with version control repos like Git, and it might be useful to actually generate a Git repo from a doc so you can use your preferred Git tools to inspect the history and changes.
Some tasks, like finding out who changed a certain line, are vastly easier with a Git repo (just run git blame) than in the UI that Google Docs provides.
Fortunately the revision history is easily accessible through the Google Drive API, although doing the OAuth dance is a bit of a hassle.
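If you want to poke at the API directly, listing a doc's revisions is a single paginated REST call. Here's a minimal TypeScript sketch. It assumes the Drive API v3 revisions.list endpoint (files/{fileId}/revisions) and an access token you've already obtained via OAuth. The helper names revisionsListUrl and listAllRevisions, and the injected fetchJson function (which would attach the bearer token), are hypothetical and not taken from doc2git. v3 returns only a partial set of fields unless you ask for more, so the sketch requests exportLinks and lastModifyingUser explicitly through the fields parameter.

```typescript
// Minimal shapes for the Drive API v3 revision fields used here.
interface DriveUser {
  displayName?: string;
  emailAddress?: string;
}

interface DriveRevision {
  id: string;
  modifiedTime: string; // RFC 3339, e.g. "2021-03-04T10:15:30.000Z"
  lastModifyingUser?: DriveUser;
  exportLinks?: Record<string, string>; // MIME type -> export URL
}

interface RevisionPage {
  revisions?: DriveRevision[];
  nextPageToken?: string;
}

const DRIVE_API = "https://www.googleapis.com/drive/v3";
// Request the fields we need explicitly rather than relying on the defaults.
const REVISION_FIELDS =
  "nextPageToken,revisions(id,modifiedTime,lastModifyingUser,exportLinks)";

// Hypothetical helper: build the revisions.list URL for one page of results.
function revisionsListUrl(fileId: string, pageToken?: string): string {
  let url =
    `${DRIVE_API}/files/${encodeURIComponent(fileId)}/revisions` +
    `?fields=${encodeURIComponent(REVISION_FIELDS)}`;
  if (pageToken) {
    url += `&pageToken=${encodeURIComponent(pageToken)}`;
  }
  return url;
}

// Hypothetical helper: keep following nextPageToken until every page is fetched.
// fetchJson is injected (it would send the OAuth bearer token), so this sketch
// doesn't depend on any particular HTTP client.
async function listAllRevisions(
  fileId: string,
  fetchJson: (url: string) => Promise<RevisionPage>,
): Promise<DriveRevision[]> {
  const all: DriveRevision[] = [];
  let pageToken: string | undefined = undefined;
  do {
    const page: RevisionPage = await fetchJson(revisionsListUrl(fileId, pageToken));
    all.push(...(page.revisions ?? []));
    pageToken = page.nextPageToken;
  } while (pageToken);
  return all;
}
```

Injecting fetchJson keeps the pagination logic separate from the OAuth plumbing, which is the fiddly part.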
I've built a quick web app, doc2git, which does all this: it creates a Git repo in your browser with one commit per revision, each with the correct time and author.
If you'd prefer to do the OAuth setup and run it yourself, you can check out doc2git on GitHub.
Roughly, the steps are:
- Sign in with Google and authorize the drive.readonly scope via OAuth. This happens client-side, so the access token isn't saved anywhere; Google's OAuth library updates the Drive API client automatically.
- Create a Git repo (using isomorphic-git)
- Retrieve all the revisions for the selected doc. Each revision has a set of URLs which can be used to "export" the doc as it was at that revision. I'm currently just retrieving the text/plain version since that's easiest to diff, but there are also other formats like text/rtf and even application/pdf. The content of each revision is downloaded from its export URL, saved as doc.txt, and then committed with the author name, email address and time from the revision.
- Create a zip file of the repo, including the .git directory (using zip.js)
- Provide a download link to the zip file
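The per-revision step above boils down to picking the text/plain export URL and turning the revision's metadata into a commit author. Here's a rough TypeScript sketch of that mapping. It assumes the Drive API v3 revision fields (modifiedTime, lastModifyingUser, exportLinks) and the author shape that isomorphic-git's commit() accepts: name, email, timestamp in seconds since the Unix epoch, and timezoneOffset in minutes. The function name toCommitInfo, the commit message format and the fallback name/email are hypothetical. Drive doesn't always return lastModifyingUser.emailAddress (it can be hidden from the requester), so something has to stand in for it.

```typescript
// Minimal shapes for the Drive API v3 revision fields used here.
interface DriveUser {
  displayName?: string;
  emailAddress?: string;
}

interface DriveRevision {
  id: string;
  modifiedTime: string; // RFC 3339 timestamp in UTC
  lastModifyingUser?: DriveUser;
  exportLinks?: Record<string, string>; // MIME type -> export URL
}

// Author fields in the shape isomorphic-git's commit() takes.
interface CommitAuthor {
  name: string;
  email: string;
  timestamp: number; // whole seconds since the Unix epoch
  timezoneOffset: number; // minutes; 0 because modifiedTime is UTC
}

interface CommitInfo {
  exportUrl: string;
  message: string;
  author: CommitAuthor;
}

// Hypothetical helper: map one revision to the export URL to download
// and the metadata to commit it with.
function toCommitInfo(rev: DriveRevision, mimeType = "text/plain"): CommitInfo {
  const exportUrl = rev.exportLinks?.[mimeType];
  if (!exportUrl) {
    throw new Error(`Revision ${rev.id} has no ${mimeType} export link`);
  }
  const millis = Date.parse(rev.modifiedTime);
  if (Number.isNaN(millis)) {
    throw new Error(`Revision ${rev.id} has an unparseable modifiedTime`);
  }
  const user: DriveUser = rev.lastModifyingUser ?? {};
  return {
    exportUrl,
    message: `Revision ${rev.id}`, // hypothetical message format
    author: {
      name: user.displayName ?? "Unknown", // hypothetical fallback
      email: user.emailAddress ?? "unknown@example.com", // hypothetical fallback
      timestamp: Math.floor(millis / 1000),
      timezoneOffset: 0,
    },
  };
}
```

With isomorphic-git, each CommitInfo would then drive writing doc.txt, a git.add({ fs, dir, filepath }) and a git.commit({ fs, dir, message, author }).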
It can take a while to process the revisions for docs with lots of history. This could be made a lot faster by downloading several revisions at once, though the downloads would then have to be put back in order for the Git commits to make sense. I haven't bothered for now; I'm happy to leave it running and make a cup of tea in the meantime!
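For the curious, one way to get that speed-up without scrambling the history is to download with a small pool of concurrent workers, store each result at its revision's original index, and only start committing once everything has arrived. This is a hedged TypeScript sketch, not something doc2git currently does. The name downloadInOrder and the default concurrency of 4 are hypothetical, and fetchOne stands in for whatever authenticated export download you use.

```typescript
// Hypothetical helper: run fetchOne over every item with at most `concurrency`
// requests in flight, returning results in the same order as the input.
async function downloadInOrder<T, R>(
  items: T[],
  fetchOne: (item: T) => Promise<R>,
  concurrency = 4,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0; // index of the next item still to be claimed

  // Each worker keeps claiming the next unclaimed index until none are left.
  // The claim (next++) needs no locking because it runs synchronously on
  // JavaScript's single thread, between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fetchOne(items[i]); // store by index, not by finish time
    }
  }

  const poolSize = Math.min(Math.max(1, concurrency), items.length);
  const workers = Array.from({ length: poolSize }, () => worker());
  await Promise.all(workers);
  return results;
}
```

The commits can then be made one at a time by walking the returned array from start to finish, so the Git history stays in revision order however the downloads happened to complete. Holding every revision in memory is fine for plain text, though it might not be for PDF exports, and a small pool is probably kinder to Drive's rate limits.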