git-backup: Back Up Your Git Repositories to S3


git-backup is a command-line tool for backing up your Git repositories to Amazon S3 or any S3-compatible storage.

Why Choose git-backup?

  • Simple Setup: Get started quickly with minimal configuration.
  • Durable Backups: Creates standard .tar.gz archives, so you can restore your data with common tools such as tar and git, without needing git-backup itself.
  • Automated Cleanup: Includes a built-in prune command to delete old snapshots based on your retention policy.

Installation

Using NPM:

$ npm install @larose/git-backup

Using Yarn:

$ yarn add @larose/git-backup

Creating a Snapshot

Use the snapshot command to create a compressed archive of your Git repository and upload it to your S3-compatible storage. The snapshot command works by executing git clone --mirror <repo>, which captures all commits, tags, and branches. It then compresses the clone into a .tar.gz file and uploads it to S3.

$ git-backup snapshot \
  --repo $REPO \
  --remote $REMOTE \
  --access-key-id $ACCESS_KEY_ID \
  --secret-access-key $SECRET_ACCESS_KEY

Arguments:

  • --repo: The URL of the Git repository you want to back up.
  • --remote: The URL of the remote storage location where the snapshot will be stored.
  • --access-key-id: Your access key ID for the S3-compatible storage.
  • --secret-access-key: Your secret access key for the S3-compatible storage.

Example:

$ git-backup snapshot \
  --repo git@github.com:larose/utt.git \
  --remote https://1234.r2.cloudflarestorage.com/bucket-name/path/in/your/bucket \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
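
For illustration, the mirror-and-compress steps described above can be approximated by hand. This is a minimal sketch and not git-backup's actual implementation: the make_snapshot function, its arguments, and the timestamped file name are assumptions made for the example.

```shell
# Hypothetical helper, not part of git-backup.
# Usage: make_snapshot <repo-url-or-path> <name> <output-dir>
make_snapshot() {
  local repo="$1" name="$2" out_dir="$3"
  local work archive
  work="$(mktemp -d)" || return 1

  # --mirror clones every ref (branches, tags, ...) into a bare repository.
  git clone --quiet --mirror "$repo" "$work/$name" || return 1

  # Pack the bare repository into a gzip-compressed tarball with a UTC timestamp.
  archive="$out_dir/$name-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  tar -czf "$archive" -C "$work" "$name" || return 1

  rm -rf "$work"
  printf '%s\n' "$archive"
}

# The upload step could then use any S3-compatible client, for example:
#   aws s3 cp "$archive" "s3://bucket-name/path/in/your/bucket/" \
#     --endpoint-url "https://1234.r2.cloudflarestorage.com"
```

Because the result is a plain tarball of a bare repository, nothing in it depends on the tool that produced it.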

Pruning Old Snapshots

The prune command helps you manage storage space by deleting old snapshots based on a defined retention policy.

$ git-backup prune \
  --repo $REPO \
  --remote $REMOTE \
  --retention-policy $RETENTION_POLICY \
  --access-key-id $ACCESS_KEY_ID \
  --secret-access-key $SECRET_ACCESS_KEY

Arguments:

  • --repo: The URL of the Git repository whose snapshots you want to prune.
  • --remote: The URL of the remote storage location where the snapshots are stored.
  • --retention-policy: Defines how many snapshots to keep for different durations. Format: daily=<number>,weekly=<number>,monthly=<number>. See below for more details on the retention policy.
  • --access-key-id: Your access key ID for the S3-compatible storage.
  • --secret-access-key: Your secret access key for the S3-compatible storage.

Example:

$ git-backup prune \
  --repo git@github.com:larose/utt.git \
  --remote https://1234.r2.cloudflarestorage.com/bucket-name/path/in/your/bucket \
  --retention-policy "daily=7, weekly=4, monthly=3" \
  --access-key-id AKIAIOSFODNN7EXAMPLE \
  --secret-access-key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Retention Policy

The prune command uses a retention policy (--retention-policy) to manage how many snapshots are kept for different durations. This ensures you have enough snapshots for recovery while optimizing storage usage.

Format: daily=D, weekly=W, monthly=M

  • D: The number of most recent daily snapshots to retain. Days start at UTC midnight.
  • W: The number of most recent weekly snapshots to retain. Weeks start on Monday.
  • M: The number of most recent monthly snapshots to retain. Months start on the first day of the month.

Retention counts snapshots that actually exist, not calendar periods. If a scheduled backup fails or is skipped, that period does not use up a slot in its retention window, so you always keep at least the intended number of successful snapshots for each period. This prevents a string of failed backups from causing prune to delete all your snapshots for a given timeframe.
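
To make the period boundaries concrete, here is a small sketch that maps a snapshot timestamp to its day, week, and month in UTC. It assumes GNU date (for the -d option); the period_keys name and its output format are invented for this example, and git-backup may compute periods differently.

```shell
# Hypothetical helper: print the daily, weekly, and monthly period of a timestamp.
# Usage: period_keys <ISO 8601 timestamp>
period_keys() {
  local ts="$1"
  # -u evaluates the timestamp in UTC, so days roll over at UTC midnight.
  # %G-W%V is the ISO 8601 year and week number; ISO weeks start on Monday.
  date -u -d "$ts" '+daily=%Y-%m-%d weekly=%G-W%V monthly=%Y-%m'
}

period_keys "2024-06-02T16:11:01Z"   # a Sunday
# daily=2024-06-02 weekly=2024-W22 monthly=2024-06

period_keys "2024-06-03T00:00:00Z"   # the following Monday starts a new week
# daily=2024-06-03 weekly=2024-W23 monthly=2024-06
```

Two snapshots share a retention slot when they produce the same key for a given period.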

Retention Policy Example

This example demonstrates how prune works with a policy to retain only the four most recent daily snapshots (daily=4, weekly=0, monthly=0).

Snapshots taken:

  • May 28 (midnight)
  • May 26 (midnight, 8am, and 11pm)
  • May 25 (midnight)
  • May 24 (midnight)
  • May 23 (midnight)

The table below shows which snapshots are retained and why. No snapshot was taken on May 27, so that day does not count towards the limit of four:

| Snapshot | Status | Explanation |
|---|---|---|
| May 28 at midnight | ✅ Retained | Most recent daily snapshot |
| May 26 at 11pm | ✅ Retained | Second most recent daily snapshot (keeps the latest for each day) |
| May 26 at 8am | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day) |
| May 26 at midnight | ❌ Deleted | Older snapshot on the same day (keeps only the most recent per day) |
| May 25 at midnight | ✅ Retained | Third most recent daily snapshot |
| May 24 at midnight | ✅ Retained | Fourth most recent daily snapshot, reaches the retention limit of 4 daily snapshots |
| May 23 at midnight | ❌ Deleted | Exceeds the retention window (policy keeps only the 4 most recent daily snapshots) |
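
The daily rule in this example can be reproduced with a few lines of shell. This is a sketch of the selection logic only, under the assumption that snapshots are identified by ISO 8601 UTC timestamps; the keep_daily name is hypothetical, the year 2024 is made up for the example, and git-backup's real implementation may differ.

```shell
# Hypothetical helper: read timestamps on stdin, print the ones a daily=<D> policy keeps.
# Usage: keep_daily <D>
keep_daily() {
  local keep="$1"
  # ISO 8601 timestamps sort chronologically as text, so sort -r puts the newest first.
  LC_ALL=C sort -r | awk -v keep="$keep" '
    {
      day = substr($0, 1, 10)      # YYYY-MM-DD
      if (!(day in seen)) {        # the first line seen for a day is its newest snapshot
        seen[day] = 1
        if (++days <= keep) print  # only days that have a snapshot use up a slot
      }
    }'
}

keep_daily 4 <<'EOF'
2024-05-28T00:00:00Z
2024-05-26T23:00:00Z
2024-05-26T08:00:00Z
2024-05-26T00:00:00Z
2024-05-25T00:00:00Z
2024-05-24T00:00:00Z
2024-05-23T00:00:00Z
EOF
# 2024-05-28T00:00:00Z
# 2024-05-26T23:00:00Z
# 2024-05-25T00:00:00Z
# 2024-05-24T00:00:00Z
```

May 27 never appears in the input, so it never increments the day counter. That is why May 24 is still retained.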

Using git-backup with GitHub Actions

Because git-backup is a command-line tool, you can run it from GitHub Actions to automate backups of your Git repositories hosted on GitHub. Here's an example workflow:

name: Back up Public Repositories

on:
  schedule:
    - cron: "0 0 1 * *" # Runs at 00:00 UTC on the first day of every month
  workflow_dispatch:

jobs:
  back-up:
    runs-on: ubuntu-22.04

    strategy:
      matrix:
        repo:
          [
            "https://github.com/cicd-excellence/app.git",
            "https://github.com/cicd-excellence/infra.git",
            "https://github.com/larose/cargo.git",
            "https://github.com/larose/conjugueur.git",
            "https://github.com/larose/eef.git",
            "https://github.com/larose/ena.git",
            "https://github.com/larose/git-backup-demo.git",
            "https://github.com/larose/pretty-printer.git",
            "https://github.com/larose/tsp.git",
            "https://github.com/larose/utt.git",
            "https://github.com/larose/verbes.git",
            "https://github.com/larose/yarn-monorepo-change-based-testing-demo.git",
            "https://github.com/larose/wiki.git",
          ]

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "lts/*"

      - name: Install @larose/git-backup
        run: npm install -g @larose/git-backup

      - name: Back up ${{ matrix.repo }}
        run: |
          git-backup snapshot \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }}

          git-backup prune \
            --repo ${{ matrix.repo }} \
            --remote ${{ secrets.REMOTE }} \
            --access-key-id ${{ secrets.ACCESS_KEY_ID }} \
            --secret-access-key ${{ secrets.SECRET_ACCESS_KEY }} \
            --retention-policy "monthly=3"

Source: https://github.com/larose/git-backup-demo

Note that the Git repository URLs use https instead of ssh because a GitHub Actions runner has no SSH key configured by default, so cloning other repositories over SSH would fail without extra setup.

If you want to back up private Git repositories, use a personal access token (PAT) as the username in the Git repository URL that you pass to --repo. Example: --repo https://$GITHUB_PAT@github.com/larose/utt.git.
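
As a sketch of how the token could be injected without hard-coding it, the helper below builds the authenticated URL from a plain https URL. The authenticated_url function is hypothetical, and GITHUB_PAT is assumed to be an environment variable that holds the token (for example, populated from a repository secret).

```shell
# Hypothetical helper: insert a token as the username of an https URL.
# Usage: authenticated_url <https-url> <token>
authenticated_url() {
  local url="$1" token="$2"
  # Strip the scheme, then rebuild the URL with the token before the host.
  printf '%s\n' "https://${token}@${url#https://}"
}

# Example use with the snapshot command:
#   git-backup snapshot \
#     --repo "$(authenticated_url https://github.com/larose/utt.git "$GITHUB_PAT")" \
#     --remote "$REMOTE" \
#     --access-key-id "$ACCESS_KEY_ID" \
#     --secret-access-key "$SECRET_ACCESS_KEY"
```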

Restoring from a Snapshot

To restore a Git repository from a snapshot created by git-backup, follow these steps:

Step 1: Download the Snapshot

Use the AWS CLI, another S3-compatible tool, or the S3 UI to download the backup snapshot to your local machine.
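
With the AWS CLI, the download might look like the sketch below. It assumes the aws command is installed and that credentials are provided through the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The download_snapshot wrapper and all bucket paths are placeholders, since the exact object names depend on your --remote setting.

```shell
# Hypothetical wrapper around the AWS CLI.
# Usage: download_snapshot <endpoint-url> <s3-uri> <destination>
download_snapshot() {
  local endpoint="$1" uri="$2" dest="$3"
  # --endpoint-url points the CLI at an S3-compatible provider instead of AWS.
  aws s3 cp "$uri" "$dest" --endpoint-url "$endpoint"
}

# List the stored snapshots first, then download one (paths are placeholders):
#   aws s3 ls "s3://bucket-name/path/in/your/bucket/" --recursive \
#     --endpoint-url "https://1234.r2.cloudflarestorage.com"
#   download_snapshot "https://1234.r2.cloudflarestorage.com" \
#     "s3://bucket-name/<path-to-snapshot>.tar.gz" .
```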

Step 2: Extract the Snapshot

Use a tool like tar to extract the contents of the downloaded archive. This will create a directory containing the complete mirrored (bare) repository, which is a special type of repository without a working directory.

$ tar -xzf <snapshot-name>.tar.gz

Replace <snapshot-name> with the actual filename of your downloaded snapshot.

Example:

$ tar -xzf larose-utt-20240602T161101Z.tar.gz
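
Before relying on an extracted snapshot, it can be worth checking that the repository inside is intact. The following is an optional sketch, not a step git-backup requires; the verify_snapshot helper is hypothetical and takes the archive plus the name of the directory it extracts to.

```shell
# Hypothetical helper: extract a snapshot and check the repository inside it.
# Usage: verify_snapshot <archive.tar.gz> <bare-repo-directory-name>
verify_snapshot() {
  local archive="$1" repo_dir="$2"
  tar -xzf "$archive" || return 1
  # fsck verifies the connectivity and validity of the objects in the repository.
  git --git-dir="$repo_dir" fsck --full || return 1
  # Print every restored ref (branches and tags) with its abbreviated commit ID.
  git --git-dir="$repo_dir" for-each-ref --format='%(objectname:short) %(refname)'
}

# Example:
#   verify_snapshot larose-utt-20240602T161101Z.tar.gz larose-utt
```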

Step 3: Clone the Bare Repository as a Regular Repository

The extracted directory contains a bare Git repository, meaning it only holds the Git data (commits, branches, tags) but not your working files.

To convert the bare repository into a regular working directory, use the git clone command, specifying the extracted directory as the source and a new directory for your restored working repository.

$ git clone <extracted_directory_name> my-restored-repo

Replace <extracted_directory_name> with the actual name of the extracted directory and my-restored-repo with your desired name for the restored working directory.

Example:

$ git clone larose-utt my-restored-repo

Your Git repository is now restored and ready to use.
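
One detail to keep in mind: after this clone, the origin remote of the restored repository points at the local extracted directory rather than at your Git host. The sketch below shows one way to handle that; the restore_repo helper and the URLs are illustrative only.

```shell
# Hypothetical helper: clone the extracted bare repository and re-point origin.
# Usage: restore_repo <extracted-bare-dir> <new-working-dir> <hosted-url>
restore_repo() {
  local bare_dir="$1" dest="$2" url="$3"
  git clone --quiet "$bare_dir" "$dest" || return 1
  # Point origin back at the hosted repository so fetch and push go there
  # instead of to the local extracted directory.
  git -C "$dest" remote set-url origin "$url"
}

# Example:
#   restore_repo larose-utt my-restored-repo git@github.com:larose/utt.git
#
# To recreate the hosted repository itself with all branches and tags, the bare
# repository can instead be pushed to an empty remote:
#   git -C larose-utt push --mirror git@github.com:larose/utt.git
```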

Source Code

Download the source code from this link.