pgdedup: PostgreSQL backup manager with BTRFS block-level deduplication via dduper

Self-hosted PostgreSQL backup manager with BTRFS block-level deduplication.

Consecutive Postgres backups share most of their data. pgdedup stores them uncompressed on BTRFS and uses dduper to deduplicate at the filesystem level — so you only pay for the data that actually changed.

How it works

pgdedup backup
  → runs pg_basebackup
  → stores on BTRFS
  → runs dduper to deduplicate across all backups
  → applies retention policy
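
To make "deduplicate across all backups" concrete, here is a minimal Python sketch of the underlying idea. It is not how dduper is implemented: this toy hashes fixed-size blocks with SHA-256 in memory, and the 4 KiB block size and the function names are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def dedup_savings(backups: list[bytes], block_size: int = BLOCK_SIZE) -> float:
    """Fraction of all blocks that duplicate a block already seen."""
    total = 0
    unique: set[str] = set()
    for data in backups:
        hashes = block_hashes(data, block_size)
        total += len(hashes)
        unique.update(hashes)
    return 1 - len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Day 1: 100 distinct blocks. Day 2: the same data plus one appended block.
    day1 = b"".join(i.to_bytes(4, "big") * 1024 for i in range(100))
    day2 = day1 + (100).to_bytes(4, "big") * 1024
    print(f"{dedup_savings([day1, day2]):.1%} of blocks are duplicates")
```

With two backups that differ by a single appended block, almost half of all stored blocks are redundant, and that share grows as more near-identical backups accumulate.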

No cloud dependencies. No agents. Single binary. Your data stays on your server.

Features

  • Block-level dedup via BTRFS + dduper (up to 85% storage savings)
  • Point-in-time recovery (PITR) via WAL archiving
  • GFS retention — keep N daily, N weekly, N monthly backups
  • Single binary — 4.5MB, zero runtime dependencies
  • Simple config — one TOML file
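
As an illustration of what a GFS (grandfather-father-son) policy does, the sketch below keeps the newest backup from each of the most recent N days, ISO weeks, and months. This is one common interpretation, not necessarily the exact rule pgdedup applies; the function name `gfs_keep` is hypothetical, and the timestamp format is assumed from the backup name used in the restore example.

```python
from datetime import datetime


def gfs_keep(names, daily=7, weekly=4, monthly=12):
    """Return the set of backup names a simple GFS policy would keep."""
    stamps = sorted(
        ((datetime.strptime(n, "%Y-%m-%d_%H%M%S"), n) for n in names),
        reverse=True,  # newest first
    )
    rules = [
        (daily, lambda t: t.date()),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (ISO year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
    ]
    keep = set()
    for limit, bucket_of in rules:
        seen = []
        for ts, name in stamps:
            bucket = bucket_of(ts)
            if bucket in seen:
                continue  # a newer backup already represents this bucket
            if len(seen) == limit:
                break  # enough buckets covered for this rule
            seen.append(bucket)
            keep.add(name)
    return keep


if __name__ == "__main__":
    names = [f"2026-03-{day:02d}_020000" for day in range(13, 23)]
    for name in sorted(gfs_keep(names, daily=3, weekly=2, monthly=1)):
        print(name)
```

Everything not in the returned set would be a candidate for deletion. A single backup can satisfy several rules at once, which is why the kept set is usually smaller than the sum of the three counts.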

Quick Start

# Install (requires Go 1.21+)
git clone https://github.com/Lakshmipathi/pgdedup.git
cd pgdedup
go build -o pgdedup .
sudo cp pgdedup /usr/local/bin/

# Initialize
pgdedup init \
  --db "postgres://user:pass@localhost/mydb" \
  --device /dev/sda1 \
  --backup-dir /btrfs/backups

# Run a backup
pgdedup backup

# Run another backup, then deduplicate
pgdedup backup
pgdedup dedupe --dry-run   # see savings
pgdedup dedupe             # actually dedupe

# Check status
pgdedup status

PITR Support

pgdedup supports point-in-time recovery via WAL archiving.

Setup (add to postgresql.conf):

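archive_mode = on   # changing this requires a server restart
# Refuse to overwrite a WAL segment that has already been archived
archive_command = 'test ! -f /btrfs/backups/wal/%f && cp %p /btrfs/backups/wal/%f'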

Restore to a specific point in time:

pgdedup restore 2026-03-22_020000 \
  --target /var/lib/postgresql/restore \
  --pitr "2026-03-22 14:30:00"
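
Point-in-time recovery replays WAL on top of a base backup, so the base backup has to predate the target time. The sketch below shows how one might pick the right backup name for a given target. It assumes backup names encode the time the backup was taken in the format seen above; `pick_base_backup` is a hypothetical helper, not part of pgdedup.

```python
from datetime import datetime

NAME_FORMAT = "%Y-%m-%d_%H%M%S"  # assumed from the backup name in the example


def pick_base_backup(names, target):
    """Return the newest backup taken at or before `target`, or None."""
    target_time = datetime.strptime(target, "%Y-%m-%d %H:%M:%S")

    def taken_at(name):
        return datetime.strptime(name, NAME_FORMAT)

    candidates = [n for n in names if taken_at(n) <= target_time]
    return max(candidates, key=taken_at, default=None)


if __name__ == "__main__":
    names = ["2026-03-21_020000", "2026-03-22_020000", "2026-03-23_020000"]
    print(pick_base_backup(names, "2026-03-22 14:30:00"))
```

A result of `None` means no backup is old enough, in which case the requested point in time cannot be reached from the available backups.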

Commands

pgdedup init        Initialize config and backup directory
pgdedup backup      Run a PostgreSQL backup
pgdedup restore     Restore a backup (supports --pitr)
pgdedup list        List all backups (supports --json)
pgdedup dedupe      Run dduper to deduplicate storage
pgdedup status      Show backup health and storage savings
pgdedup retention   Apply retention policy (--apply to delete)

Configuration

pgdedup init creates a pgdedup.toml in your backup directory:

[database]
connection = "postgres://user:pass@localhost:5432/mydb"

[storage]
backup_dir = "/btrfs/backups"
device = "/dev/sda1"

[schedule]
cron = "0 2 * * *"
dedupe_after_backup = true

[retention]
daily = 7
weekly = 4
monthly = 12

Scheduling

pgdedup doesn't include a built-in scheduler. Use cron:

# Daily backup at 2am
0 2 * * * /usr/local/bin/pgdedup backup

# Weekly retention cleanup
0 3 * * 0 /usr/local/bin/pgdedup retention --apply

Requirements

  • BTRFS filesystem for backup storage
  • dduper installed (/usr/sbin/dduper or in PATH)
  • pg_basebackup (comes with PostgreSQL client tools)
  • Patched btrfs-progs with dump-csum support (see dduper INSTALL.md)

How much storage does it save?

Benchmarked with an 800MB PostgreSQL database, 7 daily backups, uncompressed tar format on BTRFS:

Workload                       Daily change   Storage saved   7 backups stored as
Append-only (logs, events)     ~1% growth     85%             ~1 full copy
Mixed read/write (SaaS, OLTP)  ~1% updates    68%             ~3 full copies
Write-heavy                    ~5% updates    19%             ~6 full copies

Savings depend on how much data changes between backups. Append-only workloads benefit the most because existing pages stay identical — only new pages are added. Update-heavy workloads scatter changes across pages, reducing the number of identical blocks.
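
The percentages translate into disk usage with simple arithmetic. The sketch below assumes the raw footprint is simply 7 backups of 800 MB each, ignoring database growth and filesystem metadata; `stored_mb` is a hypothetical helper.

```python
def stored_mb(db_mb: float, backups: int, saved: float) -> float:
    """Space used after dedup, given the fraction of raw space saved."""
    return db_mb * backups * (1 - saved)


if __name__ == "__main__":
    rows = [("append-only", 0.85), ("mixed", 0.68), ("write-heavy", 0.19)]
    for label, saved in rows:
        used = stored_mb(800, 7, saved)
        print(f"{label:12s} {used:6.0f} MB, about {used / 800:.2f} full copies")
```

Under this simplification, 85% savings leaves roughly 840 MB on disk out of 5600 MB raw, and 19% leaves roughly 4536 MB. The mixed workload works out to about 2.2 copies rather than 3, so the last column of the table is best read as a rough guide.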

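Important: Backups must be stored uncompressed (pg_basebackup without -z). Compressed backups cannot be deduplicated effectively, because a small change in the input alters most of the compressed output that follows it. Use BTRFS transparent compression instead: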

sudo mount -o compress=zstd /dev/sdX /backups

This gives you both compression on disk and deduplication across backups.
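
The effect of stream compression on deduplication can be reproduced in a few lines. This sketch builds two synthetic "backups" that differ in a single 8 KiB page, then compares how many 4 KiB blocks they share before and after compressing each one as a whole. `zlib` stands in for gzip here, the data is random low-entropy bytes rather than real PostgreSQL pages, and the helper names are made up for this example.

```python
import random
import zlib

PAGE = 8192   # PostgreSQL's default page size
BLOCK = 4096  # assumed filesystem block size


def make_backup(pages: int, seed: int) -> bytes:
    """Build compressible pseudo-random data (only 16 distinct byte values)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(4) for _ in range(pages * PAGE))


def shared_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of blocks in `new` that also occur somewhere in `old`."""
    old_blocks = {old[i:i + block] for i in range(0, len(old), block)}
    new_blocks = [new[i:i + block] for i in range(0, len(new), block)]
    return sum(b in old_blocks for b in new_blocks) / len(new_blocks)


if __name__ == "__main__":
    day1 = make_backup(128, seed=1)
    day2 = bytes(PAGE) + day1[PAGE:]  # first page rewritten in place
    raw = shared_fraction(day1, day2)
    packed = shared_fraction(zlib.compress(day1), zlib.compress(day2))
    print(f"uncompressed: {raw:.1%} of blocks shared")
    print(f"compressed:   {packed:.1%} of blocks shared")
```

In this toy setup the uncompressed copies share every block except those of the rewritten page, while the compressed copies typically share few or none. Filesystem-level compression avoids the problem because it compresses extents independently instead of the whole backup as one stream.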

Status

pgdedup is in early beta. It works end-to-end (backup, dedupe, restore verified) but needs real-world testing with larger databases and different workloads.

Looking for beta testers! If you run PostgreSQL on BTRFS (or are willing to try), please give it a spin and report bugs at https://github.com/Lakshmipathi/pgdedup/issues

License

GPL-2.0

Contributing

Issues and pull requests welcome at https://github.com/Lakshmipathi/pgdedup