# pgdedup

Self-hosted PostgreSQL backup manager with BTRFS block-level deduplication.
Consecutive Postgres backups share most of their data. pgdedup stores them uncompressed on BTRFS and uses dduper to deduplicate at the filesystem level — so you only pay for the data that actually changed.
## How it works

```text
pgdedup backup
  → runs pg_basebackup
  → stores on BTRFS
  → runs dduper to deduplicate across all backups
  → applies retention policy
```
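The cycle above can be sketched in a few lines. This is illustrative only, not pgdedup's actual implementation (which is Go): the `pg_basebackup` flags are standard, but the dduper flags (`--device`, `--dir`, `--recurse`) are assumptions based on dduper's documented usage and should be checked against your installed version.

```python
import subprocess

def cycle_commands(conninfo, device, backup_dir, label):
    """Commands one backup + dedupe cycle boils down to (flags illustrative)."""
    target = f"{backup_dir}/{label}"
    return [
        # Full base backup, uncompressed tar so dduper can match raw blocks
        ["pg_basebackup", "-d", conninfo, "-D", target, "--format=tar"],
        # Block-level dedup across every backup under backup_dir
        ["dduper", "--device", device, "--dir", backup_dir, "--recurse"],
    ]

def run_cycle(conninfo, device, backup_dir, label):
    for cmd in cycle_commands(conninfo, device, backup_dir, label):
        subprocess.run(cmd, check=True)  # abort the cycle on first failure
```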
No cloud dependencies. No agents. Single binary. Your data stays on your server.
## Features
- Block-level dedup via BTRFS + dduper (up to 85% storage savings)
- Point-in-time recovery (PITR) via WAL archiving
- GFS retention — keep N daily, N weekly, N monthly backups
- Single binary — 4.5MB, zero runtime dependencies
- Simple config — one TOML file
## Quick Start

```sh
# Install (requires Go 1.21+)
git clone https://github.com/Lakshmipathi/pgdedup.git
cd pgdedup
go build -o pgdedup .
sudo cp pgdedup /usr/local/bin/

# Initialize
pgdedup init \
  --db "postgres://user:pass@localhost/mydb" \
  --device /dev/sda1 \
  --backup-dir /btrfs/backups

# Run a backup
pgdedup backup

# Run another backup, then deduplicate
pgdedup backup
pgdedup dedupe --dry-run   # see savings
pgdedup dedupe             # actually dedupe

# Check status
pgdedup status
```
## PITR Support

pgdedup supports point-in-time recovery via WAL archiving.

Setup (add to `postgresql.conf`):

```ini
archive_mode = on
archive_command = 'cp %p /btrfs/backups/wal/%f'
```
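On the restore side, `pgdedup restore --pitr` is meant to drive recovery for you; for reference, replaying this archive with stock PostgreSQL uses the standard counterpart settings (a sketch, assuming the same paths as above):

```ini
# Standard PostgreSQL recovery settings for manual WAL replay
restore_command = 'cp /btrfs/backups/wal/%f %p'
recovery_target_time = '2026-03-22 14:30:00'
```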
Restore to a specific point in time:

```sh
pgdedup restore 2026-03-22_020000 \
  --target /var/lib/postgresql/restore \
  --pitr "2026-03-22 14:30:00"
```

## Commands
| Command | Description |
|---|---|
| `pgdedup init` | Initialize config and backup directory |
| `pgdedup backup` | Run a PostgreSQL backup |
| `pgdedup restore` | Restore a backup (supports `--pitr`) |
| `pgdedup list` | List all backups (supports `--json`) |
| `pgdedup dedupe` | Run dduper to deduplicate storage |
| `pgdedup status` | Show backup health and storage savings |
| `pgdedup retention` | Apply retention policy (`--apply` to delete) |
## Configuration

`pgdedup init` creates a `pgdedup.toml` in your backup directory:

```toml
[database]
connection = "postgres://user:pass@localhost:5432/mydb"

[storage]
backup_dir = "/btrfs/backups"
device = "/dev/sda1"

[schedule]
cron = "0 2 * * *"
dedupe_after_backup = true

[retention]
daily = 7
weekly = 4
monthly = 12
```
## Scheduling
pgdedup doesn't include a built-in scheduler. Use cron:
```cron
# Daily backup at 2am
0 2 * * * /usr/local/bin/pgdedup backup

# Weekly retention cleanup
0 3 * * 0 /usr/local/bin/pgdedup retention --apply
```
## Requirements
- BTRFS filesystem for backup storage
- dduper installed (`/usr/sbin/dduper` or in PATH)
- pg_basebackup (comes with PostgreSQL client tools)
- Patched btrfs-progs with dump-csum support (see dduper INSTALL.md)
## How much storage does it save?
Benchmarked with an 800MB PostgreSQL database, 7 daily backups, uncompressed tar format on BTRFS:
| Workload | Daily change | Storage saved | 7 backups stored as |
|---|---|---|---|
| Append-only (logs, events) | ~1% growth | 85% | ~1 full copy |
| Mixed read/write (SaaS, OLTP) | ~1% updates | 68% | ~3 full copies |
| Write-heavy | ~5% updates | 19% | ~6 full copies |
Savings depend on how much data changes between backups. Append-only workloads benefit the most because existing pages stay identical — only new pages are added. Update-heavy workloads scatter changes across pages, reducing the number of identical blocks.
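To make the table concrete, here is a quick back-of-the-envelope helper. The 85% figure is the benchmark's append-only result above; nothing here measures anything, it just turns a savings percentage into on-disk size.

```python
def stored_size_mb(full_mb, n_backups, savings_pct):
    """Total on-disk size after dedup, given a measured savings percentage."""
    naive = full_mb * n_backups                # every backup stored in full
    return naive * (100 - savings_pct) / 100   # what actually hits disk

# The benchmark's append-only case: 800 MB database, 7 backups, 85% saved
print(stored_size_mb(800, 7, 85))   # 840.0 → roughly one full copy
```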
**Important:** backups must be stored uncompressed (`pg_basebackup` without `-z`). Compressed backups cannot be deduplicated, because compression scatters even small changes across the whole output, leaving no identical blocks to share. Use BTRFS transparent compression instead:

```sh
sudo mount -o compress=zstd /dev/sdX /backups
```
This gives you both compression on disk and deduplication across backups.
## Status
pgdedup is in early beta. It works end-to-end (backup, dedupe, restore verified) but needs real-world testing with larger databases and different workloads.
Looking for beta testers! If you run PostgreSQL on BTRFS (or are willing to try), please give it a spin and report bugs at https://github.com/Lakshmipathi/pgdedup/issues
## License

GPL-2.0

## Contributing

Issues and pull requests welcome at https://github.com/Lakshmipathi/pgdedup