pg-obfuscate is an open-source, developer-first CLI tool that deterministically obfuscates
sensitive data in PostgreSQL databases.
It allows teams to safely share production-like datasets across development, staging, and testing environments without leaking real user data.
pg-obfuscate is designed to be:
- Deterministic - Same input + same config = same output
Example:
Input:
users.email = "alice@example.com"
Config:
seed: 123
strategy: fake:email
Output:
users.email = "mariah.brown@example.org"
Running pg-obfuscate again with the same config will **always** produce the same output for the same input value.
- Schema-aware - Target public or custom schemas (e.g., auth.users)
- Scalable - Uses server-side cursors and batch updates for high performance and low memory footprint
- Safe by default - Dry-run mode, confirmation prompts, and integer overflow protection
- Extensible - Multiple obfuscation strategies with precise type casting
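The deterministic property above can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) over the input value, with the seed as the key, picks replacement parts from a small fixed pool. The pool, the function name `fake_email`, and the use of HMAC are illustrative assumptions, not pg-obfuscate's actual implementation (which relies on Faker-generated data).

```python
import hashlib
import hmac

# Hypothetical replacement pools, used only for this illustration.
FIRST_NAMES = ["mariah", "devon", "priya", "tomas"]
LAST_NAMES = ["brown", "okafor", "lindgren", "reyes"]


def fake_email(seed: int, value: str) -> str:
    """Map an input value to a stable fake email, keyed by the seed.

    The same (seed, value) pair always yields the same digest, so the
    same replacement is chosen on every run.
    """
    digest = hmac.new(str(seed).encode(), value.encode(), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first}.{last}@example.org"
```

Calling `fake_email(123, "alice@example.com")` twice returns the same address both times, because no random state is involved beyond the seed and the input.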
How it Works
pg-obfuscate operates in three phases:
- Plan
  - Load config
  - Inspect database schema
  - Build an execution plan (what will be changed, where, and how)
- Preview
  - Count affected rows per table/column
  - Show a human-readable summary
  - Make no changes (Dry Run)
- Execute
  - Only when confirmed
  - Stream rows in batches
  - Deterministically transform values
  - Update rows in-place inside per-table transactions
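As a rough illustration of the Plan phase, the sketch below turns an already-parsed config (a Python dict in the shape shown in the Quick Start config) into a flat list of steps. `PlanStep` and `build_plan` are hypothetical names, and the real planner also inspects the database schema, which is omitted here.

```python
from typing import Dict, List, NamedTuple, Optional, Union

# A column is configured either as a bare strategy string or as a mapping.
ColumnSpec = Union[str, Dict[str, str]]


class PlanStep(NamedTuple):
    schema: str
    table: str
    column: str
    strategy: str
    consistency_group: Optional[str]


def build_plan(config: dict) -> List[PlanStep]:
    """Flatten the 'tables' section of a config into per-column steps."""
    steps: List[PlanStep] = []
    for table_key, columns in config.get("tables", {}).items():
        # "auth.accounts" -> ("auth", "accounts"); "users" -> ("public", "users")
        schema, _, table = table_key.rpartition(".")
        schema = schema or "public"
        for column, spec in columns.items():
            if isinstance(spec, str):
                strategy, group = spec, None
            else:
                strategy, group = spec["strategy"], spec.get("consistency_group")
            steps.append(PlanStep(schema, table, column, strategy, group))
    return steps
```

A preview step could then group these steps by schema and table and count rows for each, without issuing any updates.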
Caution
This tool is inherently DESTRUCTIVE.
pg-obfuscate modifies data in-place. It is designed to be run on clones or backups of production data, never on the live production database itself. There is no "undo" button.
Installation
Quick Start
- Create a config file:
seed: 12345
tables:
  # Tables default to 'public' schema
  users:
    email: fake:email
    name: fake:name
  # Access other schemas using schema.table
  auth.accounts:
    username:
      strategy: fake:username
      consistency_group: user_handles
    password_hash: hash
- Run with dry-run first:
pg-obfuscate run --db-url postgres://user:pass@localhost/db --config config.yaml --dry-run
Example output:
[OK] Loaded config from config.example.yaml
[OK] Connected to database
Obfuscation Summary
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┓
┃ Table ┃ Columns ┃ Rows ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━┩
│ public.users │ email, username, phone, created_at │ 5 │
│ public.orders │ order_total, tax_amount, status │ 7 │
│ public.profiles │ height │ 5 │
└─────────────────┴────────────────────────────────────┴──────┘
Dry run mode - no changes made.
Error: 0
- Execute obfuscation:
pg-obfuscate run --db-url postgres://user:pass@localhost/db --config config.yaml
Performance & Scalability
pg-obfuscate is designed to handle production-scale databases:
- Streaming: Data is streamed from PostgreSQL using server-side cursors, preventing Out-of-Memory (OOM) errors even on million-row tables.
- Batching: Updates are executed in batches (2,000 rows by default) to minimize network round-trips and maximize throughput.
- Type Safety: Automatically detects column types to apply explicit casting (e.g., v::timestamp), ensuring compatibility with complex PostgreSQL types.
- PK-less Support: Works on tables without primary keys by automatically falling back to PostgreSQL's internal ctid for row identification.
- Integer Safety: Automatically detects smallint (int2) and integer (int4) columns to prevent overflow errors during data generation.
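Two of the ideas above, batching and integer safety, can be sketched in a few lines. The helper names `batched` and `fit_to_column` are hypothetical, and clamping is only one possible way to keep generated values in range; how pg-obfuscate itself handles out-of-range values is not specified here. The ranges are the documented PostgreSQL limits for smallint and integer.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Value ranges of PostgreSQL's smallint (int2) and integer (int4) types.
INT_RANGES = {
    "int2": (-32768, 32767),
    "int4": (-2147483648, 2147483647),
}


def batched(rows: Iterable[T], size: int = 2000) -> Iterator[List[T]]:
    """Yield lists of at most `size` rows without loading the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fit_to_column(value: int, pg_type: str) -> int:
    """Clamp a generated integer into the range of the target column type."""
    low, high = INT_RANGES[pg_type]
    return max(low, min(high, value))
```

Because `batched` consumes its input lazily, it can sit directly on top of a streaming cursor: only one batch of rows is held in memory at a time.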
Commands
| Command | Description |
|---|---|
| pg-obfuscate run | Execute obfuscation |
| pg-obfuscate validate | Validate config (optional: check against DB) |
| pg-obfuscate --version | Show version |
Obfuscation Strategies
| Strategy | Description |
|---|---|
| hash | SHA256 hash (text columns only) |
| fake:&lt;type&gt; | Faker-generated data |
| null | Set to NULL |
| preserve | Keep original value |
Consistency Groups
Consistency groups ensure that different columns (even in different tables) produce the same obfuscated output for the same input value. This is essential for maintaining referential integrity across your database.
tables:
  users:
    email:
      strategy: fake:email
      consistency_group: user_emails
  newsletter_subs:
    subscriber_email:
      strategy: fake:email
      consistency_group: user_emails
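One way to picture a consistency group is as a shared key for the value mapping. The sketch below assumes that columns in the same group derive their key from the group name, while ungrouped columns derive it from their own table and column. `mapping_key` and `pseudonym` are hypothetical helpers, and the scoping of ungrouped columns is an assumption rather than documented behavior.

```python
import hashlib
import hmac
from typing import Optional


def mapping_key(seed: int, table: str, column: str, group: Optional[str]) -> bytes:
    """Derive the key that scopes a column's value mapping."""
    # Columns sharing a group share a key, regardless of table or column name.
    scope = f"group:{group}" if group else f"column:{table}.{column}"
    return f"{seed}|{scope}".encode()


def pseudonym(key: bytes, value: str) -> str:
    """Map a value to a stable replacement under the given key."""
    token = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]
    return f"user_{token}@example.org"
```

With the config above, `users.email` and `newsletter_subs.subscriber_email` would both use the key for `user_emails`, so the same source address maps to the same replacement in both tables and joins on that value keep working.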
NULL Preservation
By default, pg-obfuscate preserves NULL values. If a source column contains a NULL, the tool will skip it regardless of the strategy (except for the explicit null strategy). This ensures you don't accidentally introduce data into rows that were intentionally empty.
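The NULL rule can be expressed as a small dispatcher. This is a sketch covering only the `hash`, `null`, and `preserve` strategies; the function name `apply_strategy` is hypothetical, and the use of a plain, unsalted SHA-256 hex digest for `hash` is an assumption.

```python
import hashlib
from typing import Optional


def apply_strategy(strategy: str, value: Optional[str]) -> Optional[str]:
    """Apply a strategy to one value, preserving NULL (None) inputs."""
    if strategy == "null":
        # The explicit null strategy always produces NULL.
        return None
    if value is None:
        # NULL in, NULL out: never invent data for empty cells.
        return None
    if strategy == "preserve":
        return value
    if strategy == "hash":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    raise ValueError(f"unsupported strategy in this sketch: {strategy}")
```

Checking for `None` before dispatching to any value-producing strategy keeps the rule in one place instead of repeating it in every strategy.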
Schema Validation
Before running a destructive obfuscation, you can validate your configuration against the actual database schema to catch typos or missing columns:
pg-obfuscate validate --config config.yaml --db-url postgres://user:pass@localhost/db
This will verify that every table and column listed in your config exists in the database and is accessible.
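Conceptually, this check compares the names in the config with the names the database reports. The sketch below assumes the column list has already been fetched, for example from PostgreSQL's `information_schema.columns` view, and passed in as a set of tuples. `find_missing` is a hypothetical helper, not part of pg-obfuscate's API.

```python
from typing import Dict, List, Set, Tuple

# Query that could supply the (schema, table, column) tuples used below.
COLUMNS_QUERY = """
SELECT table_schema, table_name, column_name
FROM information_schema.columns
"""


def find_missing(
    config_tables: Dict[str, Dict[str, object]],
    db_columns: Set[Tuple[str, str, str]],
) -> List[str]:
    """Return one message per configured table or column absent from the DB."""
    problems: List[str] = []
    known_tables = {(schema, table) for schema, table, _ in db_columns}
    for table_key, columns in config_tables.items():
        # Unqualified table names default to the 'public' schema.
        schema, _, table = table_key.rpartition(".")
        schema = schema or "public"
        if (schema, table) not in known_tables:
            problems.append(f"missing table: {schema}.{table}")
            continue
        for column in columns:
            if (schema, table, column) not in db_columns:
                problems.append(f"missing column: {schema}.{table}.{column}")
    return problems
```

An empty result means every configured target exists; any message in the list corresponds to a typo or a missing column that would otherwise surface only during execution.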
Supported Fake Types
Text types: email, name, first_name, last_name, phone, address, company, text, city, country, postcode, street_address, job, url, username, uuid
Numeric types: int, number, float, decimal, price
Date types: date, datetime
Safety Features
- --dry-run - Preview without making changes
- --force - Skip confirmation prompt
- Confirmation prompt before execution
- Backup warning displayed
- Per-table transactions (rollback on error)
- Integer range enforcement (prevents overflow crashes)
Safety & Backup Guidelines
1. Never Run on Live Production
This tool is intended for creating sanitized datasets for development. Always run it on a restored backup or a database fork.
2. Transactional Behavior (Atomicity)
pg-obfuscate processes tables one by one.
- If an error occurs during the processing of a table, that specific table will be rolled back.
- However, any tables processed before the error occurred will remain obfuscated (committed).
- If the process is killed (e.g., Ctrl+C), the current batch may be partially committed or rolled back depending on the exact timing.
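The per-table behavior described above can be demonstrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The function `obfuscate_tables` and the job format are hypothetical and only illustrate the commit and rollback pattern.

```python
import sqlite3
from typing import List, Sequence, Tuple


def obfuscate_tables(
    conn: sqlite3.Connection,
    jobs: Sequence[Tuple[str, Sequence[str]]],
) -> List[str]:
    """Run each table's statements in its own transaction.

    Returns the tables that were committed. Processing stops at the first
    failing table, whose partial changes are rolled back.
    """
    committed: List[str] = []
    for table, statements in jobs:
        try:
            # The connection context manager commits on success and
            # rolls back if any statement raises.
            with conn:
                for statement in statements:
                    conn.execute(statement)
        except sqlite3.Error:
            break
        committed.append(table)
    return committed
```

If the second table in `jobs` fails halfway through, the first table stays obfuscated while the second keeps its original values. This is why a failed run should be followed by a fresh restore rather than a retry on the same partly processed clone.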
3. Recommended Workflow
- Backup: Create a full dump of your database (pg_dump).
- Restore: Restore the dump to a dedicated staging/local database.
- Validate: Run pg-obfuscate validate --config config.yaml to check for schema mismatches.
- Dry-run: Run pg-obfuscate run --dry-run to see which tables will be affected.
- Execute: Run the obfuscation on the restored clone.
- Verify: Check the data to ensure relational integrity and obfuscation quality before sharing with the team.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error |
| 2 | Config validation error |
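In automation, these codes can be used to decide whether a pipeline should continue. The following is a minimal sketch of a wrapper around the `validate` command. The helper names are hypothetical, and it assumes `pg-obfuscate` is on the PATH when `run_validate` is called.

```python
import subprocess
from typing import List

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "runtime error",
    2: "config validation error",
}


def validate_command(config_path: str) -> List[str]:
    """Build the argument list for a config validation run."""
    return ["pg-obfuscate", "validate", "--config", config_path]


def describe_exit(code: int) -> str:
    """Translate an exit code into a short human-readable label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_validate(config_path: str) -> str:
    """Run validation and report the outcome as text."""
    result = subprocess.run(validate_command(config_path))
    return describe_exit(result.returncode)
```

A CI job could call `run_validate("config.yaml")` and stop before the execute step unless the result is `"success"`.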
Environment Variables
- PG_OBFUSCATE_DB_URL - Database connection string
What pg-obfuscate Preserves (and What It Doesn’t)
pg-obfuscate preserves:
- Value equality (via consistency groups)
- Referential integrity (PK/FK-like relationships)
- Data types and constraints
- Repeatability across runs
pg-obfuscate does NOT automatically preserve:
- Derived metrics (e.g. revenue - cost = profit)
- Statistical distributions across unrelated columns
- Business semantics between independent numeric fields
License & Commercial Use
This project is licensed under a Dual-Licensing model to support both the open-source community and commercial enterprise needs.
1. Open Source License (AGPLv3)
For individuals, small teams, and open-source projects, pg-obfuscate is available under the GNU Affero General Public License v3.0 (AGPLv3). See the LICENSE file for the full text.
This means:
- You are free to use, modify, and distribute the software
- If you use it in a service or internal tool, you must make your modifications available under the same license
2. Commercial License
For companies that cannot or do not wish to comply with the AGPLv3, we offer a Commercial License that provides:
- Use in proprietary/internal systems
- No obligation to release source code
For licensing inquiries, custom strategies, or commercial quotes, please contact: ofsen@proton.me