Server-side flight recorder for PostgreSQL. Answers "what was happening in my database?"
pg_flight_recorder continuously samples PostgreSQL system state in the background via pg_cron -- no external agents, sidecars, or polling required. It captures wait events, active sessions, locks, WAL activity, checkpoints, I/O, table and index stats, query performance, replication state, and configuration changes. When something goes wrong, the data is already there.
Architecture
Flight Recorder collects two types of data:
| System | What it captures | Frequency | Retention |
|---|---|---|---|
| Sampled Activity | Wait events, sessions, locks | 1 min | Ring buffer: 2h, Archives: 7d |
| Snapshots | WAL, checkpoints, I/O, tables, indexes | 1 min | 30 days |
Data flows through ring buffers (hot, low-overhead) into durable archives and aggregates (cold, long-retention). Safety mechanisms -- circuit breaker, load shedding, per-section timeouts, and pg_cron job timeouts -- prevent the recorder from impacting production workloads.
Extensions
Two extensions, each published as a separate dbdev package:
| Extension | Schema | Purpose | README |
|---|---|---|---|
| pgfr_record | pgfr_record |
Core: tables, collection, scheduling, ring buffers | pgfr_record/README.md |
| pgfr_analyze | pgfr_analyze |
Optional: reporting, anomaly detection, time travel | pgfr_analyze/README.md |
Requirements
- PostgreSQL 15, 16, 17, or 18
pg_cronextension- Superuser privileges for installation
- Optional:
pg_stat_statementsfor query-level analysis
Quick start
Download from GitHub Releases or clone the repo, then:
# Install core + optional analysis extension
psql --single-transaction -f pgfr_record/install.sql
psql --single-transaction -f pgfr_analyze/install.sql-- Enable collection SELECT pgfr_record.enable(); -- Check health SELECT * FROM pgfr_record.health_check(); -- Generate a diagnostic report SELECT pgfr_analyze.report('1 hour');
Common workflows
Daily monitoring
SELECT * FROM pgfr_record.health_check(); SELECT pgfr_analyze.report('1 hour');
Incident response
-- Switch to high-frequency collection SELECT * FROM pgfr_record.apply_profile('troubleshooting'); -- What was happening at a specific time? SELECT * FROM pgfr_analyze.what_happened_at('2024-01-15 14:32'); -- Reconstruct an incident timeline SELECT * FROM pgfr_analyze.incident_timeline( '2024-01-15 14:00'::timestamptz, '2024-01-15 15:00'::timestamptz ); -- Return to normal after incident SELECT * FROM pgfr_record.apply_profile('default');
XID / MultiXID wraparound monitoring
-- Current XID and MultiXID ages at database level (from the latest snapshot) SELECT datfrozenxid_age, datminmxid_age FROM pgfr_record.snapshots ORDER BY captured_at DESC LIMIT 1; -- Top 10 tables by XID or MultiXID age SELECT relid::regclass, relfrozenxid_age, relminmxid_age FROM pgfr_record.table_snapshots WHERE snapshot_id = (SELECT max(id) FROM pgfr_record.snapshots) ORDER BY greatest(relfrozenxid_age, relminmxid_age) DESC NULLS LAST LIMIT 10; -- Wraparound anomalies (XID + MultiXID, cluster + per-table) SELECT anomaly_type, severity, metric_value, recommendation FROM pgfr_analyze.anomaly_report(now() - interval '1 hour', now()) WHERE anomaly_type LIKE '%WRAPAROUND%'; -- Tune thresholds (lower warning ratio to alert earlier on busy clusters) INSERT INTO pgfr_record.config (key, value) VALUES ('xid_warning_ratio', '0.25'), ('mxid_warning_ratio', '0.25') ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;
See REFERENCE.md for the full config key list and rationale (postgres-howto #0044 guidance).
Performance analysis
-- Find performance regressions SELECT * FROM pgfr_analyze.detect_regressions('1 day'); -- Find query storms SELECT * FROM pgfr_analyze.detect_query_storms('1 hour'); -- Table hotspots SELECT * FROM pgfr_analyze.table_hotspots(now() - '1 day', now()); -- Unused indexes SELECT * FROM pgfr_analyze.unused_indexes('7 days');
Capacity planning
SELECT * FROM pgfr_analyze.capacity_summary('7 days'); SELECT * FROM pgfr_analyze.quarterly_review(); SELECT * FROM pgfr_analyze.capacity_dashboard;
Configuration profiles
Profiles are pre-configured settings for different environments:
| Profile | Sample Interval | Use Case |
|---|---|---|
default |
60s | General purpose monitoring |
production_safe |
300s | Production with maximum safety margins |
development |
60s | Staging and development |
troubleshooting |
60s | Active incident response |
minimal_overhead |
300s | Resource-constrained systems |
SELECT * FROM pgfr_record.list_profiles(); SELECT * FROM pgfr_record.explain_profile('production_safe'); SELECT * FROM pgfr_record.apply_profile('production_safe');
Safety
Flight Recorder includes automatic protections:
| Protection | Description |
|---|---|
| Circuit Breaker | Skips collection if recent runs averaged > 1s |
| Load Shedding | Skips collection when > 70% connections active |
| Section Timeouts | Per-query timeout (250ms) prevents catalog lock hangs |
| Job Timeouts | Outer statement_timeout on pg_cron collector jobs (500ms–60s) |
Collection modes provide manual control: normal, light, emergency.
-- Reduce to minimum collection (300s sampling, locks/progress off) SELECT pgfr_record.set_mode('emergency'); -- Full stop: unschedule all pg_cron jobs SELECT pgfr_record.disable(); -- Resume SELECT pgfr_record.enable();
Export
With default retention: ~2.5GB uncompressed, ~150MB compressed.
# Without compression pg_dump -d your_database -n pgfr_record --data-only -f pgfr_data.sql # With compression (PostgreSQL 16+) pg_dump -d your_database -n pgfr_record --data-only --compress=gzip:9 -f pgfr_data.sql.gz # With compression (PostgreSQL 15) pg_dump -d your_database -n pgfr_record --data-only | gzip > pgfr_data.sql.gz
Upgrade
Re-running install scripts is safe -- they use CREATE OR REPLACE and IF NOT EXISTS, updating functions and views while preserving all data.
psql --single-transaction -f pgfr_record/install.sql psql --single-transaction -f pgfr_analyze/install.sql
Uninstall
# Remove everything (stops jobs, drops all schemas and data) psql --single-transaction -f pgfr_record/uninstall.sql # Remove only reporting functions (keeps core + data) psql --single-transaction -f pgfr_analyze/uninstall.sql
Testing
./test.sh # Test all PostgreSQL versions in parallel (requires Docker) ./test.sh 17 # Test a specific PostgreSQL version (15, 16, 17, or 18)
Reference
See REFERENCE.md for the full function reference, table schemas, configuration settings, and detailed documentation.