pgGraph Documentation

Overview

Graph database superpowers for your existing Postgres data.

pgGraph lets you run complex graph queries, like finding shortest paths, mapping relationships, and discovering networks, directly on your existing PostgreSQL tables using standard SQL. You don’t need to migrate your data to a separate graph database, and your app keeps writing data the exact same way it always has. Postgres remains your single source of truth.

pgGraph is currently alpha software for experimentation, demos, benchmarks, and early feedback. The core graph features are usable, but memory hardening for very large graphs and expensive algorithms is still in progress. Run it in Docker or a dedicated development database for now, not on a production or shared PostgreSQL cluster.

Evokoa

pgGraph is built by Evokoa. We are building the missing infrastructure for widespread AI adoption. We are committed to keeping pgGraph 100% free and open-source, forever.

Star our repo to help us reach more developers.

How it Works

pgGraph builds a compact, derived graph index over selected relational tables for repeated bounded traversal workloads. Because Postgres stays the system of record, all standard tables, constraints, WAL, MVCC, backups, ACLs, and RLS still apply to application writes.

Core Architecture

PostgreSQL tables stay authoritative. pgGraph is derived acceleration state. Users ask SQL questions; hot loops walk integer arrays.

Figure: a Postgres edge table (columns id, src, dst; example row 1, A, B) and the binary .pggraph artifact that pgGraph derives from it.

pgrx Boundary

SQL arguments enter through pgrx, and final rows return through pgrx to PostgreSQL. The engine operates on compact graph data without recursive SQL.

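Figure: the PostgreSQL executor issues SELECT * FROM graph.traverse('users', 67); the call crosses the pgrx boundary into the Rust engine (CSR), which works on an internal node_idx and returns results such as [12, 19, 7].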

CSR: The Hot Path Shape

pgGraph stores edges as compressed sparse row (CSR) arrays. Expansion involves simply slicing flat parallel arrays, avoiding pointers and object chasing.
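As a minimal sketch of the CSR layout described above (not pgGraph's actual Rust implementation), the following Python builds the two flat arrays from a list of (src, dst) pairs and expands a node by slicing. The function names build_csr and neighbors and the four-node edge list are assumptions made for illustration; the array names edge_offsets and targets follow the terminology used on this page.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (edge_offsets, targets) from (src, dst) pairs."""
    # Count outgoing edges per source node.
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1

    # Prefix-sum the counts: edge_offsets[i] is where node i's slice starts,
    # and edge_offsets[i + 1] is where it ends.
    edge_offsets = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        edge_offsets[node + 1] = edge_offsets[node] + counts[node]

    # Place each target into the next free slot of its source's slice.
    cursor = edge_offsets[:-1].copy()
    targets = [0] * len(edges)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return edge_offsets, targets


def neighbors(edge_offsets: List[int], targets: List[int], node: int) -> List[int]:
    """Expand one node: a single contiguous slice, no pointer chasing."""
    return targets[edge_offsets[node]:edge_offsets[node + 1]]


# Hypothetical four-node graph used only for this example.
edges = [(0, 1), (0, 2), (1, 3), (2, 0), (2, 3)]
edge_offsets, targets = build_csr(4, edges)

print(edge_offsets)                         # [0, 2, 3, 5, 5]
print(targets)                              # [1, 2, 3, 0, 3]
print(neighbors(edge_offsets, targets, 2))  # [0, 3]
print(neighbors(edge_offsets, targets, 3))  # []
```

Node 3 has no outgoing edges, so its start and end offsets are equal and the slice is empty. No per-node object is allocated at any point, which is what keeps the expansion loop cache-friendly.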

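Figure: a four-node graph (nodes 0 to 3) with edge_offsets = [0, 2, 3, 5, 5] and targets = [1, 2, 3, 0, 3]. The array slice for node i is targets[edge_offsets[i] .. edge_offsets[i+1]].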

Immutable Base, Mutable Overlay

The base structure is immutable and fast to read, and it is shared across backends through the OS page cache. Bounded, backend-local mutable overlays sit on top of it and absorb inserts and deletes immediately.
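One way to picture the base-plus-overlay split is the sketch below. It is an illustration under stated assumptions, not pgGraph's implementation: the class OverlayGraph, its method names, and the max_overlay_entries limit are hypothetical, the overlay keeps only the latest delta per edge (+1 for insert, -1 for delete), and multi-edges are ignored for simplicity.

```python
from typing import Dict, List, Tuple


class OverlayGraph:
    """Read-only CSR base arrays plus a small, bounded delta map (hypothetical)."""

    def __init__(self, edge_offsets: List[int], targets: List[int],
                 max_overlay_entries: int = 1024) -> None:
        self.edge_offsets = edge_offsets   # base array, never mutated here
        self.targets = targets             # base array, never mutated here
        self.overlay: Dict[Tuple[int, int], int] = {}  # (src, dst) -> +1 or -1
        self.max_overlay_entries = max_overlay_entries

    def _record(self, src: int, dst: int, delta: int) -> None:
        key = (src, dst)
        # Keep the overlay bounded; a real system would fold it into a rebuild.
        if key not in self.overlay and len(self.overlay) >= self.max_overlay_entries:
            raise OverflowError("overlay full; rebuild the base arrays")
        self.overlay[key] = delta  # last write wins for a given edge

    def insert_edge(self, src: int, dst: int) -> None:
        self._record(src, dst, +1)

    def delete_edge(self, src: int, dst: int) -> None:
        self._record(src, dst, -1)

    def neighbors(self, node: int) -> List[int]:
        base = self.targets[self.edge_offsets[node]:self.edge_offsets[node + 1]]
        # Drop base edges that the overlay marks as deleted.
        result = [dst for dst in base if self.overlay.get((node, dst)) != -1]
        # Append overlay inserts that the base does not already contain.
        base_set = set(base)
        for (src, dst), delta in self.overlay.items():
            if src == node and delta == +1 and dst not in base_set:
                result.append(dst)
        return result


graph = OverlayGraph([0, 2, 3, 5, 5], [1, 2, 3, 0, 3])
graph.insert_edge(0, 3)
graph.delete_edge(0, 2)
print(graph.neighbors(0))  # [1, 3]
print(graph.targets)       # [1, 2, 3, 0, 3] (base arrays are untouched)
```

Reads pay a small merge cost while the overlay is non-empty. The base arrays stay byte-for-byte identical, which is what allows them to be memory-mapped and shared.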

Figure: base arrays (mmap) and a mutable overlay holding deltas such as { A->B : +1, A->C : -1 }, with INSERT and SELECT paths.

Current benchmarks

Query Performance

Dataset: PANAMA (2,016,523 nodes, 5,802,586 edges)

[Chart: Cold Run (ms) and Hot Run (ms)]

Dataset: LDBC (3,181,724 nodes, 34,512,076 edges)

[Chart: Cold Run (ms) and Hot Run (ms)]

* Cold Run: Docker container restart before each cold query; excludes graph.build(); OS cache may remain warm depending on host

* Hot Run: one unrecorded warm-up pass, then repeated measured SQL in one persistent psycopg PostgreSQL backend

Machine Info: macOS-26.4-arm64-arm-64bit | 14 CPUs | Captured: 5/12/2026

Current Scope

Currently Supported

PostgreSQL 13-18

Compatible with PostgreSQL 13 through 18. Primarily tested on PostgreSQL 17.

Schema Registration

Register tables and edges manually, or use auto-discovery for fast setup.

SQL-backed Search

Search over registered columns using contains, exact, prefix, and token modes.

Graph Traversal

Bounded BFS and DFS with depth, frontier, filtering, pagination, and circuit breakers.

Shortest Path & Components

Standard and weighted shortest paths, plus connected component summaries.

Filtering & Aggregation

Typed filter pushdown and JSON traversal specs for server-side aggregates.

Sync & Maintenance

Trigger-buffered capture, explicit apply, background jobs, and full rebuild vacuum.

Fast Persistence

Atomic .pggraph artifacts with mmap-backed arrays for fast backend startup.
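To make the bounded traversal item in the list above more concrete, here is a hedged sketch of a breadth-first search over CSR arrays with a depth limit, a per-level frontier cap, and a visited-node cap acting as a simple circuit breaker. The function name bounded_bfs, its parameters, and the default limits are assumptions for illustration and do not describe pgGraph's actual SQL or Rust API.

```python
from typing import Dict, List, Tuple


def bounded_bfs(edge_offsets: List[int], targets: List[int], start: int,
                max_depth: int, max_frontier: int = 1000,
                max_visited: int = 10000) -> Tuple[Dict[int, int], bool]:
    """Return ({node: depth}, truncated). truncated is True if a limit tripped."""
    depth_of = {start: 0}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier: List[int] = []
        for node in frontier:
            # Expand the node by slicing its contiguous range of targets.
            for dst in targets[edge_offsets[node]:edge_offsets[node + 1]]:
                if dst in depth_of:
                    continue
                # Circuit breaker: stop early instead of exhausting memory.
                if len(next_frontier) >= max_frontier or len(depth_of) >= max_visited:
                    return depth_of, True
                depth_of[dst] = depth
                next_frontier.append(dst)
        if not next_frontier:
            break
        frontier = next_frontier
    return depth_of, False


# Hypothetical four-node graph in CSR form.
edge_offsets = [0, 2, 3, 5, 5]
targets = [1, 2, 3, 0, 3]

print(bounded_bfs(edge_offsets, targets, start=0, max_depth=1))
# ({0: 0, 1: 1, 2: 1}, False)
print(bounded_bfs(edge_offsets, targets, start=0, max_depth=2))
# ({0: 0, 1: 1, 2: 1, 3: 2}, False)
print(bounded_bfs(edge_offsets, targets, start=0, max_depth=2, max_visited=2))
# ({0: 0, 1: 1}, True)
```

Returning a truncated flag rather than raising lets the caller decide whether a partial result is acceptable, which suits paginated or exploratory queries.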

Roadmap & Future Work

Semantic-Guided Search

Vector-distance ranking with pgVector to prioritize relevant paths during traversal.

WAL Streaming Sync

Logical replication based sync to avoid trigger-buffer volume.

COPY Build Scanner

A server-side COPY scanner to optimize large-scale graph builds.

Online Mutable CSR

Append or delta structures to reduce rebuild pressure without weakening correctness.

Full A* Path Search

Specialized semantic-guided search mode using admissible heuristics.

Global Analytics

Companion analytics paths for PageRank, Louvain, and betweenness.

Distributed Execution

Scaling graph execution across distributed PostgreSQL instances.

License

Apache-2.0. See LICENSE.
