A PostgreSQL extension makes LLM model-knowledge available as an index for similarity searches, inference, and more.
pg_infer
========
Status: EXPERIMENTAL (v0.1.0-alpha)
A PostgreSQL extension that exposes transformer model knowledge as
SQL-queryable relations. Built on pgrx 0.17 for PostgreSQL 18+.
This project is in early experimental stage. It is functional, tested
(769+ tests passing), and performant on supported hardware, but:
- The SQL API may change without notice between releases
- No production deployments exist yet
- Some compute paths require specific hardware (Apple Metal, NVIDIA CUDA)
- The vindex format is not yet frozen
Use pg_infer if you want to explore what's possible at the intersection
of transformer model internals and relational databases. Do not use it
for production workloads without accepting the risk of breaking changes.
pg_infer lets you ask questions about what a language model "knows" --
its internal feature activations, learned associations, and semantic
relationships -- directly from SQL, without running inference.
QUICK START
-----------
-- Load a vindex (extracted model knowledge)
SELECT infer_create_model('qwen05b', '/data/qwen-0.5b.vindex');
-- infer_create_model
-- --------------------
-- qwen05b
-- (1 row)
-- What does the model know about France?
SELECT * FROM describe('France');
-- relation | target | confidence | layer
-- -----------+------------+------------+-------
-- capital | Paris | 42.7 | 18
-- language | French | 38.1 | 17
-- continent | Europe | 35.4 | 16
-- currency | euro | 29.8 | 19
-- leader | president | 24.3 | 20
-- (5 rows)
-- How similar are two concepts?
SELECT similar_to('France', 'Paris');
-- similar_to
-- ------------
-- 42.71
-- (1 row)
-- Trace raw feature activations
SELECT * FROM walk('The capital of France is', top => 10);
-- layer | feature | activation | concept
-- -------+---------+------------+-------------
-- 0 | 1247 | 18.34 | capital
-- 0 | 892 | 16.21 | France
-- 5 | 3401 | 31.47 | European
-- 12 | 467 | 44.82 | Paris
-- 18 | 2103 | 52.19 | geography
-- ...
-- (10 rows)
-- Does the model's knowledge support this relationship?
SELECT implies('France', 'Paris');
-- implies
-- ---------
-- t
-- (1 row)
-- ORDER BY semantic similarity (requires column index)
CREATE INDEX ON documents USING infer (title) WITH (model = 'qwen05b');
SELECT * FROM documents ORDER BY title <~> 'artificial intelligence' LIMIT 5;
-- id | title | distance
-- ----+----------------------------------+----------
-- 7 | Machine Learning Fundamentals | 0.023
-- 12 | Deep Neural Networks | 0.031
-- 3 | Natural Language Processing | 0.044
-- 19 | Computer Vision and AI | 0.058
-- 1 | Introduction to Data Science | 0.089
-- (5 rows)
COMBINED SEARCH WITH OTHER EXTENSIONS
--------------------------------------
pg_infer's model-knowledge similarity can be combined with PostgreSQL's
other search extensions to build search systems that understand meaning,
not just text patterns.
-- Setup: enable all extensions
CREATE EXTENSION IF NOT EXISTS pg_infer;
CREATE EXTENSION IF NOT EXISTS vector; -- pgvector
CREATE EXTENSION IF NOT EXISTS pg_trgm; -- trigram similarity
-- A papers table with text, embeddings, and infer index
CREATE TABLE papers (
id serial PRIMARY KEY,
title text NOT NULL,
body text NOT NULL,
embed vector(384) -- sentence-transformer embedding
);
-- Create indexes for each search signal
CREATE INDEX ON papers USING infer (title) WITH (model = 'qwen05b');
CREATE INDEX ON papers USING gin (body gin_trgm_ops);
CREATE INDEX ON papers USING ivfflat (embed vector_cosine_ops);
CREATE INDEX ON papers USING gin (to_tsvector('english', body));
-- Multi-signal ranked search: combine model knowledge, vector
-- similarity, full-text relevance, and fuzzy matching
--
-- This query is impossible without pg_infer: it finds papers
-- where the model "knows" the title is related to the query,
-- even when no keywords overlap. "neural architecture search"
-- matches "AutoML for Deep Networks" because the model learned
-- that relationship -- no keyword, embedding, or trigram method
-- can discover this from text alone.
WITH candidates AS (
SELECT id, title, body,
title <~> 'neural architecture search' AS infer_dist,
embed <=> '[0.1, 0.2, ...]'::vector AS vec_dist,
similarity(title, 'neural architecture') AS trgm_score,
ts_rank(to_tsvector('english', body),
plainto_tsquery('neural architecture')) AS ts_rank
FROM papers
WHERE to_tsvector('english', body) @@ plainto_tsquery('neural architecture')
OR title % 'neural architecture'
LIMIT 100
)
SELECT id, title,
(0.4 * (1 - infer_dist) +
0.3 * (1 - vec_dist) +
0.2 * trgm_score +
0.1 * ts_rank) AS combined_score
FROM candidates
ORDER BY combined_score DESC
LIMIT 10;
-- id | title | combined_score
-- ----+--------------------------------------+----------------
-- 42 | AutoML for Deep Networks | 0.847
-- 17 | Efficient Neural Architecture Search | 0.831
-- 91 | Hyperparameter Optimization Survey | 0.724
-- 8 | Meta-Learning and Model Selection | 0.698
-- 55 | Transformer Architecture Innovations | 0.672
-- (5 rows)
SQL FUNCTIONS
-------------
walk(prompt, top, model)
Trace gate activations across all layers for a prompt.
Returns: layer, feature, activation, concept
describe(entity, model, threshold)
Return knowledge edges the model has about an entity.
Returns: relation, target, confidence, layer
similar_to(a, b, model)
Semantic similarity score between two texts.
Returns: float8 (higher = more similar)
implies(subject, object, model)
Test whether the model's knowledge supports a directional
relationship from subject to object.
Returns: boolean
infer(prompt, top, model)
Full forward-pass prediction (requires --features inference).
Returns: token, probability, rank
infer_distance(a, b)
Distance function for the <~> operator (lower = more similar).
Returns: float8
infer_create_model(name, source, extract_level)
Register a model from a vindex directory or HuggingFace URI.
infer_drop_model(name)
Remove a model registration.
infer_models()
List all registered models.
GUC PARAMETERS
--------------
infer.default_model Default model for query functions (text)
infer.data_directory Base directory for vindex files (text, default: 'infer')
infer.max_memory Max memory per backend in MB (int, default: 8192)
infer.auto_download Allow HuggingFace downloads (bool, default: on)
infer.gate_threshold Gate score threshold (float, 0 = adaptive)
infer.describe_top_k Features per layer for describe (int, default: 20)
infer.walk_embed_mode Embedding mode for walk: 'average' or 'last'
INDEX ACCESS METHOD
-------------------
pg_infer provides a custom index AM ("infer") with two modes:
Model index: Stores the full vindex data in WAL-logged PG pages.
CREATE INDEX name ON infer._models USING infer (name)
WITH (source = '/path/to/model.vindex');
Column index: Enables ORDER BY <~> queries on text columns.
CREATE INDEX ON table USING infer (column)
WITH (model = 'model_name');
BUILDING
--------
Requires: Rust nightly, pgrx 0.17, PostgreSQL 18+, OpenSSL, openblas.
cd pg_infer
cargo pgrx run pg18 # start PG with extension loaded
cargo pgrx test pg18 # run integration tests
See CLAUDE.md for exact build environment variables needed on
systems with Nix compiler wrappers.
STORAGE
-------
Model data is stored in standard 8KB PostgreSQL pages using
GenericXLog for WAL logging. Page types:
Meta - model metadata (name, dimensions, configuration)
LayerDir - per-layer block ranges for gate vectors
Gate - f16 gate activation vectors
Embed - token embedding vectors
DownMeta - feature metadata (labels, top tokens)
Blob - tokenizer data
The OS kernel shares mmap pages across backends that access the
same model, so memory overhead per-connection is minimal.
DEPLOYMENT BACKENDS
-------------------
pg_infer supports two per-model storage/execution backends:
local - mmap the vindex directly in each PostgreSQL backend.
Simple but does not share the f16 decode cache across
connections.
remote - talk to a colocated or networked `larql-server` process
over HTTP/2 or a Unix domain socket. One copy of the
model per host, shared activation cache across every PG
backend, and optional layer-sharding via `larql-router`.
Register a remote model with:
SELECT infer_create_model_remote('m', 'uds:///run/larql.sock');
-- or:
SELECT infer_create_model_remote('m', 'http://server:8080');
In-flight remote calls respond to pg_cancel_backend(...) within ~100ms
via a polling bridge to PostgreSQL's InterruptPending flag.
See docs/REMOTE_BACKEND.md for the full walkthrough, GUCs, pgbench
scripts, and expected throughput numbers.
DEPENDENCIES
------------
pg_infer includes its computational crates as workspace members
under crates/:
infer-vindex VectorIndex loading, gate KNN, tokenizer
infer-compute Linear algebra (BLAS-backed)
infer-models Model configuration and metadata
infer-core Core graph engine
infer-inference (optional) Full forward-pass inference
INSPIRED BY
-----------
pg_infer builds on work from the LARQL project by Chris Hay
(github.com/chrishayuk/chuk-larql-rs). LARQL pioneered the idea of
making transformer model internals queryable -- extracting gate vectors,
feature activations, and learned associations into a format that can
be explored interactively. The vindex format, gate KNN algorithm,
and feature labeling pipeline all originate from LARQL. Thank you
Chris for the foundational work that made this possible.
LICENSE
-------
Apache-2.0