pg_infer

7 min read Original article ↗

A PostgreSQL extension makes LLM model-knowledge available as an index for similarity searches, inference, and more.

pg_infer
========

Status: EXPERIMENTAL (v0.1.0-alpha)

A PostgreSQL extension that exposes transformer model knowledge as
SQL-queryable relations.  Built on pgrx 0.17 for PostgreSQL 18+.

This project is in early experimental stage.  It is functional, tested
(769+ tests passing), and performant on supported hardware, but:

  - The SQL API may change without notice between releases
  - No production deployments exist yet
  - Some compute paths require specific hardware (Apple Metal, NVIDIA CUDA)
  - The vindex format is not yet frozen

Use pg_infer if you want to explore what's possible at the intersection
of transformer model internals and relational databases.  Do not use it
for production workloads without accepting the risk of breaking changes.

pg_infer lets you ask questions about what a language model "knows" --
its internal feature activations, learned associations, and semantic
relationships -- directly from SQL, without running inference.


QUICK START
-----------

    -- Load a vindex (extracted model knowledge)
    SELECT infer_create_model('qwen05b', '/data/qwen-0.5b.vindex');
    --  infer_create_model
    -- --------------------
    --  qwen05b
    -- (1 row)

    -- What does the model know about France?
    SELECT * FROM describe('France');
    --  relation  |   target   | confidence | layer
    -- -----------+------------+------------+-------
    --  capital   | Paris      |       42.7 |    18
    --  language  | French     |       38.1 |    17
    --  continent | Europe     |       35.4 |    16
    --  currency  | euro       |       29.8 |    19
    --  leader    | president  |       24.3 |    20
    -- (5 rows)

    -- How similar are two concepts?
    SELECT similar_to('France', 'Paris');
    --  similar_to
    -- ------------
    --       42.71
    -- (1 row)

    -- Trace raw feature activations
    SELECT * FROM walk('The capital of France is', top => 10);
    --  layer | feature | activation |   concept
    -- -------+---------+------------+-------------
    --      0 |    1247 |      18.34 | capital
    --      0 |     892 |      16.21 | France
    --      5 |    3401 |      31.47 | European
    --     12 |     467 |      44.82 | Paris
    --     18 |    2103 |      52.19 | geography
    -- ...
    -- (10 rows)

    -- Does the model's knowledge support this relationship?
    SELECT implies('France', 'Paris');
    --  implies
    -- ---------
    --  t
    -- (1 row)

    -- ORDER BY semantic similarity (requires column index)
    CREATE INDEX ON documents USING infer (title) WITH (model = 'qwen05b');
    SELECT * FROM documents ORDER BY title <~> 'artificial intelligence' LIMIT 5;
    --  id |              title               | distance
    -- ----+----------------------------------+----------
    --   7 | Machine Learning Fundamentals    |    0.023
    --  12 | Deep Neural Networks             |    0.031
    --   3 | Natural Language Processing      |    0.044
    --  19 | Computer Vision and AI           |    0.058
    --   1 | Introduction to Data Science     |    0.089
    -- (5 rows)


COMBINED SEARCH WITH OTHER EXTENSIONS
--------------------------------------

pg_infer's model-knowledge similarity can be combined with PostgreSQL's
other search extensions to build search systems that understand meaning,
not just text patterns.

    -- Setup: enable all extensions
    CREATE EXTENSION IF NOT EXISTS pg_infer;
    CREATE EXTENSION IF NOT EXISTS vector;       -- pgvector
    CREATE EXTENSION IF NOT EXISTS pg_trgm;      -- trigram similarity

    -- A papers table with text, embeddings, and infer index
    CREATE TABLE papers (
        id     serial PRIMARY KEY,
        title  text NOT NULL,
        body   text NOT NULL,
        embed  vector(384)     -- sentence-transformer embedding
    );

    -- Create indexes for each search signal
    CREATE INDEX ON papers USING infer (title) WITH (model = 'qwen05b');
    CREATE INDEX ON papers USING gin (body gin_trgm_ops);
    CREATE INDEX ON papers USING ivfflat (embed vector_cosine_ops);
    CREATE INDEX ON papers USING gin (to_tsvector('english', body));

    -- Multi-signal ranked search: combine model knowledge, vector
    -- similarity, full-text relevance, and fuzzy matching
    --
    -- This query is impossible without pg_infer: it finds papers
    -- where the model "knows" the title is related to the query,
    -- even when no keywords overlap.  "neural architecture search"
    -- matches "AutoML for Deep Networks" because the model learned
    -- that relationship -- no keyword, embedding, or trigram method
    -- can discover this from text alone.
    WITH candidates AS (
        SELECT id, title, body,
               title <~> 'neural architecture search'  AS infer_dist,
               embed <=> '[0.1, 0.2, ...]'::vector     AS vec_dist,
               similarity(title, 'neural architecture') AS trgm_score,
               ts_rank(to_tsvector('english', body),
                       plainto_tsquery('neural architecture')) AS ts_rank
        FROM papers
        WHERE to_tsvector('english', body) @@ plainto_tsquery('neural architecture')
           OR title % 'neural architecture'
        LIMIT 100
    )
    SELECT id, title,
           (0.4 * (1 - infer_dist) +
            0.3 * (1 - vec_dist) +
            0.2 * trgm_score +
            0.1 * ts_rank) AS combined_score
    FROM candidates
    ORDER BY combined_score DESC
    LIMIT 10;
    --  id |                title                 | combined_score
    -- ----+--------------------------------------+----------------
    --  42 | AutoML for Deep Networks             |          0.847
    --  17 | Efficient Neural Architecture Search |          0.831
    --  91 | Hyperparameter Optimization Survey   |          0.724
    --   8 | Meta-Learning and Model Selection    |          0.698
    --  55 | Transformer Architecture Innovations |          0.672
    -- (5 rows)


SQL FUNCTIONS
-------------

walk(prompt, top, model)
    Trace gate activations across all layers for a prompt.
    Returns: layer, feature, activation, concept

describe(entity, model, threshold)
    Return knowledge edges the model has about an entity.
    Returns: relation, target, confidence, layer

similar_to(a, b, model)
    Semantic similarity score between two texts.
    Returns: float8 (higher = more similar)

implies(subject, object, model)
    Test whether the model's knowledge supports a directional
    relationship from subject to object.
    Returns: boolean

infer(prompt, top, model)
    Full forward-pass prediction (requires --features inference).
    Returns: token, probability, rank

infer_distance(a, b)
    Distance function for the <~> operator (lower = more similar).
    Returns: float8

infer_create_model(name, source, extract_level)
    Register a model from a vindex directory or HuggingFace URI.

infer_drop_model(name)
    Remove a model registration.

infer_models()
    List all registered models.


GUC PARAMETERS
--------------

infer.default_model        Default model for query functions (text)
infer.data_directory       Base directory for vindex files (text, default: 'infer')
infer.max_memory           Max memory per backend in MB (int, default: 8192)
infer.auto_download        Allow HuggingFace downloads (bool, default: on)
infer.gate_threshold       Gate score threshold (float, 0 = adaptive)
infer.describe_top_k       Features per layer for describe (int, default: 20)
infer.walk_embed_mode      Embedding mode for walk: 'average' or 'last'


INDEX ACCESS METHOD
-------------------

pg_infer provides a custom index AM ("infer") with two modes:

Model index:  Stores the full vindex data in WAL-logged PG pages.
    CREATE INDEX name ON infer._models USING infer (name)
        WITH (source = '/path/to/model.vindex');

Column index: Enables ORDER BY <~> queries on text columns.
    CREATE INDEX ON table USING infer (column)
        WITH (model = 'model_name');


BUILDING
--------

Requires: Rust nightly, pgrx 0.17, PostgreSQL 18+, OpenSSL, openblas.

    cd pg_infer
    cargo pgrx run pg18        # start PG with extension loaded
    cargo pgrx test pg18       # run integration tests

See CLAUDE.md for exact build environment variables needed on
systems with Nix compiler wrappers.


STORAGE
-------

Model data is stored in standard 8KB PostgreSQL pages using
GenericXLog for WAL logging.  Page types:

    Meta       - model metadata (name, dimensions, configuration)
    LayerDir   - per-layer block ranges for gate vectors
    Gate       - f16 gate activation vectors
    Embed      - token embedding vectors
    DownMeta   - feature metadata (labels, top tokens)
    Blob       - tokenizer data

The OS kernel shares mmap pages across backends that access the
same model, so memory overhead per-connection is minimal.


DEPLOYMENT BACKENDS
-------------------

pg_infer supports two per-model storage/execution backends:

    local  - mmap the vindex directly in each PostgreSQL backend.
             Simple but does not share the f16 decode cache across
             connections.

    remote - talk to a colocated or networked `larql-server` process
             over HTTP/2 or a Unix domain socket.  One copy of the
             model per host, shared activation cache across every PG
             backend, and optional layer-sharding via `larql-router`.

Register a remote model with:

    SELECT infer_create_model_remote('m', 'uds:///run/larql.sock');
    -- or:
    SELECT infer_create_model_remote('m', 'http://server:8080');

In-flight remote calls respond to pg_cancel_backend(...) within ~100ms
via a polling bridge to PostgreSQL's InterruptPending flag.

See docs/REMOTE_BACKEND.md for the full walkthrough, GUCs, pgbench
scripts, and expected throughput numbers.


DEPENDENCIES
------------

pg_infer includes its computational crates as workspace members
under crates/:

    infer-vindex    VectorIndex loading, gate KNN, tokenizer
    infer-compute   Linear algebra (BLAS-backed)
    infer-models    Model configuration and metadata
    infer-core      Core graph engine
    infer-inference (optional) Full forward-pass inference


INSPIRED BY
-----------

pg_infer builds on work from the LARQL project by Chris Hay
(github.com/chrishayuk/chuk-larql-rs).  LARQL pioneered the idea of
making transformer model internals queryable -- extracting gate vectors,
feature activations, and learned associations into a format that can
be explored interactively.  The vindex format, gate KNN algorithm,
and feature labeling pipeline all originate from LARQL.  Thank you
Chris for the foundational work that made this possible.


LICENSE
-------

Apache-2.0