GitHub - PsyChip/VEC: GPU-resident vector database in single exe

VEC 2.0

A dead simple vector database that lives on your GPU. Now with a sidecar payload field — every record is (vector, label, ≤100KB blob). The MySQL sidekick is gone; one exe replaces both.

Store millions of embeddings in VRAM. Find the nearest ones in milliseconds. Single exe, no dependencies, no configuration.

# Start a database
vec mydb 1024

# That's it. It's listening on port 1920.

No GPU? No problem. vec-cpu does the same thing using RAM.

Upgrading from 1.x? See MIGRATION.md. Server protocol is a clean break — old SDKs won't work. .tensors and .meta files load unchanged; .data is a new sidecar that appears on first save once you store payloads.

Protocol

Binary only. Every request and every response is a length-prefixed binary frame over TCP, named pipe, or Unix socket.

request:  F0 <2B ns_len> [ns] <CMD> <2B label_len> [label] <4B body_len> [body]
response: <1B status> <4B body_len> [body]        ; status 0=ok, 1=err

16 commands, one envelope. See PROTOCOL-2.0.md for the byte-exact spec or sdk/README.md for the quick reference.

Use the SDK libraries — there is no text interface.

What it does

PUSH — store a vector, optionally with a label and a ≤100KB data payload (data requires label). Returns the slot index. Rejects bit-identical duplicates.
QUERY — nearest-neighbor search; metric byte selects L2 or cosine.
QID — like QUERY but the query is an existing stored vector (by index or label).
EXISTS — exact-match lookup by vector content. Returns the slot index (and optional label/data) if present, found=0 otherwise.
GET — retrieve records by index, by label (may yield multiple), or batch by index list.
SET_DATA / GET_DATA — manage the per-slot payload independently of the vector.
UPDATE — overwrite a vector in place (label and data untouched).
LABEL — set or clear a slot's label.
DELETE — tombstone a slot (also frees its label and data).
UNDO — remove the last PUSH (also frees its label and data).
CLUSTER — DBSCAN over the full set.
DISTINCT — farthest-point sampling (k most spread-out vectors).
REPRESENT — one most-distinct member per cluster.
INFO — database metadata (dim, count, format, CRC, name, protocol version).
SAVE — flush to disk; returns saved count + CRC.

Result shape

QUERY/QID/GET take a 1-byte shape mask that controls what each result record carries:

0x01 vector
0x02 label
0x04 data
0x07 all three (default)

Skip what you don't need to keep responses lean.

L2 vs Cosine

L2 (squared Euclidean) — "how far apart?" Use for vision models (DINOv2, ArcFace).
Cosine — "looking the same direction?" Use for text models (BGE, MiniLM, CLIP).

QUERY and QID take a metric byte (0=L2 default, 1=cosine). CLUSTER/DISTINCT/REPRESENT take the same byte.

Multiple databases

Run them behind a router:

# Start databases without TCP (pipe/socket only)
vec --notcp tools 1024
vec --notcp conversations 1024

# Route them all through one port
vec --route 1920

Or let deploy mode do it all:

# Auto-discover all .tensors files
vec deploy

# Custom port
vec deploy 1920

# Explicit schema
vec --deploy=tools:1024,conversations:1024,faces:512:f16 1920

The SDK sets a namespace on the client. The router strips it from the frame and forwards to the correct backend.

Housekeeping

# Delete an entire database
vec face --delete

# Check file integrity (dry run)
vec mydb --check

# Repair a corrupt database
vec mydb --repair

# Auto-detect .tensors in current directory
vec

# Load a specific file
vec mydb.tensors

# fp16 mode — half the VRAM, double the capacity (GPU only)
vec mydb 1024:f16

What it looks like

NVIDIA GeForce RTX 3060 (12.0 GB)
===================================================================
  database      mydb
  format        f32, 1024 dim
  records       4.2m total, 4.1m active, 100.0k deleted
  file size     16.1 GB
  modified      12 minutes ago (14:20)
  capacity      2.9m max, 1.7m remaining (58.6%)
  checksum      NOMITOPO (0xA3F291B7) ok
===================================================================

Performance

1024-dim vectors:

             GPU (RTX 3060)    CPU
  10K        ~0.2 ms           ~2 ms
  100K       ~1.5 ms           ~20 ms
  1M         ~14 ms            ~200 ms

Capacity

fp32, 1024 dimensions:

  8 GB VRAM     1.9M vectors
  12 GB VRAM    2.9M vectors
  24 GB VRAM    5.8M vectors

Lower dimensions or fp16 = more capacity.

Supported GPUs

RTX 2000 series and newer (Turing, Ampere, Ada Lovelace). RTX 2060 through 4090, plus T4, A100, L40.

GTX 1000 series and older won't work. AMD/Intel GPUs won't work. Use vec-cpu instead.

Build

# Windows
build.bat

# Linux
./build.sh

Requires NVIDIA CUDA Toolkit 12.x for vec. vec-cpu just needs a C++ compiler.

Good to know

Brute force. Every query scans every vector. Exact results, zero approximation.
GPU top-K. Above 100K vectors, a CUDA kernel finds the top results on GPU.
All RAM. Vectors in VRAM (or RAM for vec-cpu), labels and data alongside them. No mmap, no lazy paging — disk is touched only on startup load and explicit SAVE.
Append-time dedup. Every PUSH is hashed (xxh64 over stored bytes) and byte-compared against existing alive slots; bit-identical vectors are rejected with the existing slot's index. EXISTS exposes the same lookup as a query op.
Indices are permanent. Slot 42 is always slot 42. Deletes are tombstones.
Labels are clean. ≤2048 bytes, no spaces, no : * ? " < > | ,. URI-style paths like docs/file.pdf are fine.
Data is opaque. ≤100KB per slot. VEC stores the bytes verbatim — sniff the mime on the client if you need to.
Same file format across builds. vec and vec-cpu read/write the same .tensors, .meta, .data files.
CRC32 on save. Pronounceable checksum word for eyeball integrity checks.
Read-only mode. If the file isn't writable, queries work, writes are rejected.
Disk space check. Saves skipped if insufficient space.
File repair. --check verifies, --repair fixes.

File format

.tensors  [4B dim][4B count][4B deleted][1B fmt][count×1B alive][vectors][4B CRC32]
.meta     [4B count][per slot: 4B len + label bytes]
.hashes   [4B count][count × 8B u64 xxh64 hash][4B CRC32]
.data     [4B count][count×1B alive mask][per present slot: 4B len + bytes][4B CRC32]

.data and .hashes are new in 2.0. .data is only written when at least one slot has a payload. .hashes is the persisted dedup index — if missing or its CRC fails, it's rebuilt from .tensors at startup.

Client SDKs

C++ (vec_client.h), Python (vec_client.py), Node.js (vec_client.js), Delphi (vec_client.pas).

All in sdk/. Quick protocol reference in sdk/README.md. Byte-exact spec in PROTOCOL-2.0.md.

Tested with

DINOv2 (1024d), BGE-large (1024d), ArcFace (512d), MiniLM-L12 (384d).

Curated by @PsyChip - April 2026