VEC 2.0
A dead simple vector database that lives on your GPU. Now with a sidecar payload field — every record is (vector, label, ≤100KB blob). The MySQL sidekick is gone; one exe replaces both.
Store millions of embeddings in VRAM. Find the nearest ones in milliseconds. Single exe, no dependencies, no configuration.
# Start a database vec mydb 1024 # That's it. It's listening on port 1920.
No GPU? No problem. vec-cpu does the same thing using RAM.
Upgrading from 1.x? See
MIGRATION.md. Server protocol is a clean break — old SDKs won't work..tensorsand.metafiles load unchanged;.datais a new sidecar that appears on first save once you store payloads.
Protocol
Binary only. Every request and every response is a length-prefixed binary frame over TCP, named pipe, or Unix socket.
request: F0 <2B ns_len> [ns] <CMD> <2B label_len> [label] <4B body_len> [body]
response: <1B status> <4B body_len> [body] ; status 0=ok, 1=err
16 commands, one envelope. See PROTOCOL-2.0.md for the byte-exact spec or sdk/README.md for the quick reference.
Use the SDK libraries — there is no text interface.
What it does
- PUSH — store a vector, optionally with a label and a ≤100KB data payload (data requires label). Returns the slot index. Rejects bit-identical duplicates.
- QUERY — nearest-neighbor search; metric byte selects L2 or cosine.
- QID — like QUERY but the query is an existing stored vector (by index or label).
- EXISTS — exact-match lookup by vector content. Returns the slot index (and optional label/data) if present,
found=0otherwise. - GET — retrieve records by index, by label (may yield multiple), or batch by index list.
- SET_DATA / GET_DATA — manage the per-slot payload independently of the vector.
- UPDATE — overwrite a vector in place (label and data untouched).
- LABEL — set or clear a slot's label.
- DELETE — tombstone a slot (also frees its label and data).
- UNDO — remove the last PUSH (also frees its label and data).
- CLUSTER — DBSCAN over the full set.
- DISTINCT — farthest-point sampling (k most spread-out vectors).
- REPRESENT — one most-distinct member per cluster.
- INFO — database metadata (dim, count, format, CRC, name, protocol version).
- SAVE — flush to disk; returns saved count + CRC.
Result shape
QUERY/QID/GET take a 1-byte shape mask that controls what each result record carries:
0x01vector0x02label0x04data0x07all three (default)
Skip what you don't need to keep responses lean.
L2 vs Cosine
- L2 (squared Euclidean) — "how far apart?" Use for vision models (DINOv2, ArcFace).
- Cosine — "looking the same direction?" Use for text models (BGE, MiniLM, CLIP).
QUERY and QID take a metric byte (0=L2 default, 1=cosine). CLUSTER/DISTINCT/REPRESENT take the same byte.
Multiple databases
Run them behind a router:
# Start databases without TCP (pipe/socket only) vec --notcp tools 1024 vec --notcp conversations 1024 # Route them all through one port vec --route 1920
Or let deploy mode do it all:
# Auto-discover all .tensors files vec deploy # Custom port vec deploy 1920 # Explicit schema vec --deploy=tools:1024,conversations:1024,faces:512:f16 1920
The SDK sets a namespace on the client. The router strips it from the frame and forwards to the correct backend.
Housekeeping
# Delete an entire database vec face --delete # Check file integrity (dry run) vec mydb --check # Repair a corrupt database vec mydb --repair # Auto-detect .tensors in current directory vec # Load a specific file vec mydb.tensors # fp16 mode — half the VRAM, double the capacity (GPU only) vec mydb 1024:f16
What it looks like
NVIDIA GeForce RTX 3060 (12.0 GB)
===================================================================
database mydb
format f32, 1024 dim
records 4.2m total, 4.1m active, 100.0k deleted
file size 16.1 GB
modified 12 minutes ago (14:20)
capacity 2.9m max, 1.7m remaining (58.6%)
checksum NOMITOPO (0xA3F291B7) ok
===================================================================
Performance
1024-dim vectors:
GPU (RTX 3060) CPU
10K ~0.2 ms ~2 ms
100K ~1.5 ms ~20 ms
1M ~14 ms ~200 ms
Capacity
fp32, 1024 dimensions:
8 GB VRAM 1.9M vectors
12 GB VRAM 2.9M vectors
24 GB VRAM 5.8M vectors
Lower dimensions or fp16 = more capacity.
Supported GPUs
RTX 2000 series and newer (Turing, Ampere, Ada Lovelace). RTX 2060 through 4090, plus T4, A100, L40.
GTX 1000 series and older won't work. AMD/Intel GPUs won't work. Use vec-cpu instead.
Build
# Windows build.bat # Linux ./build.sh
Requires NVIDIA CUDA Toolkit 12.x for vec. vec-cpu just needs a C++ compiler.
Good to know
- Brute force. Every query scans every vector. Exact results, zero approximation.
- GPU top-K. Above 100K vectors, a CUDA kernel finds the top results on GPU.
- All RAM. Vectors in VRAM (or RAM for vec-cpu), labels and data alongside them. No mmap, no lazy paging — disk is touched only on startup load and explicit SAVE.
- Append-time dedup. Every PUSH is hashed (xxh64 over stored bytes) and byte-compared against existing alive slots; bit-identical vectors are rejected with the existing slot's index. EXISTS exposes the same lookup as a query op.
- Indices are permanent. Slot 42 is always slot 42. Deletes are tombstones.
- Labels are clean. ≤2048 bytes, no spaces, no
: * ? " < > | ,. URI-style paths likedocs/file.pdfare fine. - Data is opaque. ≤100KB per slot. VEC stores the bytes verbatim — sniff the mime on the client if you need to.
- Same file format across builds. vec and vec-cpu read/write the same
.tensors,.meta,.datafiles. - CRC32 on save. Pronounceable checksum word for eyeball integrity checks.
- Read-only mode. If the file isn't writable, queries work, writes are rejected.
- Disk space check. Saves skipped if insufficient space.
- File repair.
--checkverifies,--repairfixes.
File format
.tensors [4B dim][4B count][4B deleted][1B fmt][count×1B alive][vectors][4B CRC32]
.meta [4B count][per slot: 4B len + label bytes]
.hashes [4B count][count × 8B u64 xxh64 hash][4B CRC32]
.data [4B count][count×1B alive mask][per present slot: 4B len + bytes][4B CRC32]
.data and .hashes are new in 2.0. .data is only written when at least one slot has a payload. .hashes is the persisted dedup index — if missing or its CRC fails, it's rebuilt from .tensors at startup.
Client SDKs
C++ (vec_client.h), Python (vec_client.py), Node.js (vec_client.js), Delphi (vec_client.pas).
All in sdk/. Quick protocol reference in sdk/README.md. Byte-exact spec in PROTOCOL-2.0.md.
Tested with
DINOv2 (1024d), BGE-large (1024d), ArcFace (512d), MiniLM-L12 (384d).
Curated by @PsyChip - April 2026