Contents:
- 1. Overview
- 2. Building the Extension
- 3. Usage
- 4. Roadmap
1. Overview
Vec1 is an SQLite extension that provides approximate nearest-neighbor (ANN) vector search using SQLite's virtual table interface. Euclidean (L2) and cosine distances are supported. Vec1 is implemented in portable C and has no external dependencies. It uses AVX2 on x86 and NEON on ARM.
Vec1 uses IVFADC (Inverted File with Asymmetric Distance Computation) with OPQ (Optimized Product Quantization).
Test results for publicly available datasets can be found here.
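To illustrate what "asymmetric distance computation" means, the following is a minimal, generic sketch of the product-quantization lookup step. It is not taken from vec1.c: the names adc_build_table and adc_distance and the sizes ADC_M and ADC_K are hypothetical, and the coarse (inverted file) quantizer and the OPQ rotation are left out. The query stays unquantized, while each stored vector is represented only by one small code per sub-vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes chosen for illustration; not taken from vec1.c. */
#define ADC_M 4      /* sub-quantizers, i.e. sub-vectors per vector */
#define ADC_K 256    /* centroids per sub-quantizer, so a code is 1 byte */

/*
** Build the per-query lookup table. For each sub-quantizer j and each
** centroid c, store the squared L2 distance between the j-th slice of
** the query and centroid c. "codebook" holds ADC_M*ADC_K centroids of
** dsub floats each; "table" receives ADC_M*ADC_K floats.
*/
static void adc_build_table(
  const float *query, const float *codebook, size_t dsub, float *table
){
  assert( dsub>0 );
  for(size_t j=0; j<ADC_M; j++){
    const float *q = &query[j*dsub];
    for(size_t c=0; c<ADC_K; c++){
      const float *cent = &codebook[(j*ADC_K + c)*dsub];
      float sum = 0.0f;
      for(size_t i=0; i<dsub; i++){
        float diff = q[i] - cent[i];
        sum += diff*diff;
      }
      table[j*ADC_K + c] = sum;
    }
  }
}

/*
** Approximate squared distance from the query to one stored vector,
** using only that vector's ADC_M one-byte codes and the table above.
*/
static float adc_distance(const float *table, const uint8_t *code){
  float sum = 0.0f;
  for(size_t j=0; j<ADC_M; j++){
    sum += table[j*ADC_K + code[j]];
  }
  return sum;
}
```

The table is built once per query, after which each candidate vector costs only ADC_M table lookups and additions. This is why the scheme suits scanning many compressed vectors.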
2. Building the Extension
The extension is implemented in a single C file, "vec1.c". It may be compiled in the same way as other SQLite extensions. For best performance, compile with SIMD support and aggressive compiler optimizations.
For example, on Linux or macOS x86-64 with gcc or clang:
cc -g -O3 -DNDEBUG -mavx2 -mfma vec1.c -shared -fPIC -o vec1.so
Or with MSVC on x86-64:
cl /Zi /O2 /DNDEBUG /arch:AVX2 vec1.c -link -dll -out:vec1.dll
No special switches are required to enable NEON on ARM. The compiler should still be passed -O3 or the equivalent to enable loop-unrolling and other aggressive optimizations.
Binaries compiled with SIMD instructions enabled on x86-64 platforms as shown above will not run on CPUs that lack those instructions. A method for building vec1 to support multiple x86-64 architectures is found in this Makefile (target "vec1multi.so").
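One common way to make a single binary safe on CPUs with and without AVX2 is to choose a kernel at run time. The sketch below shows that general technique only and may differ from what the Makefile does. It assumes gcc or clang, whose __builtin_cpu_supports() and target attribute are used on x86-64. The names l2_fn, l2_scalar, l2_avx2 and l2_select are hypothetical and do not come from vec1.c.

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
# include <immintrin.h>
# define HAVE_X86_DISPATCH 1
#endif

typedef float (*l2_fn)(const float *a, const float *b, size_t n);

/* Portable fallback: squared Euclidean distance in plain C. */
static float l2_scalar(const float *a, const float *b, size_t n){
  assert( a!=0 && b!=0 );
  float sum = 0.0f;
  for(size_t i=0; i<n; i++){
    float d = a[i] - b[i];
    sum += d*d;
  }
  return sum;
}

#ifdef HAVE_X86_DISPATCH
/*
** AVX2+FMA kernel. The target attribute lets this one function use
** AVX2 even when the file itself is compiled without -mavx2.
*/
__attribute__((target("avx2,fma")))
static float l2_avx2(const float *a, const float *b, size_t n){
  __m256 acc = _mm256_setzero_ps();
  size_t i = 0;
  for(; i+8<=n; i+=8){
    __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a+i), _mm256_loadu_ps(b+i));
    acc = _mm256_fmadd_ps(d, d, acc);     /* acc += d*d, 8 lanes at once */
  }
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float sum = 0.0f;
  for(int k=0; k<8; k++) sum += lanes[k]; /* horizontal sum of the lanes */
  for(; i<n; i++){                        /* leftover elements */
    float d = a[i] - b[i];
    sum += d*d;
  }
  return sum;
}
#endif

/* Pick the fastest kernel that the running CPU can execute. */
static l2_fn l2_select(void){
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if( __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma") ){
    return l2_avx2;
  }
#endif
  return l2_scalar;
}
```

A caller would invoke l2_select() once at start-up and keep the returned pointer, so the CPU check is not repeated for every distance computation.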
3. Usage
A user manual and reference documentation are available, as is a video, "Vector Queries with SQLite Vec1".
4. Roadmap
No further features are required before the first release. However:
- Testing is insufficient.
Other things to be added and/or investigated following the first release:
- Almost all code paths require optimization.
- Optimization of "SELECT count(*) FROM vec1tbl".
- Support for some sort of bit encoding. RaBitQ?
- Add support for SIMD on wasm.
- Support vectors constructed of elements other than 32-bit IEEE floats - e.g. 8-bit or 32-bit integers, or 16-bit floats.
- Support for partition keys.
- Add an option for a modern graph-based index as an alternative to IVFADC. HNSW? DiskANN? Some variant? Or, better, something that preserves the advantages of IVFADC without the training requirement. If RaBitQ, TurboQuant or something similar can be used to quantize vectors without training, then perhaps there is also a way to do coarse quantization without training.
- Add the ability to open and use databases created on platforms that use a different byte order for floating point values.
- Implement dot-product as a distance metric for search.
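For the byte-order item above, the core of such a feature is a conversion step applied when stored floats are read. The following is a minimal sketch and not vec1 code. It assumes the stored byte order is known from somewhere, represented here by the hypothetical flag blob_is_le, and that the host uses the same byte order for 32-bit floats as for 32-bit integers, as mainstream platforms do. The names bswap32, host_is_little_endian and decode_floats are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t x){
  return (x>>24) | ((x>>8)&0x0000ff00u) | ((x<<8)&0x00ff0000u) | (x<<24);
}

/* Return nonzero if this machine stores the least significant byte first. */
static int host_is_little_endian(void){
  const uint16_t probe = 1;
  uint8_t first;
  memcpy(&first, &probe, 1);
  return first==1;
}

/*
** Decode n 32-bit IEEE floats from "blob" into "out". blob_is_le is
** nonzero if the blob was written by a little-endian machine. Bytes
** are swapped only when the blob and the host disagree.
*/
static void decode_floats(
  const uint8_t *blob, size_t n, int blob_is_le, float *out
){
  assert( blob!=0 && out!=0 );
  int swap = (blob_is_le!=0) != (host_is_little_endian()!=0);
  for(size_t i=0; i<n; i++){
    uint32_t bits;
    memcpy(&bits, &blob[i*4], 4);   /* memcpy avoids unaligned access */
    if( swap ) bits = bswap32(bits);
    memcpy(&out[i], &bits, 4);
  }
}
```

When the byte orders match, the loop reduces to a plain copy, so databases used on the platform that created them would pay almost nothing for the check.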