Vector Overview
Official MariaDB Vector reference: VECTOR(n) data type, VECTOR INDEX (M, DISTANCE=euclidean|cosine), VEC_FromText() inserts, VEC_DISTANCE() queries.
The Next Generation of MariaDB: Powered by Vector Search
MariaDB Vector is a feature that allows MariaDB Server to perform as a relational vector database. Vectors generated by an AI model can be stored and searched in MariaDB.
The initial implementation uses the modified HNSW algorithm for searching in the vector index (to solve the so-called Approximate Nearest Neighbor problem), and defaults to Euclidean distance. Concurrent reads/writes and all transaction isolation levels are supported.
MariaDB uses int16 for indexes, which gives 15 bits to store the value, rather than 10 bits for float16.
Vectors can be defined using VECTOR INDEX for the index definition, and using the VECTOR data type in the CREATE TABLE statement.
CREATE TABLE v (
id INT PRIMARY KEY,
v VECTOR(5) NOT NULL,
VECTOR INDEX (v)
);The distance function used to build the vector index can be euclidean (the default) or cosine. An additional option, M, can be used to configure the vector index. Larger values mean slower SELECT and INSERT statements, larger index size and higher memory consumption but more accurate results. The valid range is from 3 to 200.
CREATE TABLE embeddings (
doc_id BIGINT UNSIGNED PRIMARY KEY,
embedding VECTOR(1536) NOT NULL,
VECTOR INDEX (embedding) M=8 DISTANCE=cosine
);Vector columns store 32-bit IEEE 754 floating point numbers.
Alternatively, you can use VEC_FromText() function:
For vector indexes built with the euclidean function, VEC_DISTANCE_EUCLIDEAN can be used. It calculates a Euclidean (L2) distance between two points:
Most commonly, this kind of query is done with a limit, for example to return vectors that are closest to a given vector, such as from a user search query, image or a song fragment:
For vector indexes built with the cosine function, VEC_DISTANCE_COSINE can be used. It calculates a Cosine distance between two vectors:
The VEC_DISTANCE function is a generic function that behaves either as VEC_DISTANCE_EUCLIDEAN or VEC_DISTANCE_COSINE, depending on the underlying index type:
There is no function for dot product (also called inner product) distance available in many other vector databases. Dot product is not a proper distance measure (for example, vector's closest match is not necessarily itself) and is only used for performance reasons, because it is often faster than cosine or euclidean and produces the same results if vectors are normalized. In MariaDB optimized implementation euclidean and cosine measures are the fastest, and dot product, if implemented, would not provide any performance benefits. Use euclidean or cosine (they are equally fast) for normalized vectors.
Using the Vector Index Efficiently
The optimizer uses the vector index only when the ORDER BY is the literal VEC_DISTANCE_*(column, vector) call (or its alias) sorted ascending, together with a LIMIT. Two common patterns defeat the index and fall back to a full table scan.
Wrapping the distance in an expression. To sort by a similarity score (for example 1 - cosine distance), compute the score in an outer query and keep the inner ORDER BY on the bare distance:
Filtering by a distance threshold. A bare WHERE VEC_DISTANCE(...) < threshold with no ORDER BY ... LIMIT is a range predicate that the index cannot drive, so it falls back to a full table scan. To use the index, retrieve an indexed top-K and apply the threshold to it, for example in a subquery:
The LIMIT is a hard cap on how many rows the index returns before the threshold is applied, so choose it generously if many rows may match, otherwise qualifying rows beyond the limit are silently dropped. (Adding the WHERE directly to an ORDER BY ... LIMIT query also uses the index, but with the same hard cap.)
There are a number of system variables used for vectors. See Vector System Variables.
Vector Framework Integrations
MariaDB Vector is integrated in several frameworks, see Vector Framework Integrations.
Video summary
A vector (an embedding) is an ordered list of numbers.
AI models map content (like text) into vectors.
Similar meanings end up close together in vector space.
RAG and semantic search retrieve relevant items by finding the nearest vectors.
Try RAG with MariaDB Vector on your own MariaDB data! (blog post • 5 minutes • 2024)
The post, written by Robert Silén, explains how to build a Retrieval-Augmented Generation (RAG) system using MariaDB's native vector storage. It covers:
Preparation: Setting up a MariaDB 11.7+ environment and creating a table with the
VECTORdata type.Indexing: Using OpenAI's embedding model to vectorize documentation and store it in MariaDB.
Search & Generation: Performing a nearest-neighbor search (
VEC_DISTANCE_EUCLIDEAN) to find relevant context for a user's question and feeding that context into an LLM for an accurate response.
This page is licensed: CC BY-SA / Gnu FDL
INSERT INTO v VALUES
(1, x'e360d63ebe554f3fcdbc523f4522193f5236083d'),
(2, x'f511303f72224a3fdd05fe3eb22a133ffae86a3f'),
(3,x'f09baa3ea172763f123def3e0c7fe53e288bf33e'),
(4,x'b97a523f2a193e3eb4f62e3f2d23583e9dd60d3f'),
(5,x'f7c5df3e984b2b3e65e59d3d7376db3eac63773e'),
(6,x'de01453ffa486d3f10aa4d3fdd66813c71cb163f'),
(7,x'76edfc3e4b57243f10f8423fb158713f020bda3e'),
(8,x'56926c3fdf098d3e2c8c5e3d1ad4953daa9d0b3e'),
(9,x'7b713f3e5258323f80d1113d673b2b3f66e3583f'),
(10,x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');INSERT INTO v VALUES
(1,Vec_FromText('[0.418708,0.809902,0.823193,0.598179,0.0332549]')),
(2,Vec_FromText('[0.687774,0.789588,0.496138,0.57487,0.917617]')),
(3,Vec_FromText('[0.333221,0.962687,0.467263,0.448235,0.475671]')),
(4,Vec_FromText('[0.822185,0.185643,0.683452,0.211072,0.554056]')),
(5,Vec_FromText('[0.437057,0.167281,0.0770977,0.428638,0.241591]')),
(6,Vec_FromText('[0.76956,0.926895,0.803376,0.0157961,0.589042]')),
(7,Vec_FromText('[0.493999,0.641957,0.761598,0.94276,0.425865]')),
(8,Vec_FromText('[0.924108,0.275466,0.0543329,0.0731585,0.136344]')),
(9,Vec_FromText('[0.186956,0.69666,0.0356002,0.668875,0.84722]')),
(10,Vec_FromText('[0.415294,0.609278,0.426765,0.988832,0.475556]'));SELECT id FROM v ORDER BY
VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
| 7 |
| 3 |
| 9 |
| 2 |
| 1 |
| 5 |
| 4 |
| 6 |
| 8 |
+----+SELECT id FROM v
ORDER BY VEC_DISTANCE_EUCLIDEAN(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e')
LIMIT 2;
+----+
| id |
+----+
| 10 |
| 7 |
+----+SELECT VEC_DISTANCE_COSINE(VEC_FROMTEXT('[1,2,3]'), VEC_FROMTEXT('[3,5,7]'));SELECT id FROM v
ORDER BY VEC_DISTANCE(v, x'6ca1d43e9df91b3fe580da3e1c247d3f147cf33e');
+----+
| id |
+----+
| 10 |
| 7 |
| 3 |
| 9 |
| 2 |
| 1 |
| 5 |
| 4 |
| 6 |
| 8 |
+----+SELECT t.id, 1 - t.distance AS score FROM (
SELECT id, VEC_DISTANCE_COSINE(embedding, @query) AS distance
FROM documents ORDER BY distance LIMIT 10
) AS t ORDER BY score DESC;SELECT * FROM (
SELECT content, VEC_DISTANCE_COSINE(embedding, @query) AS distance
FROM documents ORDER BY distance LIMIT 100
) AS t WHERE t.distance < 0.5;