Resin
Resin is a vector space search engine, a vector database, and a general-purpose ("anything") key/value store. It provides efficient string processing, vector operations, and custom storage primitives designed for speed and simplicity. It can build large language models from strings and large "anything" models from byte arrays.
Highlights
- Fast key/value storage with page/column readers and writers
- Practical text analysis utilities for strings, bags of words/chars, and vectors
- Command-line tools for building and validating lexicons and comparing strings
- Clean, dependency-light design that is easy to extend
Key/Value Column Semantics
ColumnWriter
- TryPut(TKey key, ReadOnlySpan<byte> value)
  - Inserts the key/value only if the key does not exist in the column-wide snapshot (previous pages included).
  - Returns false when the key already exists; otherwise writes to the current page.
  - Triggers page serialization when the page becomes full.
- PutOrAppend(TKey key, ReadOnlySpan<byte> value)
  - If the key exists anywhere in the column, no new key is stored. Instead, values are linked using a fixed-size node (LinkedAddressNode) written to the value stream.
  - Tail-appending order: the original value remains first, followed by each appended value in insertion order. The address entry for the key points to the list head when linking is active.
  - If the key does not exist in the column snapshot, operates at the page level (insert/append within the current page) and may serialize when full.
ColumnReader
- Get(TKey key)
  - Returns the value for key. If the key’s address entry points to a linked-list head, returns the concatenated bytes of all linked values.
  - Returns ReadOnlySpan<byte>.Empty when the key does not exist.
- GetMany(TKey key, out int count)
  - Returns a concatenated ReadOnlySpan<byte> of all values linked for key and outputs the number of items via count.
  - When the key points to a single raw value, returns that value and count = 1. If the key does not exist, returns empty and count = 0.
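The writer and reader semantics described above can be modelled in memory. The following is a minimal sketch, not Resin's implementation: ColumnModel is a hypothetical class, keys are fixed to long, the void return type of PutOrAppend is an assumption, and paging, serialization, and LinkedAddressNode are left out entirely. It only illustrates which calls succeed and what the readers return.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a column. It models the documented
// semantics only; it does not touch Resin's pages or files.
public sealed class ColumnModel
{
    // One entry per key; each key keeps its values in insertion order.
    private readonly Dictionary<long, List<byte[]>> _values =
        new Dictionary<long, List<byte[]>>();

    // Like TryPut: stores the value only when the key is new.
    public bool TryPut(long key, ReadOnlySpan<byte> value)
    {
        if (_values.ContainsKey(key)) return false;
        _values[key] = new List<byte[]> { value.ToArray() };
        return true;
    }

    // Like PutOrAppend: a new key is stored, an existing key gets the
    // value appended at the tail, so the original value stays first.
    // (Return type is an assumption; the source does not state it.)
    public void PutOrAppend(long key, ReadOnlySpan<byte> value)
    {
        if (!_values.TryGetValue(key, out var list))
        {
            list = new List<byte[]>();
            _values[key] = list;
        }
        list.Add(value.ToArray());
    }

    // Like GetMany: concatenates all values for the key and reports how
    // many there were; a missing key yields empty and count = 0.
    public ReadOnlySpan<byte> GetMany(long key, out int count)
    {
        if (!_values.TryGetValue(key, out var list))
        {
            count = 0;
            return ReadOnlySpan<byte>.Empty;
        }
        count = list.Count;
        var buffer = new List<byte>();
        foreach (var item in list) buffer.AddRange(item);
        return buffer.ToArray();
    }

    // Like Get: same bytes as GetMany, without the count.
    public ReadOnlySpan<byte> Get(long key) => GetMany(key, out _);
}
```

In this model, a second TryPut for the same key returns false and leaves the stored value unchanged, while PutOrAppend on that key makes GetMany report count = 2 with the original bytes first.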
TKey Restrictions for ColumnWriter/ColumnReader
When working with TKey, adhere to the following restrictions to ensure proper functionality:
- TKey must be a value type (struct) and implement both IEquatable<TKey> and IComparable<TKey>.
- Ordering and equality must be stable across sessions. The column-wide key snapshot uses BinarySearch/sorting, so CompareTo must define a strict total order consistent with Equals.
- Page-level storage operates on long keys.
  - For primitive numeric keys:
    - double and float are stored via their IEEE bit representations.
    - int and long are stored directly.
  - Other TKey types are hashed via GetHashCode() to a long for page-level operations.
- Recommendation: Use numeric primitives (double, float, int, long) for deterministic ordering and lookup. If using a custom struct, ensure:
  - Equals and CompareTo are consistent and deterministic.
  - GetHashCode() is stable and evenly distributed; collisions affect page-level operations since non-primitive keys are hashed to long.
- Keys must be comparable across the entire column; duplicate detection relies on the column snapshot and BinarySearch over sorted keys.
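As an illustration of these restrictions, the sketch below shows a custom key struct and one possible way to reduce a key to a long. TermKey and PageKey.From are hypothetical names, and the mapping is an assumption based on the description above (in particular, widening the 32-bit float pattern to long is a guess); Resin's actual conversion may differ. The hash is plain arithmetic on purpose: System.HashCode is seeded randomly per process, so values it produces are not stable across sessions.

```csharp
using System;

// Hypothetical key type, used only for illustration.
public readonly struct TermKey : IEquatable<TermKey>, IComparable<TermKey>
{
    public int Field { get; }
    public int Term { get; }

    public TermKey(int field, int term) { Field = field; Term = term; }

    public bool Equals(TermKey other) => Field == other.Field && Term == other.Term;
    public override bool Equals(object obj) => obj is TermKey other && Equals(other);

    // Strict total order: Field first, then Term.
    // CompareTo returns 0 exactly when Equals returns true.
    public int CompareTo(TermKey other)
    {
        int byField = Field.CompareTo(other.Field);
        return byField != 0 ? byField : Term.CompareTo(other.Term);
    }

    // Deterministic arithmetic hash: same inputs give the same result
    // in every process, which a persisted key requires.
    public override int GetHashCode() => unchecked((Field * 397) ^ Term);
}

// Hypothetical helper; an assumed reading of "page-level storage
// operates on long keys", not Resin's code.
public static class PageKey
{
    public static long From<TKey>(TKey key) where TKey : struct
    {
        switch (key)
        {
            case long l: return l;                                   // stored directly
            case int i: return i;                                    // widened to long
            case double d: return BitConverter.DoubleToInt64Bits(d); // IEEE 754 bits
            case float f: return BitConverter.SingleToInt32Bits(f);  // IEEE 754 bits, widened
            default: return key.GetHashCode();                       // any other struct is hashed
        }
    }
}
```

Two distinct TermKey values can share a hash, which is why the recommendation above favours numeric primitives where possible.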
Column model and set operations
- Each column stores any given TKey at most once in its column-wide snapshot (duplicate keys are prevented by both TryPut and PutOrAppend). This makes columns effectively sets of keys, enabling set operations such as union, intersection, and joins across columns. Linked values (via PutOrAppend) attach additional data to the existing key without introducing duplicates.
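Because each column's keys are unique and kept sorted, set operations reduce to a linear merge over two key sequences. The sketch below intersects two such sequences; KeySets.Intersect is a hypothetical helper that works on plain long arrays and is not part of Resin's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: intersects the key snapshots of two columns.
public static class KeySets
{
    // Both inputs must be sorted ascending and free of duplicates,
    // which is what the column model described above guarantees.
    // Runs in O(n + m) by advancing whichever side holds the smaller key.
    public static long[] Intersect(long[] left, long[] right)
    {
        var result = new List<long>();
        int i = 0, j = 0;
        while (i < left.Length && j < right.Length)
        {
            if (left[i] < right[j]) i++;
            else if (left[i] > right[j]) j++;
            else
            {
                // Key present in both columns.
                result.Add(left[i]);
                i++;
                j++;
            }
        }
        return result.ToArray();
    }
}
```

A union or a join follows the same two-pointer pattern; only the action taken in each branch changes.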
Storage artefacts: *.key, *.adr, *.val
- .key (Key stream)
  - Stores the sorted sequence of TKey representations per page/column. Keys are written in fixed-size slots (sizeof(long) per entry for page-level storage) and serialized in page batches. The column-wide snapshot is built by reading and sorting this stream.
- .adr (Address stream)
  - Stores Address structs aligned with .key entries. Each Address contains Offset and Length:
    - For raw values: points directly into .val (Offset = start of value, Length = byte length).
    - For linked values: points to a LinkedAddressNode head in .val (Length equals node size). The node chain yields multiple values for a single key.
- .val (Value stream)
  - Stores the actual value bytes and any LinkedAddressNode headers used for linking. Values are appended at the end of the stream; LinkedAddressNodes are also written into .val to form singly linked lists via absolute offsets.
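The way the three streams line up can be sketched with in-memory arrays standing in for the files. This is an illustration under stated assumptions, not Resin's reader: RawAddress and ColumnLookup.Find are hypothetical names, the field types (long Offset, int Length) are guesses because the source gives only the field names, and only raw values are handled (linked values are not followed).

```csharp
using System;

// Assumed shape of an address entry; the source names Offset and Length
// but does not state their types.
public readonly struct RawAddress
{
    public long Offset { get; }
    public int Length { get; }
    public RawAddress(long offset, int length) { Offset = offset; Length = length; }
}

// Hypothetical lookup over arrays that stand in for .key, .adr and .val.
public static class ColumnLookup
{
    // keys:      sorted page-level keys, as held in the .key stream
    // addresses: one entry per key, at the same index, as in .adr
    // values:    raw value bytes, as in .val
    public static ReadOnlySpan<byte> Find(
        long[] keys, RawAddress[] addresses, byte[] values, long key)
    {
        // Binary search requires the keys to be sorted ascending.
        int index = Array.BinarySearch(keys, key);
        if (index < 0) return ReadOnlySpan<byte>.Empty;

        // The address at the same index describes where the value lives.
        RawAddress address = addresses[index];
        return new ReadOnlySpan<byte>(values, (int)address.Offset, address.Length);
    }
}
```

The alignment between keys and addresses is what makes the lookup a single binary search followed by one slice of the value bytes.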
Immutability of value files
- The .val stream is treated as append-only:
  - Existing bytes are never modified in place.
  - New values are written at the end, preserving previously written offsets.
  - Linking does not rewrite existing values; instead, LinkedAddressNode headers are appended, and the previous node’s NextOffset is patched by writing a new node and updating pointers via .adr alignment.
- Benefits:
  - Stable offsets enable safe caching of addresses and efficient read paths.
  - Appending scales linearly, minimizing fragmentation and avoiding in-place mutations.
  - Historical values remain intact; multi-value chains are expressed via node headers rather than overwriting data.
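The stable-offset property can be shown with a small append-only buffer. The sketch below is an assumption-laden illustration: AppendOnlyValues is a hypothetical class backed by a MemoryStream rather than a .val file, and it leaves out LinkedAddressNode handling altogether.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for an append-only value stream.
public sealed class AppendOnlyValues
{
    private readonly MemoryStream _stream = new MemoryStream();

    // Writes the value at the end of the stream and returns the offset
    // at which it starts. Earlier bytes are never touched.
    public long Append(byte[] value)
    {
        long offset = _stream.Length;
        _stream.Seek(0, SeekOrigin.End);
        _stream.Write(value, 0, value.Length);
        return offset;
    }

    // Reads back a value from an offset/length pair handed out earlier.
    public byte[] Read(long offset, int length)
    {
        var buffer = new byte[length];
        _stream.Seek(offset, SeekOrigin.Begin);
        int read = _stream.Read(buffer, 0, length);
        if (read != length) throw new EndOfStreamException();
        return buffer;
    }
}
```

An offset returned by Append keeps resolving to the same bytes no matter how many values are appended afterwards, which is what makes cached addresses safe to reuse.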
Usage
- Use Resin.KeyValue for fast on-disk structures and efficient read/write key/value sessions.
- Use Resin.TextAnalysis for StringAnalyzer, VectorOperations, and similarity tooling.
- Use Resin.WikipediaCommandLine for command-line tools to build/validate lexicons. See detailed CLI usage and setup in Resin.WikipediaCommandLine/README.md.
Contributing
Contributions are welcome! Please open an issue or pull request with clear motivation, tests when applicable, and concise changes.
License
This project is licensed under the MIT License.