RadixSpline: A Single-Pass Learned Index
A read-only learned index structure that can be built in a single pass over sorted data. Can be used as a drop-in replacement for std::multimap. Currently limited to uint32_t and uint64_t data types.
Build
mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make
./example
./tester
Examples
Using rs::Builder to index sorted data in one pass, without copying the data:
// Create random keys. vector<uint64_t> keys(1e6); generate(keys.begin(), keys.end(), rand); keys.push_back(8128); sort(keys.begin(), keys.end()); // Build RadixSpline. uint64_t min = keys.front(); uint64_t max = keys.back(); rs::Builder<uint64_t> rsb(min, max); for (const auto& key : keys) rsb.AddKey(key); rs::RadixSpline<uint64_t> rs = rsb.Finalize(); // Search using RadixSpline. rs::SearchBound bound = rs.GetSearchBound(8128); cout << "The search key is in the range: [" << bound.begin << ", " << bound.end << ")" << endl; auto start = begin(keys) + bound.begin, last = begin(keys) + bound.end; cout << "The key is at position: " << std::lower_bound(start, last, 8128) - begin(keys) << endl;
Using rs::MultiMap to index unsorted data, which internally creates a sorted copy:
vector<pair<uint64_t, char>> data = {{1ull, 'a'}, {12ull, 'b'}, {7ull, 'c'}, // Unsorted. {42ull, 'd'}}; rs::MultiMap<uint64_t, char> map(begin(data), end(data)); cout << "find(7): '" << map.find(7)->second << "'" << endl; cout << "lower_bound(3): '" << map.lower_bound(3)->second << "'" << endl;
Cite
Please cite our aiDM@SIGMOD 2020 paper if you use this code in your own work:
@inproceedings{radixspline,
author = {Andreas Kipf and
Ryan Marcus and
Alexander van Renen and
Mihail Stoian and
Alfons Kemper and
Tim Kraska and
Thomas Neumann},
title = {{RadixSpline}: a single-pass learned index},
booktitle = {Proceedings of the Third International Workshop on Exploiting Artificial
Intelligence Techniques for Data Management, aiDM@SIGMOD 2020, Portland,
Oregon, USA, June 19, 2020},
pages = {5:1--5:5},
year = {2020},
url = {https://doi.org/10.1145/3401071.3401659},
doi = {10.1145/3401071.3401659},
timestamp = {Mon, 08 Jun 2020 19:13:59 +0200},
biburl = {https://dblp.org/rec/conf/sigmod/KipfMRSKK020.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}