Data Structures are Everwhere
this is how any data driven subfield and industry
store and access data
There is no perfect data structure
~100 new data structures are designed every year due to continuous hardware and workload changes
The Data Calculator enables interactive and semi-automated design of
data structures. It offers a set of
fine-grained design primitives that capture the first principles of
data layout design: how data structure nodes lay data out, and
how they are positioned relative to each other.
It allows of the computation of arbitrary data structure performance using learned cost models.
More details →
Step 1: Design Synthesis from First Principles
Nearly all new designs are about combining a small set of fundamental concepts in different ways or tunings. We are working towards mapping the design space of data structures to eventually find the first principles out of which all structures can be designed
Step 2: Learned Primitive Access Models
The Data Calculator contains a library of learned models that describe fundamental access patterns in blocks of data. They capture algorithmic, engineering, and hardware properties
Step 3: Algorithm and Cost Synthesis
For each operation in a workloads, the Data Calculator synthesizes the exact algorithm and its cost for the target hardware using an expert system. Based on the specification of the layout of each data structure node in the path of an operation it decides the best access pattern and based on the learned models it knows the expected cost
Accurate Cost Synthesis
Accurately calculating response time for complex designs and skewed workloads
How to Use

Iteratively test diff combinations of design/workload/hardware

Automatically identify “the best design possible” to match a workload and hardware
There are more possible data structures than stars in the sky
The periodic table of data structures
Is this going to replace engineers?
Did the arithmetic calculator replace mathematicians?
Publications
2018
S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo, “The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models,” in ACM SIGMOD International Conference on Management of Data.
N. Dayan and S. Idreos, “Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging,” in ACM SIGMOD International Conference on Management of Data.
B. Hentschel, M. S. Kester, and S. Idreos, “Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation,” in ACM SIGMOD International Conference on Management of Data
2017
2016


