Data Calculator | DASLab - Harvard

3 min read Original article ↗

Let us calculate

What if we could reason about the design space of data structures?

More Details

Data Structures are Everwhere

this is how any data driven subfield and industry
store and access data

There is no perfect data structure

~100 new data structures are designed every year due to continuous hardware and workload changes


The Data Calculator enables interactive and semi-automated design of data structures. It offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. It allows of the computation of arbitrary data structure performance using learned cost models.
More details →

Step 1: Design Synthesis from First Principles


Nearly all new designs are about combining a small set of fundamental concepts in different ways or tunings. We are working towards mapping the design space of data structures to eventually find the first principles out of which all structures can be designed


Step 2: Learned Primitive Access Models


The Data Calculator contains a library of learned models that describe fundamental access patterns in blocks of data. They capture algorithmic, engineering, and hardware properties

Step 3: Algorithm and Cost Synthesis


For each operation in a workloads, the Data Calculator synthesizes the exact algorithm and its cost for the target hardware using an expert system. Based on the specification of the layout of each data structure node in the path of an operation it decides the best access pattern and based on the learned models it knows the expected cost


Accurate Cost Synthesis

Accurately calculating response time for complex designs and skewed workloads

How to Use


~20 seconds, 10 million key value pairs, 100 point queries

Iteratively test diff combinations of design/workload/hardware


~30 minutes, >10 million key value pairs, mixed point and range queries

Automatically identify “the best design possible” to match a workload and hardware


Utilize design continuums and cross design spaces

There are more possible data structures than stars in the sky

The periodic table of data structures

Is this going to replace engineers?

Did the arithmetic calculator replace mathematicians?

Publications


2018

S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo, “The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models,” in ACM SIGMOD International Conference on Management of Data.

N. Dayan and S. Idreos, “Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging,” in ACM SIGMOD International Conference on Management of Data.

B. Hentschel, M. S. Kester, and S. Idreos, “Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation,” in ACM SIGMOD International Conference on Management of Data



2017



2016

Back to top