Data Engineering Whitepapers

A curated list of influential whitepapers in the field of data engineering.


# Data Lakehouse

A data lakehouse combines the low-cost, open storage of data lakes with the transactional guarantees and query performance of data warehouses in a single architecture.


# Distributed Systems & Storage

Foundational papers on distributed systems that power modern data infrastructure at scale.


# Data Warehousing & OLAP

Data warehouses and OLAP systems optimized for analytical queries over large datasets.


# Processing Engines

Data processing frameworks for batch processing and streaming computation.


# DuckDB

DuckDB gets its own category as a single-file OLAP database.


# SQL

All about SQL, the domain-specific language to query databases and more.


# Relational Model

Relational databases organize data into tables with rows and columns, pioneered by Edgar F. Codd.


# NoSQL

NoSQL databases trade relational guarantees for horizontal scalability and flexible schemas.


# Schema Evolution

Schema Evolution addresses how databases handle changes to data structures over time.


# Data Architecture & Governance

Patterns for organizing data assets, governance, and discovery across the enterprise.


# Git for Data

Git for Data applies version control concepts to datasets and data pipelines.


# Database Extensibility & Research

Academic and industry research on database design, extensibility, and human-data interaction.


# Other Lists

AI Whitepapers


Origin: Data Engineering Vault
Created 2024-01-05