DuckDB can be 5x faster than Spark at 500M record files

blog.dataexpert.io

4 points by peterdstallion 6 months ago · 1 comment

peterdstallion (OP) 6 months ago

This test was run on a small dev laptop with 16GB of RAM, scanning a 23GB Parquet file of 500M rows (records). DuckDB proved to be 5x faster than Spark.
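To put those numbers in proportion, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above; treating "GB" as decimal gigabytes is my assumption, and the variable names are just for illustration.

```python
# Figures quoted in the comment; decimal gigabytes (10**9 bytes) are an assumption.
ROWS = 500_000_000
FILE_BYTES = 23 * 10**9
RAM_BYTES = 16 * 10**9

# Average on-disk footprint of one row in the Parquet file.
bytes_per_row = FILE_BYTES / ROWS

# How much larger the file is than the laptop's total memory.
file_to_ram = FILE_BYTES / RAM_BYTES

print(f"{bytes_per_row:.0f} bytes per row on disk")
print(f"file is {file_to_ram:.2f}x the size of RAM")
```

The file is larger than the machine's memory, so it cannot simply be loaded whole; whichever engine runs the scan has to work through it in pieces.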

A bit of an obvious one: small-data tech is faster at small data. It serves more as a lower-bound reminder of what counts as "small data" nowadays.

The article rightly starts with:

> Processing power on laptops has increased dramatically over the last twenty years. This allows single laptops to accomplish what we needed multi-node Spark clusters to do ten years ago.
