Iceberg, the Right Idea – The Wrong Spec

database-doctor.com

3 points by DannyPage 5 months ago · 1 comment

DannyPage (OP) 5 months ago

We just finished implementing Iceberg on top of a large set of Parquet files stored in S3. It’s a neat idea to be able to turn a lot of data files into a SQL database, but I absolutely understand the pain and confusion the author writes about, especially around how it handles metadata. It creates a lot of metadata files and makes a large mess of the directory. Some queries that I know resolve to a single Parquet file take up to 30 seconds.

I don’t think we’ll scrap it, and there are certainly ways to speed up the problematic aspects of querying the catalog, but I’m also rooting for DuckLake to make it a lot more approachable by not completely shying away from the database as an idea.
