Database Benchmarks Lie (If You Let Them)
This got me curious about our Exasol environment, which we've been running since 2016 at Piedmont Healthcare. We average 2 million queries per day (DDL/DML/DQL). Our query failure rate is ~0.1%, and only 7% of those failures were due to hitting resource limits; the rest were SQL issues: constraint errors, data type issues, etc. Average connected users is ~400. Average concurrent queries is ~7, with the daily max averaging ~78 concurrent queries. Average query time across DQL statements is around 10 seconds, which is only that high because of some extreme outliers -- I have users who like to put 200k values in a WHERE clause IN statement, and Tableau sometimes likes to write gnarly SQL with LOD calcs and relationship models.
TPC-H benchmarks are what convinced us to purchase Exasol 10 years ago. Still happy with that decision! Congrats to the Exasol team on these results vs ClickHouse.
Very interesting. What are the bottlenecks you've faced with Exasol?
"200k values in a WHERE clause IN statement"? What is that column about?
Average concurrent queries is ~7 over what time period?
As far as bottlenecks go, I haven't really hit any in the 10 years we've been using it. Any pain points are really user-induced. We had some initial system instability during our upgrade from v7 to v8, which was a significant platform architecture change, but those issues have since been resolved. Exasol has pretty good support.
Regarding the 200k values in a WHERE clause: we have some users who do research across published data sources in Tableau. They will copy account IDs from one report and paste them into a filter in another. Our connections from Tableau to Exasol are live, and Tableau doesn't have great guardrails on the SQL that gets issued to the database. The result looks something like the sketch below.
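A hedged illustration of the shape that query takes (table and column names here are hypothetical, not Tableau's actual generated SQL):

    -- Hypothetical example: in the real case the IN list carries ~200k literal
    -- account IDs pasted from another report, not the handful shown here.
    SELECT t.account_id,
           SUM(t.charge_amount) AS total_charges
    FROM   billing.transactions t
    WHERE  t.account_id IN ('A000017', 'A000093', 'A001245',
                            /* ... roughly 200,000 more literals ... */
                            'A199998')
    GROUP  BY t.account_id;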
The concurrent query numbers come from a daily statistics table in Exasol. There are average and max concurrency measures aggregated per day; I averaged the last 30 days. Exasol doesn't really explain its sampling methodology in the documentation: https://docs.exasol.com/db/latest/sql_references/system_tabl...
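For context, a minimal sketch of that kind of 30-day rollup, assuming the statistics table is EXA_STATISTICS.EXA_USAGE_DAILY with QUERIES (average concurrency) and QUERIES_MAX columns -- treat the exact names as an assumption and verify them against the docs page above:

    -- Hedged sketch: assumes EXA_STATISTICS.EXA_USAGE_DAILY holds the per-day
    -- average and max concurrency in QUERIES / QUERIES_MAX.
    SELECT AVG(QUERIES)     AS avg_concurrent_queries,
           AVG(QUERIES_MAX) AS avg_daily_max_concurrency
    FROM   EXA_STATISTICS.EXA_USAGE_DAILY
    WHERE  INTERVAL_START >= ADD_DAYS(CURRENT_DATE, -30);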
When I search for "high performance analytical database" on Bing, the AI-summarized results say ClickHouse, Apache Druid, SingleStore, Couchbase, and Apache Pinot are considered among the best databases for real-time analytics due to their low query latency and high performance.
On Google, the AI-summarized results are ClickHouse, StarRocks, Snowflake, and Google BigQuery.
ClickHouse shows up in both, and Exasol is not mentioned. If these performance claims hold, why isn't it in the limelight?
Clickhouse is known to ingest and analyze massive volumes of time-series data in real-time. How good is Exasol for this use case?
If you have a single table with time-series data, then ClickHouse will typically perform better; it's very much optimized for that type of use case. Once you are joining tables and running more advanced analytics, Exasol will easily outperform it. A rough illustration of the two query shapes is below.
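To make that concrete, here is a hedged sketch in generic SQL with hypothetical tables (exact date functions vary by engine): the first query is the single-table time-series pattern ClickHouse is built for, the second is the multi-join analytical pattern described above.

    -- Single-table time-series rollup: scan one wide table, filter on time, aggregate.
    SELECT DATE_TRUNC('hour', event_time) AS event_hour,
           COUNT(*)                       AS events,
           AVG(response_ms)               AS avg_response_ms
    FROM   events
    WHERE  event_time >= TIMESTAMP '2024-06-01 00:00:00'
    GROUP  BY DATE_TRUNC('hour', event_time);

    -- Multi-table analytical query: several joins plus aggregation, where join
    -- planning and data distribution matter far more than raw scan speed.
    SELECT d.department,
           p.product_group,
           SUM(f.amount) AS revenue
    FROM   fact_orders    f
    JOIN   dim_customer   c ON c.customer_id   = f.customer_id
    JOIN   dim_department d ON d.department_id = c.department_id
    JOIN   dim_product    p ON p.product_id    = f.product_id
    WHERE  f.order_date BETWEEN DATE '2024-01-01' AND DATE '2024-06-30'
    GROUP  BY d.department, p.product_group;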
Exasol has been the performance leader in the market for more than 15 years, as you can see in the official TPC-H publications, but it hasn't yet gotten broader market attention. We are trying to change that now and have recently been more active in developer communities. We also just launched a completely free Exasol Personal edition that can be used for production use cases.
This is the article I wish existed when we were evaluating platforms. "Reliability under realistic conditions is the first scalability constraint". Speed means nothing if queries don't finish.
Traditional database benchmarks focus on throughput and latency – how many queries per second can be processed, how execution time changes as hardware resources increase. This benchmark revealed something different: reliability under realistic conditions is the first scalability constraint.
From my experience, planning is often the first headache I have to deal with (join order, hash sizing, operator choice), before concurrency and memory even come into play.
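As a rough illustration of what I mean (hypothetical tables; many engines, e.g. PostgreSQL or ClickHouse, expose these decisions through EXPLAIN):

    -- Hedged sketch with made-up tables; the plan output is where join order,
    -- hash vs. merge vs. nested-loop choices, and row estimates become visible.
    EXPLAIN
    SELECT s.region,
           SUM(f.amount) AS revenue
    FROM   fact_sales f
    JOIN   dim_store  s ON s.store_id = f.store_id
    WHERE  f.sale_date >= DATE '2024-01-01'
    GROUP  BY s.region;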
You mean the "execution plan" for your queries? Ideally, those kinds of decisions are made automatically by the database.
ideally? yes. in practice? big nope.
How do you actually interpret what you're seeing here? Does it look more like optimizer fragility (plans that assume ideal memory conditions) or more like runtime memory-management limits (good plans, but no adaptive behavior under pressure)?
I think the issue in the tests was the lack of proper resource management in ClickHouse, which led to queries failing under pressure. Although I have to admit the level of pressure was minimal: just a few concurrent users shouldn't be considered pressure, and having far more RAM than the whole database size means very little pressure either. The schema model is also quite simple, just two fact tables and a few dimension tables.
Any database should be able to handle 100 concurrent queries robustly, even if that means slowing down query execution.