Settings

Theme

Ask HN: Good SQL statements to benchmark RDBMS?

1 points by herodoturtle 25 days ago · 7 comments · 1 min read


Hi folks,

We’re building a benchmark tool in C to compare the speed of DML statements in assorted RDBMS vanilla installations (Postgres, MySQL, etc.).

We’re not doing anything fancy / scientific - it’s purely for internal learning purposes.

Are there any particular DML statements that we should include in our benchmark?

We’d welcome any tips or insights on this topic.

(As an aside, we plan to use clock_t for tracking CPU time, with long-running loops that span several hundred seconds). But of course we welcome advice on this too!

Thanks ^_^

swashq 25 days ago

A few patterns that stress different parts of the engine:

Window functions (tests optimizer): SELECT , ROW_NUMBER() OVER (PARTITION BY category ORDER BY date DESC) FROM large_table

Large aggregations (tests hash tables): SELECT category, COUNT(), AVG(value), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) FROM large_table GROUP BY category

Self-joins (tests join algorithms): SELECT a., b. FROM table a JOIN table b ON a.foreign_key = b.id WHERE a.date BETWEEN '2024-01-01' AND '2024-12-31';

String operations (often overlooked): SELECT * FROM table WHERE column LIKE '%pattern%';

Also test with varying data sizes (1K, 100K, 10M rows). Some engines optimize differently at scale.

goh-chunlin 25 days ago

clock_t measures CPU time (the time the processor spent on your specific process). In database benchmarking, the CPU is often idling while waiting for Disk I/O or Network latency.

If clock_t is used, normally results will show the database is "blazing fast" because it does not count the time spent waiting for the hard drive to actually save the data.

You may need to look into CLOCK_MONOTONIC. clock_gettime(CLOCK_MONOTONIC, ...) will measure "Wall Clock" time.

  • herodoturtleOP 25 days ago

    Neat, thanks for the insight ^_^

    Any particular DML tests you’d recommend for the benchmark? Or thoughts on this in general.

    • goh-chunlin 22 days ago

      Glad that helped!

      Perhaps you can start with comparing 1,000 individual INSERTs (each with its own COMMIT) vs. 1,000 INSERTs wrapped in a single BEGIN/COMMIT block. This test is particularly interesting for comparing different RDBMS because it reveals how each engine balances data durability (ACID) against performance. You will likely find that some DB have a much higher tax on commits than others, which is a key factor in choosing a DB for a specific use case.

      Since you are in C, please make sure to use prepared statements. If you send raw SQL strings, you are partially benchmarking the string parser of the DB rather than the actual data manipulation. Also, remember to "warm up" the DB with a few hundred runs before you start the timer, otherwise, the results will be skewed by the initial disk-to-memory cache loading.

SRMohitkr 25 days ago

you are really a good task for now i am going to thinking,after will comment.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection