OLAP Workload Testing & Benchmarking Suite
A comprehensive benchmark framework for evaluating analytical database performance using real-world datasets and standardized testing methodologies.
Table of Contents
- Overview
- Why OLAP Benchmarking Matters
- Dataset Description
- Supported Database Engines
- Prerequisites & System Requirements
- Quick Start Guide
- Reproducibility
- Benchmark Methodology
- Performance Results
- Key Observations
- Limitations & Considerations
- Contributing
- FAQ
- License
Overview
This benchmark suite provides standardized testing for Online Analytical Processing (OLAP) workloads across multiple database engines. The framework enables fair, reproducible performance comparisons using real-world analytical queries and datasets, helping organizations make informed decisions about analytical database technologies.
Scope and Purpose
This benchmark specifically focuses on open-source analytical databases that can be deployed on-premises and containerized with Docker. While we acknowledge the existence of excellent cloud-native solutions like Snowflake, Amazon Redshift, Google BigQuery, and Azure Synapse Analytics, these managed services are outside the scope of this evaluation.
Why Open-Source, On-Premises, and Dockerizable?
- Data Sovereignty: Organizations requiring complete control over their data location and governance
- Cost Predictability: Fixed infrastructure costs vs. variable cloud consumption pricing
- Compliance Requirements: Industries with strict data residency and security regulations
- Technology Independence: Avoiding vendor lock-in with portable, standards-based solutions
- Customization Flexibility: Ability to modify and tune systems for specific use cases
- Container Portability: Docker deployment enables consistent environments across development, testing, and production
- Infrastructure Agnostic: Run on any Docker-compatible platform, from laptops to enterprise clusters
This benchmark evaluates databases that can be deployed in your own infrastructure using Docker containers, whether on bare metal, virtual machines, Kubernetes clusters, or private cloud environments, providing organizations with truly portable and self-managed analytical solutions.
Quick Results Summary
Top Performers:
- 🥇 ClickHouse: 6.41s total, 100% success rate - Best overall performance
- 🥈 StarRocks: 6.76s total, 100% success rate - Excellent consistency
- 🥉 Apache Doris: 7.66s total, 100% success rate - Balanced performance across all queries
Key Findings:
- ClickHouse and StarRocks dominate performance benchmarks with sub-7 second total execution times
- TiDB/TiFlash: 2m 39s load time, 100% query success rate, significantly improved analytical performance with TiFlash replicas on all tables
- Apache Doris delivers balanced performance with excellent compatibility across all query types
- MariaDB ColumnStore shows compatibility issues with complex queries, though it has the fastest data load time
- Real-world flight data (38M+ records) provides realistic testing scenarios
OLAP vs OLTP
OLAP systems are designed to handle complex analytical queries that typically involve:
- Large-scale aggregations across millions of records
- Complex multi-table joins and dimensional analysis
- Time-series analysis and trend identification
- Ad-hoc analytical queries with varying selectivity
Unlike OLTP (Online Transaction Processing) systems optimized for high-volume transactional workloads, OLAP databases excel at read-heavy analytical operations, making them essential for business intelligence, data analytics, and decision support systems.
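To make the contrast concrete, here is a minimal SQL sketch against the flight schema used in this benchmark; the `flight_id` key column on the OLTP side is hypothetical and included only for illustration:

```sql
-- OLTP-style access: fetch one row by key, ideally via an index
-- (flight_id is a hypothetical key column, for illustration only)
SELECT * FROM flights WHERE flight_id = 123456;

-- OLAP-style access: scan and aggregate millions of rows in one pass
SELECT carrier, AVG(arr_delay) AS avg_arrival_delay
FROM flights
GROUP BY carrier
ORDER BY avg_arrival_delay DESC;
```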
Why OLAP Benchmarking Matters
Performance Evaluation
Modern organizations generate massive volumes of data requiring fast analytical processing. Benchmarking helps identify which database engines can handle your specific analytical workloads with acceptable performance characteristics.
Technology Selection
With numerous analytical database options available, from traditional columnar stores to modern cloud-native solutions, standardized benchmarking provides objective performance metrics to guide technology decisions.
Cost Optimization
Understanding performance characteristics across different systems enables better resource planning and cost optimization, especially in cloud environments where compute and storage costs directly impact operational expenses.
Dataset Description
The benchmark utilizes the Bureau of Transportation Statistics (BTS) On-Time Performance dataset, representing typical analytical scenarios found in operational reporting and business intelligence applications.
Schema Design (Star Schema)
| Table | Type | Records | Description |
|---|---|---|---|
| `airlines` | Dimension | 30 | Airline carrier information |
| `airports` | Dimension | 400 | Airport details and geographic data |
| `flights` | Fact | 38,083,735 | Flight departure/arrival data |
This star schema design reflects typical data warehouse patterns, enabling realistic testing of dimension-fact table joins, aggregations, and filtering operations.
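For orientation, here is a generic SQL sketch of that schema. The authoritative DDL lives in the repository's load scripts; several column names below (origin, dest, airport, city, state, dep_delay) are inferred from the example queries rather than copied from the actual schema:

```sql
-- Dimension tables: small, descriptive lookups
CREATE TABLE airlines (
    iata_code VARCHAR(3),     -- join key referenced by flights.carrier
    airline   VARCHAR(100)
);

CREATE TABLE airports (
    iata_code VARCHAR(3),
    airport   VARCHAR(100),   -- descriptive columns (assumed)
    city      VARCHAR(100),
    state     VARCHAR(2)
);

-- Fact table: 38M+ flight events
CREATE TABLE flights (
    year      SMALLINT,
    month     TINYINT,
    carrier   VARCHAR(3),     -- references airlines.iata_code
    origin    VARCHAR(3),     -- references airports.iata_code (assumed)
    dest      VARCHAR(3),
    dep_delay INT,
    arr_delay INT,
    cancelled TINYINT
);
```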
Supported Database Engines
Tested Versions:
- ClickHouse v23.8+
- Apache Doris v3.0+
- StarRocks v3.0+
- TiDB v7.0+ (with TiFlash)
- MariaDB ColumnStore v23.02.4+
Purpose-Built OLAP Systems
- Apache Doris: Open-source MPP analytical database with vectorized execution engine, designed for real-time analytics and high-performance OLAP workloads
- ClickHouse: Open-source MPP columnar analytical database with vectorized execution, optimized for real-time analytics, time-series data, and high-performance OLAP workloads
- StarRocks: Open-source MPP analytical database with vectorized execution engine, designed for real-time analytics and high-performance OLAP workloads
Hybrid and Specialized Systems
- MariaDB ColumnStore: Columnar storage engine for analytical workloads within the MariaDB ecosystem
- PingCap TiDB: While primarily designed as an HTAP (Hybrid Transactional/Analytical Processing) system for OLTP workloads, TiDB includes TiFlash, a columnar storage extension that enables analytical query processing. TiFlash provides columnar replicas of TiDB data, allowing the system to handle both transactional and analytical workloads within a unified architecture.
Note: TiDB is not primarily an OLAP product but offers analytical capabilities through TiFlash.
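For reference, enabling a TiFlash replica is a one-statement DDL in TiDB (the benchmark's load step does the equivalent for all three tables); the `benchmark` schema name below is a placeholder:

```sql
-- Create one columnar TiFlash replica of the fact table
ALTER TABLE flights SET TIFLASH REPLICA 1;

-- Track replication progress; AVAILABLE = 1 means the replica is queryable
SELECT TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'benchmark';  -- placeholder schema name
```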
Prerequisites & System Requirements
Software Dependencies
Docker & Docker Compose
- Installation: https://docs.docker.com/get-docker/
Python 3.12+ with required packages:
- See Quick Start Guide below
TiUP (for TiDB cluster management):
```bash
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
```
ClickHouse Client:
- macOS: `brew install clickhouse`
- Linux: `curl https://clickhouse.com/ | sh`
- Windows: Download from https://clickhouse.com/docs/en/getting-started/install
Hardware Recommendations
| Component | Minimum | Recommended | Benchmark System |
|---|---|---|---|
| Memory | 16GB RAM | 32GB+ RAM | 16GB (M1 Pro) |
| Storage | 200GB free space | SSD storage | 926GB SSD |
| CPU | Multi-core | High-performance multi-core | M1 Pro (10-core) |
| Network | Standard | High bandwidth for data loading | Localhost |
Note: The benchmark results shown were obtained on a MacBook Pro M1 Pro with 16GB RAM, demonstrating that excellent performance can be achieved even on laptop-class hardware.
Quick Start Guide
1. Environment Setup
```bash
git clone <repository-url>
cd analytics_benchmark

# Install Python dependencies
pip install -r requirements.txt
```
2. Data Preparation
3. Database-Specific Execution
Apache Doris
```bash
# Start Apache Doris
docker compose -f docker/doris.yml up -d

# Load data (note: Doris requires additional time for backend nodes to initialize)
python3 load/load_data.py --database doris

# Run benchmark
python3 run_benchmarks.py --database doris
```
ClickHouse
```bash
# Start ClickHouse
docker compose -f docker/clickhouse.yml up -d

# Load data
python3 load/load_data.py --database clickhouse

# Run benchmark
python3 run_benchmarks.py --database clickhouse
```
MariaDB ColumnStore
```bash
# Start MariaDB ColumnStore
docker compose -f docker/columnstore.yml up -d

# Load data
python3 load/load_data.py --database columnstore

# Run benchmark
python3 run_benchmarks.py --database columnstore
```
TiDB with TiFlash
```bash
# Update TiUP to latest version
tiup update --all

# Start TiDB using TiUP (note: TiDB uses TiUP instead of Docker for optimal performance)
tiup playground

# Load data
python3 load/load_data.py --database tidb

# Run benchmark
python3 run_benchmarks.py --database tidb
```
Note: TiDB uses TiUP for cluster management rather than Docker Compose. TiUP provides better resource management and TiFlash integration for analytical workloads.
StarRocks
```bash
# Start StarRocks
docker compose -f docker/starrocks.yml up -d

# Load data
python3 load/load_data.py --database starrocks

# Run benchmark
python3 run_benchmarks.py --database starrocks
```
Important Notes on Database Startup
Apache Doris & StarRocks: These systems require additional time for backend nodes to initialize. The data loading script automatically waits for all components to be ready (up to 2 minutes per system). If you encounter "no available capacity" errors, the backend nodes may still be provisioning; wait longer or restart the containers.
TiDB: TiFlash columnar replicas need time to synchronize after initial data loading for optimal analytical performance.
ClickHouse: Generally starts quickly but may need a few seconds for metadata initialization.
MariaDB ColumnStore: Requires proper columnstore engine initialization which can take 30-60 seconds.
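If you prefer to verify readiness by hand instead of relying on the load script's wait logic, a couple of simple checks (both Doris and StarRocks accept these over their MySQL-protocol FE port):

```sql
-- Doris / StarRocks: list backend (BE) nodes; each should report Alive = true
SHOW BACKENDS;

-- ClickHouse: any trivial query confirms the server is accepting connections
SELECT 1;
```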
Reproducibility
To reproduce the exact results shown in this benchmark:
- Environment: Use the same hardware specifications listed in Testing Environment
- Software Versions: Ensure you're using the database versions specified in Supported Database Engines
- Data Consistency: The benchmark uses the same BTS dataset with 38,083,735 flight records
- Configuration: Use the provided Docker configurations without modification
- Methodology: Follow the cold cache approach - restart containers between test runs
Note: Results may vary slightly due to system load and hardware differences, but relative performance rankings should remain consistent.
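Before benchmarking, a quick sanity check against any of the engines confirms the data loaded completely:

```sql
-- Row counts should match the published dataset exactly
SELECT COUNT(*) FROM flights;   -- expected: 38,083,735
SELECT COUNT(*) FROM airlines;  -- expected: 30
SELECT COUNT(*) FROM airports;  -- expected: 400
```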
Benchmark Methodology
Query Suite Design
The benchmark includes a comprehensive set of 20 analytical queries representing common OLAP patterns:
- Simple aggregations and filtering operations
- Multi-table joins with dimensional analysis
- Time-based analysis and seasonal trending
- Complex analytical functions and window operations
- Advanced star schema pattern demonstrations
- Competitive analysis and market intelligence queries
- Aircraft utilization and operational efficiency analysis
- Weather impact and disruption pattern analysis
- Cross-airline performance benchmarking
- Varying data selectivity and aggregation complexity levels
Example Query Patterns:
The actual benchmark queries are significantly more complex than these basic examples. Here are simplified representations of the analytical patterns used:
```sql
-- Pattern from Query 1: Multi-CTE airline market analysis
-- (Actual query calculates market share percentages with cross joins)
WITH flight_stats AS (
    SELECT a.airline, COUNT(*) AS volume, SUM(f.cancelled) AS cancelled
    FROM flights f
    JOIN airlines a ON f.carrier = a.iata_code
    WHERE f.year = 2020
    GROUP BY a.airline
)
SELECT fs.airline, fs.volume,
       ROUND(100 * fs.volume / total.volume, 2) AS market_share_pct
FROM flight_stats fs
CROSS JOIN (SELECT SUM(volume) AS volume FROM flight_stats) total;

-- Pattern from Query 12: Advanced window functions with performance categorization
-- (Actual query includes month name conversion and performance categories)
WITH monthly_perf AS (
    SELECT a.airline, f.month, AVG(f.arr_delay) AS avg_delay
    FROM flights f
    JOIN airlines a ON f.carrier = a.iata_code
    GROUP BY a.airline, f.month
    HAVING COUNT(*) >= 100
)
SELECT airline, month, avg_delay,
       RANK() OVER (PARTITION BY month ORDER BY avg_delay ASC) AS rank,
       CASE WHEN RANK() OVER (...) <= 3 THEN 'Top Performer' ELSE 'Average' END AS category
FROM monthly_perf;
```
Note: These are simplified representations. The actual queries in `queries/sql/` are more sophisticated, featuring complex CTEs, advanced aggregations, comprehensive business logic, and performance optimizations. View the individual `.sql` files for complete query implementations.
Performance Metrics
- Query Execution Time: Individual query performance measurement
- Data Loading Performance: Bulk data ingestion capabilities
- Error Handling: System stability and query completion rates
- Resource Utilization: Memory and CPU consumption patterns
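How these metrics are surfaced varies by engine. As one example, ClickHouse records per-query timing, memory, and I/O statistics in its built-in `system.query_log` table, which can be inspected after a run:

```sql
-- ClickHouse: recent finished queries with duration, memory, and rows read
SELECT event_time, query_duration_ms, memory_usage, read_rows
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10;
```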
Testing Environment
- Hardware: MacBook Pro (M1 Pro, 2021)
- CPU: Apple M1 Pro (10-core: 8 performance + 2 efficiency cores)
- Memory: 16GB unified memory
- Storage: 926GB SSD
- OS: macOS 15.5 (24F74)
- Containerization: Docker Desktop for Mac
- Container Resources:
- Memory limit: 8GB per database container
- CPU limit: 6 cores per database container
- Shared volumes for data persistence
- Methodology: Cold cache scenarios for realistic performance
- Workload: Single-user workload simulation
- Resource Allocation: Standardized hardware allocation across all databases
- Network: Localhost connections to eliminate network latency
Performance Results
Last Updated: November 2025 | Dataset: 38,083,735 flight records | Test Environment: MacBook Pro M1 Pro
Data Load Performance
The following table shows data ingestion performance for the complete dataset (38,083,735 flight records):
Load Speed Legend:
- 🟢 Fast (< 1.5 minutes)
- 🟡 Moderate (1.5–5 minutes)
- 🟠 Slow (5–10 minutes)
- 🔴 Very Slow (> 10 minutes)
| Database | Import Time | Rows/s | Status | Notes |
|---|---|---|---|---|
| ClickHouse | 🟢 57.00 sec | 694,723 | ✅ Success | Native CSV, columnar |
| Doris | 🟢 85.00 sec | 447,940 | ✅ Success | Stream Load, BE warmup |
| StarRocks | 🟡 100.00 sec | 379,777 | ✅ Success | Vectorized ingest |
| TiDB/TiFlash | 🟡 159.31 sec | 239,080 | ✅ Success | Lightning, TiFlash replicas on all tables |
| ColumnStore | 🟢 39.00 sec | 1,109,264 | ✅ Success | cpimport, conversion |
Query Execution Performance
The following table shows execution times for each query in the benchmark suite:
Performance Legend:
- 🟢 Excellent (< 0.5 seconds)
- 🟡 Good (0.5–2.0 seconds)
- 🟠 Moderate (2.0–10.0 seconds)
- 🔴 Slow (> 10.0 seconds)
- ❌ Error
| Query | ClickHouse | ColumnStore | Doris | StarRocks | TiDB |
|---|---|---|---|---|---|
| 1 | 🟢 0.13 sec | ❌ Error | 🟢 0.19 sec | 🟢 0.13 sec | 🟢 0.13 sec |
| 2 | 🟡 0.54 sec | 🟠 6.35 sec | 🟡 0.53 sec | 🟡 0.99 sec | 🟢 0.43 sec |
| 3 | 🟢 0.05 sec | ❌ Error | 🟢 0.05 sec | 🟢 0.08 sec | 🟢 0.08 sec |
| 4 | 🟢 0.01 sec | ❌ Error | 🟢 0.02 sec | 🟢 0.05 sec | 🟢 0.05 sec |
| 5 | 🟢 0.10 sec | 🟡 0.99 sec | 🟢 0.06 sec | 🟢 0.08 sec | 🟢 0.18 sec |
| 6 | 🟡 0.66 sec | 🔴 15.03 sec | 🟡 0.88 sec | 🟡 0.57 sec | 🟡 0.75 sec |
| 7 | 🟡 0.63 sec | 🔴 14.40 sec | 🟡 0.76 sec | 🟢 0.46 sec | 🟡 0.71 sec |
| 8 | 🟢 0.06 sec | 🟡 1.96 sec | 🟢 0.11 sec | 🟢 0.08 sec | 🟢 0.12 sec |
| 9 | 🟡 0.68 sec | 🔴 14.27 sec | 🟡 0.77 sec | 🟢 0.42 sec | 🟡 0.69 sec |
| 10 | 🟢 0.09 sec | 🟠 2.57 sec | 🟢 0.14 sec | 🟢 0.11 sec | 🟢 0.15 sec |
| 11 | 🟢 0.22 sec | 🟠 4.04 sec | 🟢 0.28 sec | 🟢 0.19 sec | 🟢 0.31 sec |
| 12 | 🟡 0.81 sec | 🔴 20.81 sec | 🟡 1.07 sec | 🟡 0.62 sec | 🟡 0.96 sec |
| 13 | 🟢 0.25 sec | 🟠 4.74 sec | 🟢 0.31 sec | 🟢 0.19 sec | 🟡 1.18 sec |
| 14 | 🟢 0.49 sec | 🟠 9.58 sec | 🟢 0.44 sec | 🟢 0.36 sec | 🔴 44.99 sec |
| 15 | 🟢 0.40 sec | 🟠 9.05 sec | 🟢 0.48 sec | 🟢 0.27 sec | 🟠 3.43 sec |
| 16 | 🟢 0.24 sec | 🟠 6.32 sec | 🟢 0.33 sec | 🟡 0.90 sec | 🟢 0.34 sec |
| 17 | 🟢 0.16 sec | ❌ Error | 🟢 0.16 sec | 🟢 0.14 sec | 🟢 0.18 sec |
| 18 | 🟢 0.33 sec | 🟠 5.74 sec | 🟢 0.40 sec | 🟡 0.66 sec | 🟢 0.42 sec |
| 19 | 🟢 0.31 sec | ❌ Error | 🟢 0.38 sec | 🟢 0.27 sec | 🟡 1.63 sec |
| 20 | 🟢 0.25 sec | 🟠 5.36 sec | 🟢 0.30 sec | 🟢 0.19 sec | 🟢 0.43 sec |
Key Observations
Performance Leaders
- ClickHouse: Exceptional performance across all query types with 100% success rate and consistently fast execution times. Particularly strong for simple aggregations and filtering operations, with excellent scalability to complex analytical queries.
- StarRocks: Outstanding performance with 100% query success rate and highly competitive execution times. Demonstrates excellent consistency across different query patterns with total execution time of 6.76 seconds.
- Apache Doris: Balanced performance characteristics suitable for general analytical workloads with full query compatibility and competitive 7.66 second total execution time.
Data Loading Performance: MariaDB ColumnStore leads in bulk ingestion speed (39 seconds), followed by ClickHouse (57 seconds) and Doris (85 seconds). TiDB/TiFlash requires more time (159 seconds) due to Lightning bulk import and TiFlash replica creation across all tables.
System-Specific Insights
ClickHouse Analysis: Demonstrates outstanding performance with perfect 100% query success rate (20/20 queries) and fastest overall execution time (6.41 sec). Shows excellent optimization for both simple and complex analytical patterns, making it ideal for high-performance analytical workloads. Also among the fastest for bulk data ingestion (57.00 sec).
StarRocks Analysis: Delivers exceptional performance with perfect 100% query success rate (20/20 queries) and highly competitive total execution time (6.76 sec). Shows excellent vectorized execution capabilities and strong optimization for both simple and complex analytical patterns, making it a strong contender for high-performance analytical workloads. Data load speed is also competitive (100.00 sec).
TiDB/TiFlash Analysis: Achieves 100% query success rate (20/20 queries) with a total execution time of 57.16 sec. With TiFlash replicas enabled across all tables, TiDB's analytical performance is substantially improved and much more competitive with purpose-built OLAP systems: most queries execute in sub-second timeframes, and only the most complex analytical patterns (queries 14 and 15) take noticeably longer. The system's unique advantage is unified HTAP capability, enabling transactional and analytical workloads on the same dataset without ETL processes. Data load speed (159.31 sec) reflects the overhead of TiDB Lightning bulk import and TiFlash replica creation across all three tables. TiDB/TiFlash is a strong choice for organizations prioritizing hybrid transactional/analytical workloads.
MariaDB ColumnStore: Shows moderate compatibility challenges with complex analytical queries, achieving 75% success rate (15/20 queries). Failed queries primarily involve advanced CTEs and complex window functions, indicating some limitations with modern SQL analytical patterns. When successful, performance is generally slower than purpose-built OLAP systems, with total execution time of 121.21 sec for successful queries. Despite the fastest data load time (39.00 sec), query compatibility and execution speed lag behind OLAP-focused systems.
Apache Doris: Demonstrates excellent compatibility with 100% query success rate and balanced performance characteristics across all query complexity levels, making it suitable for comprehensive analytical workloads. Data load speed is competitive (85.00 sec).
Performance Summary
| Database | Query Time* | Success | Failed | Load Time | Best For |
|---|---|---|---|---|---|
| ClickHouse | 6.41 sec | 100% | None | 57.00 sec | High-perf analytics |
| StarRocks | 6.76 sec | 100% | None | 100.00 sec | Consistent performance |
| Doris | 7.66 sec | 100% | None | 85.00 sec | Balanced workloads |
| TiDB/TiFlash | 57.16 sec | 100% | None | 159.31 sec | HTAP scenarios |
| ColumnStore | 121.21 sec | 75% | 1,3,4,17,19 | 39.00 sec | Legacy integration |
*Query Time: total for successful queries only
Limitations & Considerations
- Results reflect single-user scenarios and may not represent concurrent workload performance
- Container-based deployment may not reflect optimized production configurations
- Dataset size and query patterns may not represent all analytical workload characteristics
- Performance results are specific to the tested hardware and software configurations
Contributing
We welcome contributions to improve this benchmark suite! Here's how you can help:
Ways to Contribute
- Add New Databases: Implement connectors for additional OLAP systems
- Expand Query Suite: Add more analytical query patterns
- Improve Documentation: Enhance setup guides and analysis
- Report Issues: Submit bug reports or performance inconsistencies
- Add Test Cases: Contribute additional datasets or scenarios
Getting Started
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-database`)
- Make your changes and test thoroughly
- Submit a pull request with detailed description
Development Guidelines
- Follow existing code patterns and naming conventions
- Add comprehensive documentation for new features
- Include performance benchmarks for new database integrations
- Ensure Docker configurations are properly tested
Contact: For questions or collaboration opportunities, please open an issue or reach out via GitHub discussions.
FAQ
Common Questions
Q: Why do some queries fail on certain databases? A: Different databases have varying SQL compatibility. ColumnStore, for example, has limitations with advanced CTEs and window functions. All failed queries are clearly marked with ❌ in the results.
Q: Can I run this benchmark on my own hardware? A: Yes! The benchmark is designed to be portable. Results will vary based on your hardware configuration, but relative performance comparisons should remain consistent.
Q: How often are these benchmarks updated? A: We aim to update benchmarks quarterly or when major database versions are released. Check the commit history for the latest updates.
Q: How does TiDB compare to purpose-built OLAP systems? A: TiDB is primarily an HTAP system designed for transactional workloads. While its analytical performance via TiFlash is slower than purpose-built OLAP systems like ClickHouse and StarRocks, it offers the unique advantage of unified transactional and analytical processing on the same dataset without ETL. With TiFlash replicas enabled on all tables, TiDB now delivers competitive analytical performance for organizations prioritizing hybrid workloads.
Q: Can I add my own queries to the benchmark?
A: Absolutely! Add your queries to the queries/sql/ directory and they'll be automatically included in the benchmark run.
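For example, a hypothetical `queries/sql/21_busiest_routes.sql` might contain (the `origin`/`dest` column names are assumed; match them to the loaded schema):

```sql
-- Hypothetical added query: busiest routes by flight volume
SELECT origin, dest, COUNT(*) AS flight_count
FROM flights
GROUP BY origin, dest
ORDER BY flight_count DESC
LIMIT 10;
```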
Troubleshooting
Issue: "No available capacity" errors with Doris/StarRocks
Solution: Backend nodes need time to initialize. Wait 2-3 minutes or restart containers.
Issue: ClickHouse connection failures
Solution: Ensure the client is installed and the container is fully started before running benchmarks.
Issue: TiDB data loading is slow
Solution: TiFlash needs time to create columnar replicas. This is normal and improves query performance.
Issue: "Couldn't find a tree builder with the features you requested: xml" error during data download
Solution: This indicates lxml is missing. Run pip install -r requirements.txt again, which now includes lxml for XML parsing support.
Related Projects
For additional context and comparison, consider these established analytical benchmarks:
- TPC-H: Industry-standard decision support benchmark
- TPC-DS: Decision support benchmark with complex queries
- ClickBench: ClickHouse-focused analytical benchmark
- Apache Arrow DataFusion Benchmarks: Modern analytical engine benchmarks
This benchmark complements existing efforts by focusing specifically on Docker-deployable, on-premises solutions with real-world transportation data, providing a practical evaluation framework for organizations considering self-hosted analytical databases.
License
This project is licensed under the MIT License - see the LICENSE file for details.
This benchmark provides a foundation for analytical database evaluation. Production deployment decisions should consider additional factors including operational requirements, integration capabilities, support ecosystems, and total cost of ownership beyond raw performance metrics.