This PR introduces MVP-level support for streaming writes to, and batch reads from, Apache Iceberg tables, implemented entirely in C++ (no JNI). While existing C++ projects such as ClickHouse and DuckDB focus on read-only Iceberg integration, this implementation adds native write capabilities, enabling end-to-end data pipelines directly from SQL.
Key Highlights
- Streaming writes: Continuously write data to Iceberg tables via materialized views or direct INSERT statements.
- Zero Java dependencies: native C++ integration that uses Apache Arrow for file I/O and the AWS SDK for S3/Glue.
- SQL-first workflows: Manage Iceberg catalogs, tables, and writes using familiar SQL syntax.
What’s Working (MVP) ✅
Documentation: https://docs.timeplus.com/iceberg
Catalog & Setup
- Support for the Iceberg REST Catalog (verified with AWS Glue and Amazon S3 Tables).
- Create new Iceberg tables via SQL.
- Support for AWS SigV4 authentication for the catalog (Glue) and storage (S3).
Write Operations
- Append data via INSERT INTO or streaming materialized views.
- Write to Amazon S3 using credentials from environment variables or an IAM role.
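Here is a minimal sketch of a direct append. It assumes the `demo` database and the `transformed` table from the usage example below already exist. The row values are invented for illustration, and it is an assumption that the datetime string literal is accepted as-is for the `datetime64` column.

```sql
-- Hypothetical row values; columns follow the transformed schema from the usage example
INSERT INTO demo.transformed (timestamp, org_id, float_value, array_length, max_num, min_num)
VALUES ('2025-01-01 00:00:00', 'org_1', 1.5, 3, 9, 1);
```

Each such INSERT adds rows to the table rather than replacing them, because INSERT OVERWRITE is still on the to-do list.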
Read Operations
- Batch read entire Iceberg tables (v1/v2 formats).
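A batch read can be a plain aggregation over the `transformed` table from the usage example below. This is only a sketch. Streaming incremental reads are not supported yet, so the query is expected to scan the whole table and return once, rather than keep running the way a streaming query would. The aliases are deliberately different from the column names to avoid alias/column clashes in ClickHouse-style SQL.

```sql
-- Batch read: scan the entire Iceberg table and aggregate per organization
SELECT
  org_id,
  count() AS events,
  avg(float_value) AS avg_value,
  max(max_num) AS peak_num,
  min(min_num) AS lowest_num
FROM demo.transformed
GROUP BY org_id
ORDER BY org_id;
```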
Usage Example
-- Connect to an Iceberg database managed by AWS Glue, using the AK/SK or IAM credentials from the host
CREATE DATABASE demo
SETTINGS type='iceberg',
  warehouse='(aws-12-id)',
  catalog_type='rest',
  catalog_uri='https://glue.us-west-2.amazonaws.com/iceberg',
  storage_endpoint='https://bucket.s3.us-west-2.amazonaws.com',
  rest_catalog_sigv4_enabled=true,
  rest_catalog_signing_region='us-west-2',
  rest_catalog_signing_name='glue';

-- Switch to the Iceberg database namespace
USE demo;

-- List existing Iceberg tables
SHOW STREAMS;

-- Append rows to an existing Iceberg table
INSERT INTO demo.existing_table VALUES(..);

-- Or create a new Iceberg table and use an MV to write data
CREATE STREAM transformed(
  timestamp datetime64,
  org_id string,
  float_value float,
  array_length int,
  max_num int,
  min_num int
);

-- Stream data to Iceberg
CREATE MATERIALIZED VIEW mydb.mv_write_iceberg INTO demo.transformed AS
SELECT
  now() AS timestamp,
  org_id,
  float_value,
  length(`array_of_records.a_num`) AS array_length,
  array_max(`array_of_records.a_num`) AS max_num,
  array_min(`array_of_records.a_num`) AS min_num
FROM mydb.msk_stream
SETTINGS s3_min_upload_file_size=1024;
What’s Next (Help Wanted!) 🔧
Write Improvements
- DELETE and UPSERT operations.
- Partitioning support (bucket, truncate).
- INSERT OVERWRITE operations.
- Merge-on-read for updates/deletes.
Read Improvements
- Streaming incremental reads (snapshot tracking).
- Time travel queries.
Catalog & Security
- Support Amazon S3 Tables (done in preview 3)
- Support the Apache Gravitino catalog (done in preview 3)
- Support the Apache Polaris catalog
- Support Database/Hive catalogs
Maintenance
- Snapshot management, version/branch/tag management
- Schema evolution enhancements
Try it now:
- We are still working on the test cases and fixing CI issues. Until a Timeplus Proton release that includes this PR is available, you can install Timeplus Enterprise 2.8 on Linux or macOS by following the guide at https://docs.timeplus.com/enterprise-v2.8#2_8_0
You can use the web console at http://localhost:8000/ to run SQL.
Use the SQL examples above to connect to the Iceberg databases and read/write data.
You can also run the Docker image docker.timeplus.com/timeplus/timeplusd:2.8.14 on Linux, macOS, or Windows. For example, this command starts a container that picks up the AWS access key and secret key from your environment variables:
docker run --name timeplus_iceberg -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -d -p 7587:7587 -p 8463:8463 docker.timeplus.com/timeplus/timeplusd:2.8.14
- Demo video showing how to use Timeplus to read data from Amazon MSK (Managed Streaming for Apache Kafka), apply stream processing, write the results to S3 in the Iceberg table format, and query them with Athena: https://www.youtube.com/watch?v=2m6ehwmzOnc
Contribute:
- Review the code (focus on IcebergSink.cpp/IcebergSource.cpp)
- Test with your Iceberg setup and share feedback
- Help tackle the "What’s Next" list!
Tech notes:
- Built on Apache Arrow C++ for Parquet/ORC file handling.
- Minimal runtime dependencies (no Hadoop/JVM).
- AWS SDK integration for Glue/S3 auth.
Note: Starting with preview 3, catalog configuration uses SETTINGS instead of ENGINE.