For as long as I’ve worked near databases, someone has been pitching a magical engine that does both OLTP and OLAP. One system. One copy of the data. No more Kafka pipelines duct-taping transactional databases to analytical warehouses. Just one elegant setup where real-time apps and dashboards play nicely together.
That dream has a name: HTAP (Hybrid Transactional/Analytical Processing). It’s been around since at least the SAP HANA days, and the pitch hasn’t really changed. What has changed, over and over again, is the execution.
And now… Databricks is taking another swing at it.
A few days ago, they quietly announced Lakebase, a Postgres-powered, serverless OLTP layer embedded inside the Databricks lakehouse. It’s not just a standalone database. It syncs from Delta tables, integrates with Unity Catalog, and is clearly aimed at AI-era apps that need fast, transactional access to structured data. Think: app state, feature serving, or powering your RAG agent with fresh data.
And honestly, it’s a pretty clever move. Databricks is betting that if they can bring the OLTP piece into the lakehouse rather than bolting analytics onto a database, they can flip the HTAP architecture on its head.
So this post is a bit of a look-back, look-around, and look-ahead.
Here’s what I’ll dig into:
- How we got here: the rise (and faceplants) of HTAP systems
- The key players and how they’ve approached the problem
- Why none of them has fully replaced the OLTP + OLAP combo
- And where Lakebase fits in (or deviates) from those past attempts
Spoiler: unifying OLTP and OLAP is still hard. But maybe it’s not about unification anymore.
HTAP: One System to Rule Them All (Until You Actually Build It)
So what even is HTAP, really?
At its core, HTAP is short for Hybrid Transactional/Analytical Processing, a fancy way of saying: let me run my app and my dashboards on the same damn database. No syncing, no duplication, no “batch job that backfills the warehouse every four hours, unless it fails silently.” Just one system that does it all.
The ideal HTAP setup would mean:
- You don’t need to move data anywhere: ETL is gone, not just abstracted.
- You get real-time insights without having to stream data into some other system first.
- Your app backend and your data team are literally querying the same source of truth.
In theory, this is beautiful. In practice? Yikes.
Here’s the rub: OLTP and OLAP workloads are designed to be opposites.
OLTP (Online Transaction Processing)
- Think: your production app, user profiles, inventory, account balances.
- Optimized for lots of small, fast writes and highly selective reads.
- Wants normalized schemas and row-based access.
OLAP (Online Analytical Processing)
- Think: business dashboards, cohort analysis, ML pipelines.
- Optimized for big, sweeping reads over millions or billions of rows.
- Loves denormalized data and columnar storage.
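The row-vs-column split is the crux of the whole problem, so here’s a tiny sketch of it in plain Python (no real engine involved; the data and function names are made up for illustration):

```python
# Illustrative only: the same table stored two ways, showing why
# OLTP favors row layout and OLAP favors columnar layout.

# Row store: each record lives together. Great for "fetch user 42".
rows = [
    {"id": 1, "name": "ada", "balance": 120},
    {"id": 2, "name": "bob", "balance": 75},
    {"id": 3, "name": "cat", "balance": 300},
]

def get_user(user_id):
    # A selective read touches one whole record (the OLTP pattern).
    return next(r for r in rows if r["id"] == user_id)

# Column store: each attribute lives together. Great for "sum all balances".
columns = {
    "id": [1, 2, 3],
    "name": ["ada", "bob", "cat"],
    "balance": [120, 75, 300],
}

def total_balance():
    # A scan touches only the one column it needs (the OLAP pattern).
    return sum(columns["balance"])

print(get_user(2)["name"])  # bob
print(total_balance())      # 495
```

A real engine layers indexes, compression, and concurrency control on top, but the tension is exactly this: a layout that makes one access pattern cheap makes the other expensive.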
Trying to make one engine do both is like asking a dirt bike to double as a freight train. You can try to engineer around the physics, but eventually you’re gonna hit a wall (or a bottleneck).
And this is why so many HTAP attempts have felt like they almost get there, but when pushed to real production scale, they start to sweat. Either the OLTP performance drops off, or the analytical queries take too long, or both sides end up compromising so much that neither works great.
Still… the dream lives on. And it’s why every couple of years, someone announces a new “true HTAP” system. Some of them are fascinating. Some are frustrating. Most are both.
Next up: a rundown of the ones that tried.
A Quick Tour Through HTAP History (aka: The Graveyard of Great Ideas)
HTAP isn’t new. Every five years or so, someone rediscovers the idea, slaps it on a new engine, and swears this time it’ll actually work. So before diving into the shiny new Lakebase stuff, let’s rewind and look at the previous waves of “unified OLTP+OLAP” dreams.
Wave 1: In-Memory Everything
This was the “RAM is the future” era. Vendors tried to solve the OLTP vs OLAP dilemma by… avoiding disk entirely.
- SAP HANA was probably the most aggressive. It loaded everything into memory and ran OLTP and OLAP on the same tables using a column-store underneath. Pretty wild for its time.
- Oracle In-Memory and SQL Server Columnstore Indexes followed similar lines: bolt analytics onto your transactional system by adding a second internal representation of the data (often columnar).
These were technically impressive, but:
- Memory is expensive.
- You still had to think hard about indexing, partitioning, and write amplification.
- And god help you if your dataset didn’t fit in RAM.
Cool for finance and telcos. Overkill for 99% of startups.
Wave 2: Cloud-Native Contenders
Then came the “scale-out is king” crew. Distributed HTAP for the cloud age.
- SingleStore (née MemSQL) built a custom engine that handles row and column storage in one binary. I actually like a lot about it: it’s fast and feels ergonomic. But it’s proprietary and not exactly the Postgres you know and love.
- PolarDB-X from Alibaba is a MySQL-compatible, sharded OLTP/OLAP blend with MPP under the hood. Think Aurora plus Snowflake, kind of. It’s super powerful but also super complex and mostly adopted in China.
- TiDB from PingCAP went the open-source route. It gives you a MySQL interface, writes to a distributed key-value store (TiKV), and lets you offload analytics to a columnar replica called TiFlash.
These were all more practical than the in-memory wave, and they brought scale + durability. But each came with its own tradeoffs:
- You had to operate them (and tune a lot).
- They weren’t quite as fast at transactions as Postgres or MySQL.
- And they never really reached critical mass in the developer ecosystem.
Wave 3: HTAP Meets the Lakehouse and AI
This is where we are now: the third wave. Less about OLAP engines trying to do OLTP, and more about data platforms trying to fold in OLTP for operational and AI use cases.
- Snowflake Hybrid Tables let you write fast row-level mutations and do analytical queries in the same table (kind of). They’re new, but promising.
- AlloyDB from Google is a souped-up Postgres with columnar execution and vector search built-in. It’s basically “Postgres for ML teams”.
- And now there’s Databricks Lakebase: full Postgres, decoupled storage, Delta Lake syncing, and Unity Catalog integration.
What’s interesting here is the shift in perspective. These platforms aren’t trying to be databases in the traditional sense. They’re trying to be foundations for apps, dashboards, and ML pipelines, all in one governed environment.
It’s not “HTAP or bust” anymore. It’s: how much OLTP do you need to bring into your lakehouse before it’s enough for modern apps?
Which brings us to the big question: how do all these approaches actually compare? Who nailed it? Who limped away?
HTAP Bake-Off: 6 Approaches, 6 Different Tradeoffs
Alright, time for the meaty part. Let’s put these HTAP attempts side-by-side and see how they actually stack up, not just on paper, but in practice.
What follows is a breakdown of how each platform approaches HTAP, what makes it compelling, and (crucially) where it starts to wobble under real-world pressure.
So… Which One “Wins”?
That’s the thing: none of them do. At least not in the “this replaces all your databases forever” kind of way.
Every system above makes a set of tradeoffs:
- You either compromise on true transactional fidelity, or
- You hit limitations on query flexibility, or
- You’re locked into a closed ecosystem that doesn’t play nice with your stack.
The best choice depends on what you care about:
- Want Postgres semantics and lakehouse harmony? Lakebase.
- Need low-latency queries with a bit of OLTP? Snowflake Hybrid or AlloyDB.
- Building an app with real-time ingest + analytics? SingleStore or TiDB might shine.
- Doing pure OLAP but curious about lightweight transactions? ClickHouse.
But none of them (not even the shiniest ones) have nailed the “drop-in replacement for Postgres and Snowflake” pitch. Not yet, anyway.
Still, something interesting is happening: the center of gravity is shifting toward modular HTAP, where OLTP and OLAP live in harmony, even if not inside the same executable.
Lakebase Isn’t a Revolution. It’s a Strategic Merge
The story behind Databricks Lakebase actually starts earlier, with Databricks quietly acquiring Neon, the team building a cloud-native, decoupled Postgres with a storage layer written in Rust. Neon was doing some genuinely cool stuff: separating compute and storage at the page level, point-in-time recovery, zero-copy branching, and basically rethinking what Postgres could be like in a serverless world.
Lakebase is, more or less, what happens when you plug that kind of Postgres into the Databricks ecosystem.
It’s not a moonshot HTAP system. It’s not a unified query engine that spans rows and columns and fuses everything together. Instead, it feels more like a well-calculated infrastructure play: use Neon’s tech to give Databricks a transactional layer, and wire it up with the rest of the lakehouse stack.
Serverless Postgres that Just Works
Because of the Neon DNA, Lakebase isn’t just “we run Postgres for you.” It’s decoupled, auto-scaling, and stateful in a cloud-native way.
- You don’t manage VMs or nodes.
- You can recover to any previous point in time.
- And you get the ergonomics of Postgres with the elastic behavior you’d expect from Databricks.
This gives it a leg up over traditional RDS-style Postgres instances, and puts it more in line with AlloyDB or CockroachCloud, but with fewer moving parts: you’re not provisioning replicas; you’re just writing to Postgres in the lakehouse UI.
Syncing from Delta, not the other way around
The integration goes beyond just hosting Postgres: Lakebase lets you sync from Delta tables into Postgres, turning analytical outputs into app-servable records.
If you’ve got a pipeline that writes predictions, features, or dashboards into Delta, Lakebase lets you reflect those in a transactional context, all within Unity Catalog governance.
That’s neat. But it’s also a one-way bridge (at least right now). There’s no magical capture of Postgres writes back into Delta. So you’re still thinking in terms of “OLTP for apps, OLAP for analysis,” just with smoother wiring between them.
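Conceptually, that one-way bridge is just an upsert loop from the analytical side into the serving side. Here’s a heavily simplified simulation in plain Python; the table, schema, and field names are invented, and Lakebase’s actual sync is a managed feature, not something you’d hand-roll like this:

```python
# Conceptual sketch of a one-way Delta -> Postgres sync, simulated in
# memory. Everything here (schema, names) is made up for illustration.

delta_snapshot = [  # pretend this is a Delta table of model outputs
    {"user_id": 1, "churn_score": 0.12},
    {"user_id": 2, "churn_score": 0.87},
]

postgres_table = {}  # pretend this is a Postgres table keyed by user_id

def sync_from_delta(snapshot, target):
    """Upsert every record from the analytical side into the serving side."""
    for record in snapshot:
        target[record["user_id"]] = record  # last write wins
    return len(snapshot)

synced = sync_from_delta(delta_snapshot, postgres_table)
print(synced)                            # 2
print(postgres_table[2]["churn_score"])  # 0.87
```

The important property is the direction: the analytical store is the source of truth, and the transactional store is a derived, servable copy of it.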
The AI Agent Angle
Postgres plus pgvector plus Unity Catalog access = a surprisingly competent backend for AI workloads. Databricks is clearly leaning into this:
- Store model outputs and features in Delta
- Sync those into Lakebase
- Use Lakebase as a low-latency serving layer for RAG agents or dashboards
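Under the hood, the pgvector-style retrieval step boils down to ranking stored embeddings by distance to a query vector. Here’s what that looks like stripped to its essentials in plain Python (the documents and vectors are made up; in Postgres this would be an `ORDER BY embedding <=> $1 LIMIT k` query):

```python
import math

# What a pgvector cosine-distance query conceptually does, sketched in
# plain Python. Store and vectors are invented for illustration.

store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norm

def nearest(query, k=2):
    # Smallest distance first, like "ORDER BY embedding <=> $1 LIMIT k".
    return sorted(store, key=lambda d: cosine_distance(query, store[d]))[:k]

print(nearest([1.0, 0.0, 0.0]))  # ['doc_a', 'doc_c']
```

The appeal for a RAG agent is that this retrieval lives in the same transactional database as the rest of your app state, under the same governance.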
Again, it’s not built to replace Pinecone or Faiss at scale, but for many teams, “good enough + simple + integrated” beats exotic and expensive any day.
Pragmatism over Purity
To be clear: this is not the HTAP endgame. There’s no unified optimizer. No cross-engine joins. You’re still writing to one system and querying another.
But it is a more honest model:
- A transactional layer that respects OLTP constraints
- An analytical engine optimized for scans
- A clear, managed integration path between the two
In a space filled with over-promises and fragile monoliths, Lakebase is refreshing in how grounded it is.
TL;DR
Lakebase is the product of Neon’s smart Postgres architecture meeting Databricks’ lakehouse vision. It doesn’t collapse OLTP and OLAP into one thing; it lines them up neatly and gives you tooling to bridge them.
If you’re already using Databricks, it might be the simplest way to get app-grade OLTP without reaching for yet another cloud service. If you’re not? It’s still a signal that modular HTAP is the direction this industry is heading.
HTAP’s New Favorite Pattern: Decouple Now, Sync Later
It’s becoming pretty clear that the “one engine to rule them all” HTAP vision is… not aging well. The newer playbooks (Lakebase included) are betting on decoupling instead of cramming both workloads into a single system.
Instead of forcing OLTP and OLAP to coexist in the same execution planner, you get:
- A transactional system that’s good at fast, small writes (Postgres, usually)
- An analytical engine that crushes scans (Delta Lake, or something columnar)
- And a native sync layer that glues them together without you writing scripts at 2am
It’s less sexy than a unified HTAP engine, but way easier to reason about.
The upside:
- Best-of-breed tools for each job
- Cleaner boundaries between serving and analytics
- Easier scaling: no dual-purpose bottlenecks
The catch?
- You’re working with eventual consistency
- Syncing adds latency
- You still need to think about data ownership and freshness windows
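That last point deserves a concrete shape. In a decoupled setup, synced data always carries an age, and the consumer has to decide how stale is too stale. A minimal sketch of a freshness-window check, with an arbitrary threshold and invented names:

```python
from datetime import datetime, timedelta, timezone

# Sketch of a freshness-window check for a decoupled OLTP/OLAP setup.
# The 15-minute threshold is arbitrary; pick one per use case.

FRESHNESS_WINDOW = timedelta(minutes=15)

def is_fresh(last_synced_at, now=None):
    """True if the synced copy is recent enough to serve."""
    now = now or datetime.now(timezone.utc)
    return now - last_synced_at <= FRESHNESS_WINDOW

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2025, 1, 1, 11, 50, tzinfo=timezone.utc), now))  # True
print(is_fresh(datetime(2025, 1, 1, 11, 30, tzinfo=timezone.utc), now))  # False
```

It’s boring code, and that’s the point: eventual consistency stops being scary once the staleness budget is explicit instead of implicit.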
But hey, compared to the full rewrites some HTAP systems ask for, this is refreshingly pragmatic.
What’s Next for HTAP? A Few Guesses
Here’s where I think the puck is heading:
- HTAP for agents and embeddings: not just dashboards. Databases need to speak vector, and Postgres extensions like pgvector are giving them a way in.
- Postgres and MySQL-based HTAP: devs are tired of “HTAP-compatible” engines. Familiar, boring OLTP cores with smart analytical overlays will win.
- Serverless as the default: no more managing clusters, period. If your HTAP engine isn’t elastic, it’s getting left behind.
- Lakehouse integration over monoliths: rather than replace your stack, HTAP tooling will plug into what you already have: catalogs, access control, lineage.
The age of the HTAP wunderkind startup may be fading. But the era of boring, usable HTAP? Just beginning.
Wrapping It Up: HTAP Isn’t Dead, It’s Just Boring Now (That’s a Compliment)
We’ve spent over a decade chasing the HTAP dream, and we still haven’t built The One Engine. But maybe that’s okay.
What’s emerging instead is a quieter, more modular story:
- App-facing OLTP here
- Scalable OLAP over there
- Managed glue to sync them
- And AI agents poking both ends of the pipeline
Lakebase doesn’t break the mold, but it plays the new game well. It’s not trying to unify everything. It’s just trying to make the OLTP + OLAP combo a little less painful, especially if you’re already swimming in the Databricks ecosystem.
The next generation of HTAP won’t come as one big binary. It’ll show up as a stack of smaller, smarter pieces that click together cleanly.
And honestly? That’s more useful than another moonshot engine.