A look at Aurora DSQL’s architecture

Hari Krishna Sunder

Amazon has now entered the elite club of distributed databases with the announcement of DSQL!

What is DSQL?

DSQL is a serverless distributed SQL database for online transaction processing (OLTP). It is Postgres-compatible (with limitations) and relies on atomic clocks for time synchronization. Its sweet spot is new geo-distributed apps that require ACID compliance and low-latency reads.

DSQL architecture

The architecture of DSQL is explained here

This diagram shows it in all its glory:

Aurora DSQL architecture

But wait! This does not look like a database. What are this Adjudicator and Journal? And where is my WAL? How does this give me access to a table? Even our typical distributed database building block, the shard, is missing!

DSQL architecture simplified

I shall try to simplify this and replace the component names with more standard ones like Shards, WALs, and Leaders.

NOTE: This is a very crude simplification. Remember, the DSQL team has picked different names for good reasons. This is just a way to map it to existing systems to help make it a little bit easier to understand.

Aurora DSQL architecture simplified

Now this looks more similar to Spanner, CockroachDB, YugabyteDB, ...

  • Router/Load Balancer: Authentication and routing of new connections to a QP.
  • QP: Query Processors per region sitting close to the app.
  • Adjudicator/Shard Leader: Each of these is responsible for ensuring the consistency of a portion of the database. No actual data is stored here.
  • Journal/WAL(s): The god of durability and high availability.
  • Storage/Shard follower: This is where the data actually resides.

DSQL has split up the shard into its fundamental building blocks. This makes it a microservice-based distributed system. Each component can scale independently of the others, and they are all highly available. Depending on the size of the data, its rate of change, and the percentage of data that is changed, DSQL can scale individual components up or down. This is what makes it serverless.

Query Processors can be spun up and down as needed, which makes connection limits a thing of the past.

The reads go straight to Storage (Shard followers) and skip locks, which gives them very low latency. The tradeoff is that it limits the database to only Snapshot Isolation with fail-on-conflict behavior. The apps have to retry the writes on these failures and model the schema to avoid hot spots.
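The retry requirement above can be sketched on the application side roughly as follows. This is a minimal illustration, not DSQL's client API: `ConflictError` is a hypothetical stand-in for whatever error your driver raises when a commit fails on conflict, and `run_with_retry` and its parameters are names invented for this example.

```python
import random
import time


class ConflictError(Exception):
    """Hypothetical stand-in for the driver error raised on a commit conflict."""


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it commits or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except ConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the conflict to the caller
            # Exponential backoff with jitter so competing retries spread out.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

The whole transaction body goes inside `txn`, since a retry has to re-read the data on a fresh snapshot rather than replay the old writes.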

When you write, you need to ensure concurrent inserts, updates, and deletes to the same row get synchronized. For this, the writes get sent to Adjudicators, each of which is assigned one part of the database. If in one transaction you modify multiple rows that are served by multiple Adjudicators, then they perform something like a 2-phase commit across them.
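To make the write check concrete, here is a toy model of it. It is a sketch under loud assumptions: the real Adjudicator protocol is not described here, so the key placement by CRC32, the per-key "last commit timestamp" map, and the names `Adjudicator` and `try_commit` are all invented for illustration. The idea it shows is snapshot-isolation conflict detection: a transaction fails if any key it writes was committed by someone else after the transaction took its snapshot.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Adjudicator:
    """Toy adjudicator: tracks the last commit timestamp of each key it owns."""
    last_commit: dict = field(default_factory=dict)

    def can_commit(self, keys, start_ts):
        # Conflict if any key was committed after this transaction's snapshot.
        return all(self.last_commit.get(k, 0) <= start_ts for k in keys)

    def record(self, keys, commit_ts):
        for k in keys:
            self.last_commit[k] = commit_ts


def try_commit(write_keys, start_ts, commit_ts, adjudicators):
    """Return True if the write set commits, False on conflict."""
    # Group the write set by owning adjudicator (hypothetical placement rule).
    groups = {}
    for key in write_keys:
        idx = zlib.crc32(key.encode()) % len(adjudicators)
        groups.setdefault(idx, []).append(key)
    # Phase 1: every involved adjudicator must approve.
    if not all(adjudicators[i].can_commit(ks, start_ts) for i, ks in groups.items()):
        return False
    # Phase 2: record the commit on each of them.
    for i, ks in groups.items():
        adjudicators[i].record(ks, commit_ts)
    return True
```

Note that nothing in this model holds row data, only timestamps, which matches the point that no actual data is stored on the Adjudicator.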

Once the write has been approved, it is persisted in the Journal. From there it is applied to Storage, completing the cycle.
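The Journal-to-Storage step, and why reads from Storage need no locks, can be pictured with a small multi-version model. Everything here is an assumption made for illustration: the `Journal` and `Storage` classes, their methods, and the idea that entries arrive in commit-timestamp order are not taken from DSQL's actual design.

```python
from collections import defaultdict


class Journal:
    """Toy append-only log of committed transactions."""

    def __init__(self):
        self.entries = []  # list of (commit_ts, {key: value}) in commit order

    def append(self, commit_ts, writes):
        self.entries.append((commit_ts, dict(writes)))


class Storage:
    """Toy multi-version store that replays the journal."""

    def __init__(self):
        self.versions = defaultdict(list)  # key -> [(commit_ts, value), ...]
        self.applied = 0  # number of journal entries applied so far

    def catch_up(self, journal):
        # Apply only the entries this replica has not seen yet.
        for commit_ts, writes in journal.entries[self.applied:]:
            for key, value in writes.items():
                self.versions[key].append((commit_ts, value))
            self.applied += 1

    def read(self, key, snapshot_ts):
        # Newest version at or before the snapshot; older versions are never
        # modified, so the reader does not need to lock anything.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None
```

In this model a read issued before `catch_up` runs simply does not see the newest entries, which is the staleness problem discussed next.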

One issue with reading from a follower is that it can return stale data, since it takes time for a change to get there from the leader. DSQL circumvents this by combining atomic clocks and stable networks, and by incurring the penalty of the delay on writes. The exact details of this have not yet been fully disclosed.

As a distributed system, this is a unique architecture and definitely a very cool one! There just isn't anything else like it. But is this a good architecture for an OLTP database that is trying to replace Postgres?