What is the Open Source Alternative to CockroachDB?

I’ve been hearing this question more frequently since Cockroach Labs further restricted its already limited license. Now, any company making over $10 million in annual revenue has to pay. This marks the end of CockroachDB’s open-source story. Given the database’s popularity, many have started searching for an open-source alternative to this distributed SQL database.

If you’re reading this, chances are high you’ve been affected by the license change. You can relax and breathe out: an alternative does exist. It’s fully open under the Apache 2.0 license and goes by the name YugabyteDB. Let me give you a quick tour to help you understand the similarities and differences between these two distributed databases.

What’s the Similarity?

Both CockroachDB and YugabyteDB belong to the class of distributed SQL databases. These databases are designed to scale your read/write workloads across a cluster of interconnected nodes, tolerating all sorts of possible outages from minor server-level glitches to major region-level incidents in the cloud. Unlike NoSQL databases, they do this without compromising data consistency.

Like CockroachDB, YugabyteDB is typically deployed in a configuration with three or more nodes. Below is an example of a three-node YugabyteDB cluster configured with a replication factor of three, meaning there are three copies of your data across the cluster.

The entire dataset is split into shards, and every record you store in the database is mapped to one of these shards. In the example above, there are three shards. With a replication factor of three, each shard has a primary copy (Raft leader) and two backup copies (Raft followers).
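
To make the record-to-shard mapping concrete, here is a minimal Python sketch. It is a toy illustration, not YugabyteDB's actual implementation: the MD5-based hash, the 16-bit hash space, and the `shard_for_key` helper are all assumptions made for this example.

```python
import hashlib

HASH_SPACE = 0x10000  # 16-bit hash space (an assumption for this sketch)
NUM_SHARDS = 3        # matches the three shards in the example


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard by hashing it into a fixed hash space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:2], "big")  # value in 0..65535
    # Each shard owns one contiguous, equally sized slice of the hash space.
    return hash_value * num_shards // HASH_SPACE


if __name__ == "__main__":
    for key in ("user:1", "user:2", "order:42"):
        print(key, "-> shard", shard_for_key(key))
```

The same key always lands on the same shard, which is what lets any node figure out where a record lives without asking a central coordinator.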

The Raft leaders serve two key purposes. First, they handle all read and write requests for the data mapped to their respective shards. Second, they replicate changes to the followers. YugabyteDB distributes Raft leaders of different shards across different database nodes, enabling even distribution of data and read/write workloads across the cluster.
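
The leader placement and the Raft majority rule can be sketched in a few lines of Python. Treat this as a simplified model under stated assumptions: the round-robin placement and the `place_replicas`, `majority`, and `tolerated_failures` helpers are hypothetical and only illustrate the idea; the real database balances leaders with its own logic.

```python
def place_replicas(num_shards, nodes, replication_factor):
    """Return {shard: [leader, follower, ...]} with leaders rotated across nodes."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for shard in range(num_shards):
        # The first node in the list acts as the Raft leader for this shard;
        # rotating the starting node spreads leaders evenly over the cluster.
        placement[shard] = [
            nodes[(shard + i) % len(nodes)] for i in range(replication_factor)
        ]
    return placement


def majority(replication_factor):
    """Number of replicas that must acknowledge a write before it commits."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Number of replicas that can be lost while a majority still exists."""
    return replication_factor - majority(replication_factor)


if __name__ == "__main__":
    layout = place_replicas(3, ["node1", "node2", "node3"], 3)
    for shard, replicas in layout.items():
        print("shard", shard, "leader:", replicas[0], "followers:", replicas[1:])
    print("majority for RF=3:", majority(3))
    print("failures tolerated for RF=3:", tolerated_failures(3))
```

With a replication factor of three, a write is committed once two of the three replicas have it, so each shard stays available when one node goes down.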

If you take another look at the diagram, you won’t spot any component that could become a bottleneck. Neither CockroachDB nor YugabyteDB has a master/coordinator/proxy/director node that receives application requests and then distributes them across the cluster. Distributed SQL databases don’t need that component by design because all the nodes communicate with each other directly to coordinate the execution of distributed queries and transactions, replicate changes, and gracefully handle outages.

This is what makes CockroachDB and YugabyteDB comparable. The next question you might ask is whether there’s anything unique about YugabyteDB apart from its Apache 2.0 license.

What’s the Difference?

If you start comparing the databases granularly at the feature level, you’ll obviously find some differences here and there, but most of those differences will be minor.

A truly big difference is Postgres. Both databases claim to be Postgres-compatible, but their levels of compatibility are not the same, and they approached compatibility differently.

CockroachDB built its SQL engine from scratch and started adding Postgres capabilities gradually over time. YugabyteDB went in the other direction by taking the Postgres source code as-is and replacing the Postgres storage layer with YugabyteDB’s own distributed storage architecture.

The results of these two approaches are significant. CockroachDB achieved wire-level compatibility with Postgres and supports a subset of core Postgres features. YugabyteDB became not only feature-compatible but also runtime-compatible with Postgres. Most of the applications, libraries, tools, and frameworks designed for Postgres see no difference between Postgres and YugabyteDB. YugabyteDB is just like Postgres for them, only fully distributed.

How is it possible to achieve that level of compatibility? The short answer is that when you connect to YugabyteDB, you literally connect to Postgres and let Postgres execute your application requests over a distributed cluster.

Let’s break this down into more detail. When you start a Postgres instance, the database starts the Postgres postmaster process, which listens for incoming client connections.

When a client or application connects, the postmaster forks itself into a Postgres backend process to coordinate that client’s requests. That backend parses and executes queries, works with the Postgres shared memory and files to read and write data, and performs other processing before returning results to the client.
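
The process-per-connection model is easy to see in a toy server. The Python sketch below is not Postgres code; it only mimics the pattern using the standard library's `ForkingTCPServer` (Unix only). The `ToyPostmaster`, `BackendHandler`, and `backend_pid` names are made up for this example.

```python
import os
import socket
import socketserver
import threading


class BackendHandler(socketserver.BaseRequestHandler):
    """Runs in a forked child process, one per client connection."""

    def handle(self):
        # Tell the client which process is serving it.
        self.request.sendall(str(os.getpid()).encode("ascii"))


class ToyPostmaster(socketserver.ForkingTCPServer):
    """Listens for connections and forks a child process for each one."""

    allow_reuse_address = True


def backend_pid(address):
    """Connect as a client and return the PID of the process that served us."""
    with socket.create_connection(address, timeout=5) as conn:
        data = b""
        # Read until the child closes the connection.
        while chunk := conn.recv(64):
            data += chunk
        return int(data)


if __name__ == "__main__":
    # Port 0 asks the OS for any free port.
    with ToyPostmaster(("127.0.0.1", 0), BackendHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        print("listener pid:", os.getpid())
        print("backend pid: ", backend_pid(server.server_address))
        print("backend pid: ", backend_pid(server.server_address))
        server.shutdown()
```

Each connection is answered by a process other than the listener, which is the same division of labor as between the postmaster and its backends.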

When you start a YugabyteDB node and open up a client connection with it, you’ll be connecting to the same familiar Postgres postmaster.

Just like in Postgres, the postmaster forks a new Postgres backend for each new client connection, and the backend parses and executes the requests. Only this time, the requests are executed across a distributed YugabyteDB storage layer composed of tserver processes.

Above is a simplified architecture diagram of a single YugabyteDB node. Each YugabyteDB node has its own instances of Postgres postmaster and tserver running.

Overall, Postgres is what makes YugabyteDB different from CockroachDB. You can think of YugabyteDB as a distributed version of Postgres.

If YugabyteDB sounds like what you’re looking for, then head over to their GitHub to learn more about the database and choose a deployment option that works best for you.

If you have Docker, you can start an instance of YugabyteDB in under a minute:

mkdir -p ~/yb_docker_data/node1
docker network create custom-network
docker run -d --name yugabytedb-node1 --net custom-network \
  -p 15433:15433 -p 7001:7000 -p 9000:9000 -p 5433:5433 \
  -v ~/yb_docker_data/node1:/home/yugabyte/yb_data --restart unless-stopped \
  yugabytedb/yugabyte:latest \
  bin/yugabyted start \
  --base_dir=/home/yugabyte/yb_data --daemon=false

And then connect to it with the ysqlsh tool:

docker exec -it yugabytedb-node1 bin/ysqlsh -h yugabytedb-node1

The prompt will welcome you to execute your first request:

ysqlsh (11.2-YB-2.19.3.0-b0)
Type "help" for help.

yugabyte=#
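
If you’d rather connect from application code, a regular Postgres driver should do. Below is a minimal Python sketch using psycopg2 (installed separately, for example with pip install psycopg2-binary). It assumes the container's port 5433 is published on localhost as in the docker run command, that the default yugabyte user and database are in place, and that password authentication is not enabled. The `connection_params` and `fetch_version` helpers are made up for this example.

```python
def connection_params(host="localhost"):
    """Connection settings for the single-node Docker cluster."""
    return {
        "host": host,
        "port": 5433,          # YSQL port published by the docker run command
        "dbname": "yugabyte",  # assumed default database
        "user": "yugabyte",    # assumed default user
    }


def fetch_version(params):
    """Run SELECT version() over a regular Postgres driver connection."""
    import psycopg2  # imported lazily so the helpers above work without it

    conn = psycopg2.connect(**params)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT version()")
            return cur.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(fetch_version(connection_params()))
```

Nothing in this snippet is specific to YugabyteDB apart from the port number, which is the whole point of the Postgres compatibility story.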

Enjoy the freedom of open source!