DBaaS in 2024: Which PostgreSQL operator for Kubernetes to select for your platform?


David Pech


As part of my consulting work, I have repeatedly found myself in discussions about how to operate PostgreSQL correctly in Kubernetes. I have already seen several operators in production, but as this field is growing rapidly, I'll do my best to sum up my findings and hands-on experience to help you choose, along with some thoughts on the trade-offs. I am not affiliated with any of the providers beyond using some of them in production or talking to some of the authors at conferences. I will structure this as a mini-series, so if you would like a TL;DR on which one to select, please hang on for a few weeks. (But I doubt there will be an "all-round winner"; we'll see.)


Should you really provide Database-as-a-service in-house for your internal development platform?

Providing an always-working, highly available, secure, and up-to-date PostgreSQL with stable performance and scalability is just hard. Let's dig into some of the requirements you typically have when you want to include this as part of your platform:

Operator Features:

  • The operator supports at least capability level 4 (in 2024, I still don't believe PostgreSQL operators can go full "auto-pilot", my apologies)
  • Basic security practices are enforced or at least recommended (in-transit TLS for app connections, separation of the app user from the Postgres superuser, network policies, ...); see the NetworkPolicy sketch after this list
  • Can I easily use non-contrib Postgres extensions (for example pgvector)?
  • Connection pooler integration (and the k8s kind: Service approach)
  • Observability support (OpenTelemetry)
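
To make the network-policy point concrete, here is a minimal sketch that allows only the application's pods to reach the database pods on port 5432. The namespace and labels are made up for illustration; the labels an operator actually puts on its database pods differ per operator, so check its docs.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-postgres
  namespace: team-a                # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: my-postgres-cluster     # hypothetical label; each operator labels its pods differently
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: my-microservice # only the application pods may connect
      ports:
        - protocol: TCP
          port: 5432               # PostgreSQL
```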

Self-service:

  • DB is provided as a service to the developers (with proper permissions, a developer can spin up, tear down, or modify a Postgres cluster in a fully automated fashion): no JIRA ticketing, no organizational gating, etc.
  • No PostgreSQL expertise should be required for development setups. (This implies sensible defaults, i.e. an "autopilot".) "Go run my microservice DB, just do it." Optionally: "Here is my .pgdump file that the service should restore from S3."
  • Developers should understand the DB cluster status and have up-to-date information on what is roughly happening during provisioning actions (don't take this for granted... there is a big gap in how operators signal the "desired" vs. the actual cluster status)
  • Developers can easily create a new DB cluster from a backup. Cloud backup and restore are supported (typically to an S3-like datastore); see the sketch after this list. Bonus: developers can clone existing DB clusters.
  • (A web UI for control: some folks might like it, but in my experience, when building a platform that provides more than a single service, you typically want something like Backstage and Crossplane to offer a more "integrated" experience.)
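
As an illustration of the "new cluster from a cloud backup" flow, here is a rough sketch using the CRD of CloudNativePG, one of the operators covered later in the series. The bucket, Secret, and cluster names are made up, and other operators express the same idea with different fields, so treat this as a shape rather than a recipe.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db
spec:
  instances: 1                    # a single node is enough for a dev environment
  storage:
    size: 10Gi
  bootstrap:
    recovery:
      source: app-db-backup       # bootstrap the new cluster from an existing backup
  externalClusters:
    - name: app-db-backup
      barmanObjectStore:
        destinationPath: s3://my-backups/app-db/   # hypothetical bucket
        s3Credentials:
          accessKeyId:
            name: s3-creds        # hypothetical Secret holding the S3 credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: s3-creds
            key: SECRET_ACCESS_KEY
```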

Production-ready:

  • How widely is the operator adopted?
  • Fine-tuning production PostgreSQL must be possible (directly opposing the "autopilot" point; the DBA should have enough docs on how the operator works under the hood, how it handles edge cases, and how it can be tuned), typically meaning performance and security knobs; see the tuning sketch after this list
  • The same or very similar DB cluster configuration can be used both for production and for development
  • The operator MUST NOT lose data and SHOULD recover automatically from most failure scenarios (this will probably be the most fun in the series).
  • Performance for a specific configuration should be predictable from the developer's point of view (please note that "TB-scale" Postgres is typically not run as a DBaaS, at least not at most companies yet)
  • (Major version upgrade support in the operator?)
  • Price perspective and autoscaling (including scaling to 0 outside developer working hours)
  • Is there a failsafe, so that even if I make a mistake in a configuration change, I won't lose my data in any scenario?
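
To illustrate what such "knobs" usually look like, here is a sketch of how a typical operator CRD exposes PostgreSQL parameters and pod resources, again using CloudNativePG's field names as one example. The values are placeholders for illustration, not recommendations.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: prod-db
spec:
  instances: 3
  postgresql:
    parameters:            # passed through to postgresql.conf
      shared_buffers: "2GB"
      max_connections: "300"
      work_mem: "16MB"
  resources:               # standard k8s requests/limits for the instance pods
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      memory: 8Gi
```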

First, you must understand that developers will inevitably compare your solution to their favorite managed offering from a cloud provider. My recommendation is to be VERY mindful of your goal and of what resources (expertise) you have. If you answer "no" to any of the following, maybe reconsider building your own in-house DBaaS.

  • Is it feasible and preferable for my developers to use remote Postgres for their day-to-day development?
  • Isn’t there some formal / org. chart approval that must be met before the resources are created, so we can’t deliver a fully automated experience?
  • Will we use the same operator and very similar configuration options for both the development and the production?
  • Are we very confident with our Kubernetes expertise? Do we already have deep experience running stateless workloads for 2+ years? Do we understand the performance, disaster recovery, and backup-restore of the PVCs of our k8s provider?
  • Do we have enough experience to setup and maintain H-A Postgres outside k8s? Can you do basic DBA operations like backup and restore with your favorite tools? (this is highly controversial “if you should have” this expertise or if the operator should abstract this as much as possible — my experience — up to a certain level, you absolutely need to understand the basics — how replication in Postgres can work because you will need to decide on which action to take during the failure. Another option is to have well-tested backups and be business-comfortable with some data loss) I believe that for any non-trivial production workload there must be some Postgres expertise either in-house or externally provided. The operator will help you a lot but when you edit the configuration YAML as usual and suddenly it won’t work, what will you do?
  • Do we want to build our Postgres DBaaS for several years to come and keep investing there along our main products? Does this bring or enough benefit among other alternatives?

If you don’t feel confident IMHO it’s very wise to go with the “traditional approach”:

  • each developer has his/her Postgres (OS-level install or docker)
  • VM-based Postgres installation on production

or use managed service:

  • Cloud providers: AWS RDS, GCP Cloud SQL, Azure Database, …
  • Traditional: EDB, Cybertec, Crunchy Data
  • Serverless: AWS Aurora, Neon
  • With Timescale support https://www.timescale.com/
  • Database branching (thin cloning) https://postgres.ai/
  • … (many, many more)

Also please don’t fall into the “we will do this cheaper than anyone else” rabbit hole. Simply put — you won’t. Most of the work is probably is not with the initial development setups but when you reach some scale at production, then all operator problems, missing in-house expertise, might hit you hard.

Let's now discuss the basic aspects that all the operators share:

  • Developers control the operator via CRDs (custom resource definitions), which is a good thing. The resources are namespaced, so regular k8s permissions can be used. You can also use policies (Gatekeeper and friends) to control each CRD attribute in a more fine-grained manner, and you can limit resource or Pod CPU+RAM consumption per namespace via kind: ResourceQuota (see the sketch after this list). In my experience, these levels of control were typically sufficient.
  • Some offer CLI tools or kubectl plugins for imperative-style management or for specific actions (debugging, switch-over, ...).
  • Feedback to the developers about the provisioning operations varies (via .status in the CRD, via Events on the CRD, via logs either directly in the CRD or in the pods, or nothing except the direct Postgres log, ...).
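
For completeness, a minimal ResourceQuota sketch capping what the DB clusters in one team's namespace may consume; the namespace and numbers are arbitrary examples.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: db-quota
  namespace: team-a              # hypothetical team namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10" # cap the number of database volumes
    requests.storage: 500Gi      # total PVC capacity allowed in the namespace
```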

Operator Families

  • (non-operator) Single-instance / static installation — designed to run only the primary, with very limited cluster support out of the box (not exactly an operator, but the base Postgres image and its many derivatives, such as the Bitnami Helm chart). They don't provide the additional management value of operators, but they are simple and production-ready. If you don't require a DB cluster, e.g. for small applications without H-A requirements, you can use them.
  • StatefulSet — first generation, depending on the Kubernetes primitive, without Patroni
  • Patroni — second generation, utilizing a proven H-A solution from outside Kubernetes
  • "CloudNative" operator families — third generation, where the H-A logic is integrated inside the operator itself

Here we face several general problems not directly related to Postgres but to stateful workloads in general. Kubernetes offers only very "dumb" primitives like DaemonSet, Deployment, and StatefulSet. StatefulSet was of course originally designed to handle stateful workloads with a stable identity around a PersistentVolume, BUT the way it scales (always adding or removing the highest ordinal) is very limiting for many use cases. A simple example: you have a pg-0 primary Pod, a pg-1 corrupted replica, and a pg-2 healthy replica. What do you do now? If you lower the StatefulSet size, you kill the healthy replica instead of the corrupted one. So the new marketing term "CloudNative" operator was born, promising that the logic is included inside the operator, but as we all know, every H-A solution needs years of production testing to become trustworthy and battle-proven.
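
A minimal sketch of the limitation; the names mirror the example above, and the comments describe standard StatefulSet behaviour.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  serviceName: pg
  replicas: 3                 # creates pg-0, pg-1, pg-2, each with a stable identity and its own PVC
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              value: example  # illustration only; use a Secret in practice
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata   # keep the data dir below the mount point
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
  # Scaling down to replicas: 2 always removes the highest ordinal, i.e. pg-2,
  # even if pg-1 is the corrupted replica you actually want to remove.
  # An operator that manages individual Pods (or per-instance resources) can
  # instead delete exactly the broken instance.
```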

Another aspect is how many layers of logic there are. This is tightly coupled with a misunderstanding dating back to when containers started to be used massively: should the application itself run directly as pid 1 (the main process) of the container, or should there be a supervisor daemon (supervisord, Patroni, or similar) running all the time, capable of handling restarts, etc.?

The problem with a middle layer (of any kind) is that it introduces another level of complexity and translation between the Postgres process state and the container health checks, so Kubernetes (or any orchestrator) acts on potentially out-of-date or incorrect state. On the other hand, the middle layer can, for example, run pg_upgrade for the next major version (instead of using some one-off kind: Job container). In my experience, it is much preferable not to use a middle layer, because the Pod failure scenarios are simpler; the sketch below shows the difference in health checks.
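
With Postgres as pid 1, the kubelet can probe the database itself; with a middle layer such as Patroni, the probes usually target the supervisor's REST API instead, so Kubernetes sees the supervisor's view of Postgres rather than Postgres directly. The snippet below is an illustrative pod-spec fragment; the Patroni port and path follow its documented defaults, but verify them against the operator you use.

```yaml
# Postgres running directly as pid 1: probe the database itself.
containers:
  - name: postgres
    image: postgres:16
    readinessProbe:
      exec:
        command: ["pg_isready", "-U", "postgres"]
      periodSeconds: 10
    livenessProbe:
      exec:
        command: ["pg_isready", "-U", "postgres"]
      periodSeconds: 10

# With a middle layer (e.g. Patroni as pid 1 supervising Postgres),
# the probes typically hit the supervisor's HTTP endpoint instead:
#
#    readinessProbe:
#      httpGet:
#        path: /readiness   # Patroni REST API
#        port: 8008         # Patroni's default REST port
```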

Evaluation Approach

For each operator, I plan to present the following use cases:

Dev env

  • Single node cluster, backup, restore

Staging/Production env

  • H-A Cluster (primary and one or more read replicas)
  • synchronous_commit = on / off
  • Scaling read-replicas
  • Simulated failures and recovery
  • Minor version upgrade, Major version upgrade
  • Backup, Restore, WAL archiving, PITR
  • Manual switchover, basic management tasks (e.g. upgrading k8s nodes, resizing cluster resources, volume expansion, ...)

and, with the help of the docs, evaluate the previous scenarios. I plan to script most of the steps so that they are reproducible.

Motivation

There are currently too many basic "100-level" resources on how to set up a basic operator and run Postgres. This is something you can probably complete in 10 minutes without much added value. There is, IMHO, little about day-2 operations that comes unbiased from outside the large Postgres companies. This field is even more difficult when you try to follow which company has forked which original operator and to understand what the differences are.

I see a decent amount of skepticism among DBAs about the motives for moving to Kubernetes in general. This series might be a small piece of the puzzle, offering insights from a person who is also quite skeptical and who doesn't like having downtime or losing data.

I also enjoy this topic, and when I went over the operators, there were many blank spots where I didn't understand how they would behave.

What won’t be covered

I aim only to compare apples with apples, so I won't dig into the many areas that are more container- or Postgres-related and that don't differentiate the operators among themselves.

  • Possible architectures for how to run Postgres on nodes — this is a must-read article in this area.
  • Containerized Postgres performance — many resources show that you can get containerized performance almost as fast as on bare metal.
  • RTO / RPO — as they are typically inherited from the underlying tools
  • Only "vanilla" Postgres without any "super-extension" is considered (no Timescale, no Citus and sharding, ...)

Part 2 — Bitnami Helm Chart, DIY solutions as Baseline