A Kubernetes operator for managing SurrealDB clusters, built with KubeOps (.NET 10).
## Overview
The SurrealDB Operator manages the full lifecycle of SurrealDB clusters on Kubernetes: deployment, storage backend provisioning (TiKV, RocksDB, or in-memory), scaling, TLS via cert-manager, and automated backups via Velero.
```
┌──────────────────────────────────────────────────────────┐
│                    Kubernetes Cluster                    │
│                                                          │
│  ┌───────────────────────┐   ┌────────────────────────┐  │
│  │  SurrealDB Operator   │   │  TiDB Operator (ext)   │  │
│  │  ┌─────────────────┐  │   │  Manages TiKV/PD       │  │
│  │  │  Cluster Ctrl   │──┼───►  StatefulSets          │  │
│  │  ├─────────────────┤  │   └────────────────────────┘  │
│  │  │  Backup Ctrl    │──┼──►                            │
│  │  └─────────────────┘  │   ┌────────────────────────┐  │
│  └───────────────────────┘   │  Velero (ext)          │  │
│                              │  Backup / Schedule /   │  │
│                              │  Restore / BSL         │  │
│  ┌───────────┐ ┌───────────┐ └────────────────────────┘  │
│  │ SurrealDB │ │ SurrealDB │                             │
│  │   Pod     │ │   Pod     │   tikv://pd:2379            │
│  └─────┬─────┘ └─────┬─────┘                             │
│        └──────┬──────┘                                   │
│  ┌────────────▼───────────────────────┐                  │
│  │ PD StatefulSet + TiKV StatefulSet  │                  │
│  └────────────────────────────────────┘                  │
└──────────────────────────────────────────────────────────┘
```
## Features

- Declarative SurrealDB clusters — one CR manages Deployment, Service, and ServiceAccount
- Multiple storage backends — `memory` (dev/test), `tikv` (production), `rocksdb` (single-node)
- TiKV coordination — creates and manages `Cluster`, `PDGroup`, and `TiKVGroup` CRs via TiDB Operator v2
- Horizontal scaling — update `spec.surrealdb.replicas` to scale SurrealDB nodes
- Automated backups — Velero-backed `BackupDestination`/`Backup`/`BackupSchedule`/`Restore` CRDs with GFS retention and live data round-trip
- Ingress support — optional Ingress creation with TLS
- Health status — Kubernetes-standard conditions (`Ready`, `Synchronizing`, `Complete`, `DataVerified`, `Failed`)
- Admission webhooks — dependency pre-checks deny invalid resources at admission time (missing TiDB Operator, Velero, cert-manager, or secrets) with clear error messages
- Multi-arch container — `linux/amd64` and `linux/arm64` images published to GHCR
## Prerequisites

| Component | Version | Required? |
|---|---|---|
| Kubernetes | 1.28+ | Yes |
| TiDB Operator | 2.0+ | For the `tikv` backend |
| Velero | 1.18+ | For backups (`Backup` / `BackupSchedule` / `Restore`); install with `just velero-install` |
| cert-manager | 1.13+ | For managed TLS certificates (`spec.tls.issuerRef`) |
| `just` | 1.0+ | For the development task runner |
## Installation

### 1. Install CRDs

```bash
kubectl apply -f https://github.com/stevefan1999-personal/surrealdb-operator/releases/latest/download/surrealdbcluster-crd.yaml
```
Or from a local clone:
```bash
kubectl apply -f deploy/crds/surrealdbcluster-crd.yaml
```
### 2. Deploy the operator

```bash
# Development (single replica, minimal resources)
kubectl apply -k deploy/kustomize/base

# Production (2 replicas, higher resource limits)
kubectl apply -k deploy/kustomize/overlays/production
```
### 3. Create a SurrealDbCluster

```bash
kubectl apply -f - <<'EOF'
apiVersion: surrealdb.io/v1alpha1
kind: SurrealDbCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  surrealdb:
    replicas: 1
    image:
      repository: surrealdb/surrealdb
      tag: v3.0.5
    port: 8000
    logLevel: info
    auth:
      rootUsername: root
      rootPasswordSecretRef:
        name: surrealdb-root-password
        key: password
  storage:
    backend: memory
  service:
    type: ClusterIP
    port: 8000
EOF
```
## Usage Examples

### Production cluster with TiKV

```yaml
apiVersion: surrealdb.io/v1alpha1
kind: SurrealDbCluster
metadata:
  name: prod-cluster
  namespace: surrealdb
spec:
  surrealdb:
    replicas: 3
    image:
      repository: surrealdb/surrealdb
      tag: v3.0.5
    logLevel: warn
    auth:
      rootUsername: root
      rootPasswordSecretRef:
        name: surrealdb-root-password
        key: password
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "2"
        memory: 2Gi
  storage:
    backend: tikv
    tikv:
      pdReplicas: 3
      tikvReplicas: 3
      pdStorage:
        storageClassName: fast-ssd
        size: 1Gi
      tikvStorage:
        storageClassName: fast-ssd
        size: 50Gi
  service:
    type: ClusterIP
    port: 8000
```
Backups are configured separately via `BackupDestination` and `BackupSchedule` CRDs — see Backup System below.
### Check cluster status

```bash
kubectl get surrealdbcluster my-cluster
kubectl describe surrealdbcluster my-cluster
```
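To block until the operator reports the cluster healthy, you can wait on the standard `Ready` condition listed under Features (a sketch, assuming conditions are published under `.status.conditions` as is conventional for Kubernetes resources):

```bash
# Wait for the Ready condition set by the operator
kubectl wait surrealdbcluster/my-cluster --for=condition=Ready --timeout=300s

# Inspect all reported conditions
kubectl get surrealdbcluster my-cluster -o jsonpath='{.status.conditions}'
```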
## Configuration Reference

### `spec.surrealdb`

| Field | Type | Default | Description |
|---|---|---|---|
| `replicas` | integer | `1` | Number of SurrealDB pods |
| `image.repository` | string | `surrealdb/surrealdb` | Container image repository |
| `image.tag` | string | `v3.0.5` | Container image tag |
| `image.pullPolicy` | string | `IfNotPresent` | Image pull policy |
| `port` | integer | `8000` | SurrealDB listen port |
| `logLevel` | string | `info` | Log level (`none`/`error`/`warn`/`info`/`debug`/`trace`) |
| `auth.rootUsername` | string | `root` | SurrealDB root username |
| `auth.rootPasswordSecretRef` | object | — | Secret reference for root password. Optional — if omitted, the operator auto-generates a `{cluster}-auth` Secret |
| `resources` | object | — | CPU/memory requests and limits |
#### Auto-generated root credentials

`spec.surrealdb.auth.rootPasswordSecretRef` is optional. When omitted, the operator automatically creates a Secret named `{cluster}-auth` containing a cryptographically random 32-character password. This follows the same pattern used by CloudNativePG: credentials are generated once on first reconcile and never rotated by the operator.
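For example, a minimal dev cluster that relies on the generated credentials simply omits the secret reference (a sketch; field names follow the `spec.surrealdb` table above):

```yaml
apiVersion: surrealdb.io/v1alpha1
kind: SurrealDbCluster
metadata:
  name: my-cluster
spec:
  surrealdb:
    replicas: 1   # no auth.rootPasswordSecretRef: the operator creates the my-cluster-auth Secret
  storage:
    backend: memory
```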
The auto-generated Secret has two keys:
| Key | Value |
|---|---|
| `username` | `root` |
| `password` | random 32-char alphanumeric string |
To retrieve the generated password:
```bash
kubectl get secret my-cluster-auth -o jsonpath='{.data.password}' | base64 -d
```
The secret name is also recorded in the cluster status for discovery:
```bash
kubectl get surrealdbcluster my-cluster -o jsonpath='{.status.auth.generatedSecretName}'
```

### `spec.storage`
| Field | Type | Default | Description |
|---|---|---|---|
| `backend` | string | `memory` | Storage backend: `memory`, `tikv`, or `rocksdb` |
| `tikv.pdReplicas` | integer | `3` | PD replicas (TiKV backend) |
| `tikv.tikvReplicas` | integer | `3` | TiKV replicas |
| `tikv.pdStorage.size` | string | `1Gi` | PD persistent volume size |
| `tikv.tikvStorage.size` | string | `10Gi` | TiKV persistent volume size |
| `persistence.enabled` | boolean | `false` | Enable PVC for the `rocksdb` backend |
| `persistence.size` | string | `10Gi` | PVC size for the `rocksdb` backend |
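For example, a single-node `rocksdb` storage section with a persistent volume might look like this (a sketch using only the fields listed above):

```yaml
spec:
  storage:
    backend: rocksdb
    persistence:
      enabled: true   # provision a PVC so data survives pod restarts
      size: 20Gi      # defaults to 10Gi when omitted
```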
### `spec.service`

| Field | Type | Default | Description |
|---|---|---|---|
| `type` | string | `ClusterIP` | Kubernetes Service type: `ClusterIP`, `NodePort`, or `LoadBalancer` |
| `port` | integer | `8000` | Service port (forwards to the SurrealDB container port) |
Use `ClusterIP` for internal access (expose via Ingress or `kubectl port-forward`).
Use `NodePort` for dev/test clusters without a load-balancer controller.
Use `LoadBalancer` on cloud clusters with a cloud load-balancer provider.
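For example, to reach a `ClusterIP` Service from your workstation with `kubectl port-forward` (the Service name here assumes the `{cluster}-surrealdb` naming used in the TLS `dnsNames` default; adjust if yours differs):

```bash
# Forward local port 8000 to the SurrealDB Service, then connect to http://localhost:8000
kubectl port-forward svc/my-cluster-surrealdb 8000:8000
```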
### `spec.ingress`

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Create an Ingress resource |
| `className` | string | — | `ingressClassName` (e.g. `nginx`, `traefik`) |
| `hosts[*].host` | string | — | Virtual hostname |
| `hosts[*].paths[*].path` | string | `/` | URL path prefix |
| `hosts[*].paths[*].pathType` | string | `Prefix` | Path match type (`Prefix` or `Exact`) |
| `tls[*].secretName` | string | — | TLS Secret name for this host group |
| `tls[*].hosts` | array | — | Hostnames covered by the TLS certificate |
The operator updates the Ingress on every reconcile to reflect spec changes. Both Service and Ingress carry owner references and are garbage-collected when the cluster is deleted.
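A sketch of an Ingress-enabled cluster spec using the fields above (hostname, class name, and TLS Secret name are illustrative):

```yaml
spec:
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: db.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: db-example-tls
        hosts:
          - db.example.com
```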
### `spec.tls`

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable TLS for the SurrealDB port |
| `issuerRef.name` | string | — | Name of the cert-manager Issuer or ClusterIssuer |
| `issuerRef.kind` | string | `ClusterIssuer` | `Issuer` (namespaced) or `ClusterIssuer` (cluster-wide) |
| `secretName` | string | `{cluster}-tls` | Secret containing `tls.crt` / `tls.key`. Auto-generated by cert-manager when `issuerRef` is set; pre-existing in BYO mode |
| `dnsNames` | array | `[{cluster}-surrealdb.{ns}.svc.cluster.local]` | DNS SANs for the Certificate CR |
When `issuerRef` is set, the operator creates a cert-manager Certificate CR owned by the cluster. When it is absent, the operator expects the Secret to already exist (BYO mode).
When TLS is enabled, `--web-crt` and `--web-key` are automatically added to the SurrealDB start arguments and the TLS Secret is mounted at `/etc/surrealdb/tls/` in the pod.
When both TLS and Ingress are enabled with a ClusterIssuer, the `cert-manager.io/cluster-issuer` annotation is automatically added to the Ingress.
### TLS setup example

```bash
# 1. Install cert-manager (if not already installed)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# 2. Create a self-signed ClusterIssuer (for development / testing)
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
EOF

# 3. Create a SurrealDbCluster with TLS enabled
kubectl apply -f - <<'EOF'
apiVersion: surrealdb.io/v1alpha1
kind: SurrealDbCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  surrealdb:
    replicas: 1
  storage:
    backend: memory
  tls:
    enabled: true
    issuerRef:
      name: selfsigned-issuer
      kind: ClusterIssuer
EOF
```
## Backup System

Backups are configured via three independent CRDs rather than `spec.backup` on the cluster:

**`BackupDestination`** — wraps a Velero `BackupStorageLocation`:
| Field | Type | Default | Description |
|---|---|---|---|
| `spec.provider` | string | `aws` | Velero object-store provider (works for any S3-compatible endpoint) |
| `spec.bucket` | string | — | S3 bucket name (required) |
| `spec.prefix` | string | — | Optional key prefix inside the bucket |
| `spec.config` | `map<string,string>` | — | Provider config (e.g. `region`, `s3ForcePathStyle`, `s3Url`) |
| `spec.credentialSecretRef.name` | string | — | Secret containing the AWS-style credentials file |
| `spec.credentialSecretRef.key` | string | — | Key inside the Secret holding the credentials body |
| `spec.backupSyncPeriod` | string | `1m` | Velero BSL sync cadence (Go duration) |
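A sketch of a `BackupDestination` pointing at an S3-compatible endpoint, assuming the backup CRDs share the `surrealdb.io/v1alpha1` API group used by `SurrealDbCluster` (endpoint, bucket, and Secret names are illustrative; see the sample manifests referenced below for the canonical form):

```yaml
apiVersion: surrealdb.io/v1alpha1   # assumed API group; check the samples referenced below
kind: BackupDestination
metadata:
  name: my-dest
spec:
  provider: aws                     # any S3-compatible object store
  bucket: surrealdb-backups
  prefix: prod-cluster
  config:
    region: us-east-1
    s3ForcePathStyle: "true"
    s3Url: http://minio.minio.svc:9000
  credentialSecretRef:
    name: my-creds                  # Secret holding an AWS-style credentials file
    key: cloud
  backupSyncPeriod: 1m
```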
**`Backup`** — a single backup instance:
| Field | Type | Description |
|---|---|---|
| `spec.clusterRef.name` | string | Name of the SurrealDbCluster to back up |
| `spec.destinationRef.name` | string | Name of the BackupDestination |
| `spec.expiresAt` | datetime | When this backup expires (null = keep forever) |
| `spec.retentionPolicy` | string | `soft` (GFS-managed) or `hard` (kept until `expiresAt`) |
| `status.phase` | string | `Waiting` → `InProgress` → `Active` → `Expired` \| `Failed` |
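A one-off `Backup` tying a cluster to that destination might look like this (a sketch under the same API-group assumption; the expiry timestamp is illustrative):

```yaml
apiVersion: surrealdb.io/v1alpha1   # assumed API group
kind: Backup
metadata:
  name: prod-manual-backup
spec:
  clusterRef:
    name: prod-cluster
  destinationRef:
    name: my-dest
  retentionPolicy: hard             # keep until expiresAt rather than GFS-managed
  expiresAt: "2026-01-01T00:00:00Z"
```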
**`BackupSchedule`** — cron-based trigger with GFS retention:
| Field | Type | Default | Description |
|---|---|---|---|
| `spec.clusterRef.name` | string | — | SurrealDbCluster to back up |
| `spec.destinationRef.name` | string | — | Target BackupDestination |
| `spec.schedule` | string | `0 2 * * *` | Cron expression (UTC) |
| `spec.retain.last` | integer | — | Always keep the last N backups |
| `spec.retain.daily` | integer | — | Keep one per day for N days |
| `spec.retain.weekly` | integer | — | Keep one per week for N weeks |
| `spec.retain.monthly` | integer | — | Keep one per month for N months |
| `spec.retain.yearly` | integer | — | Keep one per year for N years |
| `spec.hardRetentionLimit` | integer | `50` | Absolute max total Backup CRs; triggers GC |
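And a nightly `BackupSchedule` with multi-tier GFS retention (a sketch under the same assumptions; retention counts are illustrative):

```yaml
apiVersion: surrealdb.io/v1alpha1   # assumed API group
kind: BackupSchedule
metadata:
  name: prod-nightly
spec:
  clusterRef:
    name: prod-cluster
  destinationRef:
    name: my-dest
  schedule: "0 2 * * *"   # 02:00 UTC daily
  retain:
    last: 3
    daily: 7
    weekly: 4
    monthly: 12
    yearly: 2
  hardRetentionLimit: 50
```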
See `samples/deploy/examples/10-backup-destination-s3.yaml`, `11-backup-manual.yaml`, and `12-backup-schedule-multi-tier.yaml` for complete examples.
## Admission Webhooks
The operator registers four ValidatingAdmissionWebhook endpoints that run before any
resource is admitted to the cluster. This provides fast, clear feedback when a prerequisite
is missing rather than leaving a resource stuck in a failed reconcile loop.
| What you try to create | Missing dependency | Error you'll see |
|---|---|---|
| `SurrealDbCluster` with `storage.backend: tikv` | TiDB Operator v2 not installed | `TiDB Operator v2 is not installed. Install tidb-operator v2 before using storage.backend=tikv.` |
| `SurrealDbCluster` with `tls.issuerRef` set | cert-manager not installed | `cert-manager is not installed. Install cert-manager before using spec.tls.issuerRef.` |
| `Backup` / `BackupSchedule` / `Restore` | Velero not installed | `Velero is not installed. Run 'just velero-install' before creating backup CRs.` |
| `Backup` | `BackupDestination` not ready | `BackupDestination 'my-dest' is not ready.` |
| `Backup` with `storage.backend: memory` | — | `Cannot backup memory backend.` |
| `BackupDestination` | Credential secret missing | `Secret 'my-creds' not found in namespace 'default'.` |
| `BackupDestination` | `spec.bucket` empty | `spec.bucket is required.` |
Note: The webhooks require HTTPS. KubeOps generates a self-signed TLS certificate for the webhook server automatically — no cert-manager installation is needed for the webhook TLS itself.
## Development

### Prerequisites
- .NET 10 SDK
- just task runner
- Docker (for container publish)
- kubectl + a local Kubernetes cluster (k0s, kind, or minikube)
### Available recipes

```
$ just --list
Available recipes:
    build                           # Build the operator
    clean                           # Clean build artifacts
    demo endpoint username password # Run the demo console app
    deploy                          # Deploy operator via kustomize base
    deploy-crds                     # Apply CRD manifests to the cluster
    deploy-local                    # Deploy operator via local overlay (for local development)
    gen-crds                        # Generate CRD YAML via KubeOps CLI; keeps only our CRD, discards external stubs
    lint                            # Run code formatting/analyzer lint checks
    publish tag                     # Publish operator container image for a single arch (amd64)
    publish-multiarch tag           # Publish multi-arch operator image (amd64 + arm64 manifest)
    restore                         # Restore NuGet packages
    run                             # Run the operator locally against the k0s cluster
    test                            # Run all tests (unit + integration + e2e)
    test-e2e                        # Run backup/restore e2e tests against the local k0s cluster
    test-integration                # Run operator integration tests against the local k0s cluster
    test-operator                   # Alias for test-integration
    test-unit                       # Run all unit tests
```
### Build and test

```bash
just restore     # restore NuGet packages
just lint        # verify dotnet formatting/analyzers
just build       # compile
just test-unit   # unit tests (no cluster needed)
just test        # all tests including integration/e2e (requires k0s/k3s + dependencies)
```
### Regenerate CRDs

After modifying `SurrealDbCluster.cs` schema fields, regenerate the CRD manifest:
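```bash
# Regenerates deploy/crds/surrealdbcluster-crd.yaml (recipe from `just --list` above)
just gen-crds
```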
This restores the KubeOps CLI tool, invokes `dotnet kubeops generate crds`, and copies the result to `deploy/crds/surrealdbcluster-crd.yaml`.
### Publish container image locally

The container is published using `dotnet publish` with `EnableSdkContainerSupport` — no Dockerfile required.
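Using the `publish` recipes from the task list above (the tag value is illustrative):

```bash
# Single-arch (linux/amd64) image
just publish 0.1.0

# Multi-arch manifest (linux/amd64 + linux/arm64)
just publish-multiarch 0.1.0
```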
### Run against a cluster

```bash
just run   # uses KUBECONFIG=/var/lib/k0s/pki/admin.conf
```

## Contributing
- Fork the repository
- Create a feature branch (`git checkout -b feat/my-feature`)
- Make changes following the conventions in CLAUDE.md
- Run lint and tests: `just lint && just test-unit`
- Commit with conventional commits (`feat:`, `fix:`, `refactor:`, etc.)
- Open a pull request
See ARCHITECTURE.md and DESIGN.md for architectural context.