Project Website · Documentation · Maximizing Free-Tier Storage
An S3-compatible orchestrator that combines multiple storage backends into a single unified endpoint. Add as many S3-compatible backends as you want — OCI Object Storage, Backblaze B2, AWS S3, MinIO, whatever — and the orchestrator presents them to clients as one or more virtual buckets. Per-backend quota enforcement lets you cap each backend at exactly the byte limit you choose, so you can stack multiple free-tier or cost-limited allocations from different providers into a single, larger storage target for backups, media, and more, without worrying about surprise bills.
Multiple virtual buckets let different applications share the same orchestrator with isolated file namespaces and independent credentials. Each bucket's objects are stored with an internal key prefix ({bucket}/{key}), so bucket isolation requires zero changes to the storage layer or database schema.
Built-in cross-backend replication also makes this an easy way to keep your data in multiple clouds without touching your application. Point your app at the proxy, set a replication factor, and every object automatically lands in two or more providers — instant multi-cloud redundancy with zero client-side changes.
Objects are routed to backends based on the configured routing_strategy: pack (default) fills backends in config order, while spread places each write on the least-utilized backend by ratio. Metadata and quota tracking live in PostgreSQL; the backends only see standard S3 API calls. The orchestrator is fully S3-compatible and works with any standard S3 client.
Getting Started
Prerequisites: Go 1.26+, Docker, Make.
```sh
git clone https://github.com/afreidah/s3-orchestrator.git
cd s3-orchestrator
make run
```

This starts three MinIO backends via Docker Compose (the orchestrator uses embedded SQLite by default, so no external database is needed), then launches the orchestrator on localhost:9000. Test it:

```sh
aws --endpoint-url http://localhost:9000 s3 cp /etc/hostname s3://photos/test.txt
aws --endpoint-url http://localhost:9000 s3 ls s3://photos/
```
Default credentials: access key photoskey, secret photossecret. Web dashboard at localhost:9000/ui/ (login: admin / admin).
See the Quickstart for full details, credentials for all buckets, and troubleshooting.
Other ways to install:
- Docker: `docker pull ghcr.io/afreidah/s3-orchestrator:<version>`
- Debian: download the `.deb` from GitHub Releases
- Binary: download from GitHub Releases
Database: SQLite is embedded by default — no external database needed for single-instance use. For multi-instance deployments, configure `database.driver: postgres` with PostgreSQL 14+. Run `s3-orchestrator init` to generate a config file interactively.
Verify artifact signatures:
Container images and release checksums are signed with cosign (keyless / Sigstore):
```sh
# Verify a container image
cosign verify ghcr.io/afreidah/s3-orchestrator:<version> \
  --certificate-identity-regexp='github\.com/afreidah/s3-orchestrator' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com'

# Verify release checksums
cosign verify-blob checksums.txt \
  --signature checksums.txt.sig \
  --certificate checksums.txt.pem \
  --certificate-identity-regexp='github\.com/afreidah/s3-orchestrator' \
  --certificate-oidc-issuer='https://token.actions.githubusercontent.com'
```
Operational CLI: s3-orchestrator admin --help for rebalance, drain, encryption management, and backend sync.
Table of Contents
- Getting Started
- Architecture
- S3 API Coverage
- Authentication & Multi-Bucket
- Degraded Mode (Database Circuit Breaker)
- Backend Circuit Breaker
- Write Routing
- Rebalancing
- Replication
- Over-Replication Cleanup
- Cleanup Queue
- PUT-before-COMMIT Pending Intents
- Lifecycle (Object Expiration)
- Orphan Reconciliation
- Encryption
- Object Data Cache
- Rate Limiting
- Usage Limits
- Configuration
- Configuration Hot-Reload
- Database
- Telemetry
- Webhook Notifications
- Web UI
- Endpoints
- Background Tasks
- Multi-Instance Deployment
- CLI Subcommands
- Development
- Deployment
- Project Structure
- Additional Documentation
Architecture
```
S3 clients (aws cli, rclone, etc.)
                |
                v
          +-----------+
          | S3 Orch.  |  <-- SigV4 auth, rate limiting, quota routing
          +-----------+
           |        |
 +---------+        +------------------+------------------+
 v                  v                  v                  v
PostgreSQL     OCI Object         Backblaze B2          AWS S3
(metadata)     Storage (20 GB)      (10 GB)             (5 GB)
                    \                  |                  /
                     '------------ 35 GB total ---------'
```
- PostgreSQL stores object locations (`object_locations`), per-backend quota counters and orphan bytes tracking (`backend_quotas`), and multipart upload state (`multipart_uploads`, `multipart_parts`). Schema is applied automatically on startup via goose versioned migrations embedded in the binary. All queries are generated by sqlc from annotated SQL files and executed via pgx/v5 connection pools.
- Storage layer is split into three Go packages: `internal/store/core` holds engine-agnostic types, role interfaces, and orchestration helpers (the multi-step transactional operations like `RecordObject`, `PromotePending`, `MoveObjectLocation`); `internal/store/postgres` and `internal/store/sqlite` are thin per-engine adapters that implement `core.TxAdapter` so the same orchestration code drives both engines. SQLite is the default for single-instance use; PostgreSQL is required for multi-instance deployments.
- Backends are standard S3-compatible services accessed via AWS SDK v2, each with a dedicated tuned HTTP transport (connection pooling, idle timeout for DNS freshness). Streaming operations use a shared buffer pool to reduce GC pressure. Any provider that speaks the S3 API works -- OCI Object Storage, Backblaze B2, AWS S3, MinIO, Wasabi, etc.
- Write routing selects a backend for each new object based on the `routing_strategy`. In pack mode (default), objects go to the first backend in config order that has available quota — good for filling free-tier allocations sequentially. In spread mode, objects go to the backend with the lowest utilization ratio (`(bytes_used + orphan_bytes) / bytes_limit`) — good for distributing load evenly across backends. Quota is updated atomically in a transaction alongside the object location record. Set `quota_bytes: 0` (or omit it) to disable quota enforcement on a backend — useful when you don't need cost control and just want unified access or replication. Backends with a `max_object_size` limit automatically skip objects that exceed the limit during write routing, rebalancing, and replication — preventing repeated 413 errors from providers with per-object size restrictions.
- Usage limits optionally cap monthly API requests, egress bytes, and ingress bytes per backend. When a backend exceeds a limit, writes overflow to other backends and reads fail over to replicas. Delete and abort operations always bypass limits. Limits are enforced using cached database totals (refreshed at the configured flush interval) plus unflushed counters (in Redis when configured, otherwise local in-memory atomics). Adaptive flushing automatically shortens the interval when any backend approaches a limit.
S3 API Coverage
| Operation | Method | Path | Notes |
|---|---|---|---|
| ListBuckets | GET | `/` | Returns buckets the credential has access to |
| HeadBucket | HEAD | `/{bucket}` | Confirms bucket exists (200 if authorized) |
| GetBucketLocation | GET | `/{bucket}?location` | Returns empty LocationConstraint |
| PutObject | PUT | `/{bucket}/{key}` | Preserves `x-amz-meta-*` user metadata |
| GetObject | GET | `/{bucket}/{key}` | Supports Range header; returns `x-amz-meta-*` |
| HeadObject | HEAD | `/{bucket}/{key}` | Returns `x-amz-meta-*` user metadata |
| DeleteObject | DELETE | `/{bucket}/{key}` | Idempotent (404 from store treated as success) |
| DeleteObjects | POST | `/{bucket}?delete` | Batch delete up to 1000 keys per request |
| CopyObject | PUT | `/{bucket}/{key}` | Uses `X-Amz-Copy-Source` header (same-bucket only) |
| ListObjectsV1 | GET | `/{bucket}` | Original list API, uses marker pagination |
| ListObjectsV2 | GET | `/{bucket}?list-type=2` | Supports delimiter for virtual directories |
| ListMultipartUploads | GET | `/{bucket}?uploads` | Lists in-progress multipart uploads |
| CreateMultipartUpload | POST | `/{bucket}/{key}?uploads` | |
| UploadPart | PUT | `/{bucket}/{key}?partNumber=N&uploadId=X` | |
| CompleteMultipartUpload | POST | `/{bucket}/{key}?uploadId=X` | |
| AbortMultipartUpload | DELETE | `/{bucket}/{key}?uploadId=X` | |
| ListParts | GET | `/{bucket}/{key}?uploadId=X` | |

Batch delete (DeleteObjects) accepts an XML request body listing up to 1000 keys and returns per-key success/error results. Metadata removal is sequential (each key is its own DB transaction), while backend S3 deletes run concurrently with bounded parallelism for throughput. Failed backend deletes are enqueued to the cleanup queue for automatic retry. The response always returns HTTP 200, even when individual keys fail -- errors are reported per-key in the XML body. Quiet mode (<Quiet>true</Quiet>) suppresses the <Deleted> elements and only returns errors.
Each request must target a virtual bucket name that matches the credentials used to sign the request. Requests to a bucket the credentials aren't authorized for return 403 AccessDenied.
Every response includes an X-Amz-Request-Id header with a unique request ID for tracing. Clients can supply their own ID via X-Request-Id; otherwise the orchestrator generates one. The same ID appears in audit logs and OpenTelemetry spans.
Authentication & Multi-Bucket
Each virtual bucket has one or more credential sets. On every request, the orchestrator:
- Extracts the access key from the SigV4 `Authorization` header, presigned URL query parameters, or token from `X-Proxy-Token`.
- Looks up which bucket the credential belongs to.
- Verifies the signature (SigV4 header or presigned query parameters) or token.
- Validates that the URL path bucket matches the authorized bucket.
Three auth methods are supported, checked in order:
- AWS SigV4 (recommended) - Standard AWS Signature Version 4 via the `Authorization` header. Compatible with `aws cli`, SDKs, and any S3 client. Signature verification is constant-time: unknown access keys still compute a full HMAC to prevent timing side-channel enumeration.
- Presigned URLs - SigV4 query-parameter authentication (`X-Amz-Algorithm`, `X-Amz-Credential`, etc.) for time-limited, shareable URLs. Works with any AWS SDK presign client. Maximum expiry: 7 days. Uses the same bucket credentials as normal requests — no additional configuration required.
- Legacy token - Simple `X-Proxy-Token` header for backward compatibility.
Multiple services can share a bucket by each having their own credentials that all map to the same bucket name. Access key IDs must be globally unique across all buckets.
Authentication is always required — every bucket must have at least one credential set.
For client usage examples (AWS CLI, rclone, boto3, Go SDK), see the User Guide. For deployment and operations, see the Admin Guide.
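As a minimal illustration of the SigV4 path (the endpoint, bucket name, and credentials below are placeholders for your own config; see the User Guide for complete examples), a Go SDK v2 client only needs a custom endpoint and the bucket's credentials:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Endpoint, bucket, and credentials are placeholders for values from your config.
	client := s3.New(s3.Options{
		BaseEndpoint: aws.String("http://localhost:9000"),
		Region:       "us-east-1", // placeholder region; the bucket credentials do the real authorization
		Credentials:  credentials.NewStaticCredentialsProvider("APP1_ACCESS_KEY", "APP1_SECRET_KEY", ""),
		UsePathStyle: true, // the virtual bucket name goes in the URL path
	})

	_, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("app1-files"),
		Key:    aws.String("hello.txt"),
		Body:   strings.NewReader("hello via the orchestrator"),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("uploaded")
}
```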
Degraded Mode (Database Circuit Breaker)
A three-state circuit breaker wraps all database access:
closed (healthy) → open (DB down) → half-open (probing) → closed
When the database becomes unreachable (consecutive failures exceed failure_threshold), the orchestrator enters degraded mode:
- Reads broadcast to all backends in order (or in parallel if `parallel_broadcast` is enabled). A location cache (TTL configurable via `cache_ttl`) stores successful lookups to avoid repeated broadcasts for the same key.
- Writes (PUT, DELETE, COPY, multipart) return `503 ServiceUnavailable`.
- Health endpoint returns `degraded` instead of `ok`.
After open_timeout elapses, the circuit enters half-open state and sends a single probe request. If the database responds, the circuit closes and normal operation resumes automatically.
Backend Circuit Breaker
Optional per-backend circuit breakers detect when individual S3 backends become unreachable (expired credentials, provider outage, network failure) and stop sending traffic to them until recovery is detected.
closed (healthy) → open (backend down) → half-open (probing) → closed
When a backend accumulates failure_threshold consecutive failures, the circuit opens:
- Writes skip the unhealthy backend and route to other backends with available quota.
- Reads fail over to replicas on healthy backends (requires `replication.factor >= 2`).
- Replication creates replacement copies on healthy backends after a sustained outage (see Health-Aware Replication).
- All calls to the backend return `ErrBackendUnavailable` immediately — no timeout waiting.
After open_timeout elapses (plus randomized jitter of up to open_timeout/4), the next organic request to the backend is allowed through as a probe. If it succeeds, the circuit closes. If it fails, the circuit reopens for another timeout period. The jitter is recomputed on each open transition to prevent multiple backends from probing simultaneously after a shared failure event.
A background watchdog service checks all circuit breakers every minute for stale half-open probes. If a probe has been in flight longer than 2 minutes (e.g. the backend accepted the TCP connection but never responded), the watchdog resets the circuit to open so a new probe can be dispatched. This prevents circuits from getting permanently stuck half-open on low-traffic backends where no new request arrives to trigger the passive stale-probe detection.
Unlike the database circuit breaker, backend circuit breakers treat all errors as failures (no error filtering). This is a per-backend wrapper — each backend has its own independent circuit breaker state.
```yaml
backend_circuit_breaker:
  enabled: true
  failure_threshold: 5   # consecutive failures before opening (default: 5)
  open_timeout: "5m"     # delay before probing recovery (default: 5m)
```
Disabled by default. Requires a restart to enable/disable (non-reloadable).
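A rough sketch of the probe-delay behavior described above (names are illustrative, not the project's actual types): the effective wait before a probe is the configured open timeout plus jitter of up to a quarter of it, recomputed each time the circuit opens.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// probeDelay returns how long an opened circuit waits before allowing a probe:
// the configured open_timeout plus random jitter of up to open_timeout/4,
// so backends that failed together don't all probe at the same instant.
func probeDelay(openTimeout time.Duration) time.Duration {
	jitter := time.Duration(rand.Int63n(int64(openTimeout / 4)))
	return openTimeout + jitter
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(probeDelay(5 * time.Minute)) // somewhere in [5m, 6m15s)
	}
}
```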
Write Routing
The routing_strategy setting controls how the orchestrator selects a backend for new objects (PutObject, CopyObject, CreateMultipartUpload):
- pack (default) — fills the first backend in config order until its quota is full, then overflows to the next. Good for maximizing usable capacity on free-tier providers where you want to fill one allocation before touching the next.
- spread — places each object on the backend with the lowest utilization ratio (`(bytes_used + orphan_bytes) / bytes_limit`). Good for distributing storage evenly across backends to balance load and wear.
Both strategies respect quota limits — a backend with no remaining space is skipped regardless of strategy. When usage limits are configured, backends that have exceeded their monthly limits are also excluded from selection.
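A simplified sketch of the two strategies (field and function names are illustrative, not the orchestrator's internals):

```go
package main

import (
	"errors"
	"fmt"
)

type backend struct {
	Name        string
	BytesUsed   int64
	OrphanBytes int64
	BytesLimit  int64 // 0 = unlimited
}

func (b backend) hasRoom(size int64) bool {
	return b.BytesLimit == 0 || b.BytesUsed+b.OrphanBytes+size <= b.BytesLimit
}

func (b backend) utilization() float64 {
	if b.BytesLimit == 0 {
		return 0
	}
	return float64(b.BytesUsed+b.OrphanBytes) / float64(b.BytesLimit)
}

// pickBackend chooses a target for a new object of the given size.
// "pack" takes the first backend in config order with room;
// "spread" takes the backend with the lowest utilization ratio.
func pickBackend(strategy string, backends []backend, size int64) (string, error) {
	var best *backend
	for i := range backends {
		b := &backends[i]
		if !b.hasRoom(size) {
			continue // quota full (or over a usage limit) — skipped regardless of strategy
		}
		if strategy == "pack" {
			return b.Name, nil
		}
		if best == nil || b.utilization() < best.utilization() {
			best = b
		}
	}
	if best == nil {
		return "", errors.New("no backend with available quota")
	}
	return best.Name, nil
}

func main() {
	backends := []backend{
		{Name: "oci", BytesUsed: 18 << 30, BytesLimit: 20 << 30},
		{Name: "b2", BytesUsed: 3 << 30, BytesLimit: 10 << 30},
		{Name: "aws", BytesUsed: 1 << 30, BytesLimit: 5 << 30},
	}
	fmt.Println(pickBackend("pack", backends, 1<<30))   // oci still has room -> oci
	fmt.Println(pickBackend("spread", backends, 1<<30)) // lowest ratio (0.2) -> aws
}
```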
Rebalancing
The rebalancer periodically moves objects between backends to optimize storage distribution. Disabled by default to avoid unexpected egress charges.
Two strategies:
- pack - Fills backends in configuration order, consolidating free space on the last backend. Good for maximizing usable capacity on free-tier providers.
- spread - Equalizes utilization ratios across all backends. Good for distributing load evenly.
The threshold parameter (0–1) sets the minimum utilization spread required to trigger a rebalance run. Objects are moved in configurable batch sizes with bounded concurrency (concurrency setting, default 5) for throughput.
Replication
When replication.factor is greater than 1, a background worker creates additional copies of objects on different backends to reach the target factor. Read operations automatically fail over to replicas if the primary copy is unavailable.
The worker runs once at startup to catch up on pending replicas, then continues at the configured interval.
Health-Aware Replication
When backend circuit breakers are enabled, the replication worker is aware of backend health. If a backend's circuit breaker has been open longer than unhealthy_threshold (default: 10 minutes), the replicator treats copies on that backend as unavailable and creates replacement copies on healthy backends to maintain the target replication factor.
This prevents a sustained backend outage from silently reducing effective redundancy. For example, with factor: 2 and an object on backends A and B, if B goes down and stays down past the threshold, the replicator creates a third copy on backend C — restoring two accessible copies.
The threshold prevents replacement copies from being created during brief transient failures. The replicator also prefers healthy backends as copy sources and never selects a circuit-broken backend as a replication target.
When a backend recovers, the extra copies it created are cleaned up automatically by the over-replication cleaner (see Over-Replication Cleanup).
Over-Replication Cleanup
When a backend recovers after the replicator has already created replacement copies on other backends, objects end up with more copies than the replication factor. A background worker detects and removes the excess.
The cleaner scores each copy by its backend's health and storage utilization, then removes the lowest-scoring copies until the object reaches the target factor:
- Draining backend: score 0 (always removed first)
- Circuit-broken backend: score 1 (removed next)
- Healthy backend: score 2 + (1 − utilization ratio), range [2..3]
Among healthy backends, the most utilized backend gets the lowest score — freeing space where it is scarcest. Each object's copies are locked with FOR UPDATE to prevent races with concurrent replicator or rebalancer activity.
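A sketch of that scoring rule (illustrative names, not the worker's real code):

```go
package main

import (
	"fmt"
	"sort"
)

type copyInfo struct {
	Backend     string
	Draining    bool
	CircuitOpen bool
	Utilization float64 // (bytes_used + orphan_bytes) / bytes_limit
}

// score ranks a copy for removal: lower scores are removed first.
// Draining backends score 0, circuit-broken backends 1, and healthy
// backends 2 + (1 - utilization), so among healthy backends the most
// utilized one loses its copy first, freeing space where it is scarcest.
func score(c copyInfo) float64 {
	switch {
	case c.Draining:
		return 0
	case c.CircuitOpen:
		return 1
	default:
		return 2 + (1 - c.Utilization)
	}
}

// trimToFactor removes the lowest-scoring copies until factor copies remain.
func trimToFactor(copies []copyInfo, factor int) (keep, remove []copyInfo) {
	sort.Slice(copies, func(i, j int) bool { return score(copies[i]) > score(copies[j]) })
	if len(copies) <= factor {
		return copies, nil
	}
	return copies[:factor], copies[factor:]
}

func main() {
	copies := []copyInfo{
		{Backend: "oci", Utilization: 0.9},
		{Backend: "b2", Utilization: 0.3},
		{Backend: "aws", CircuitOpen: true, Utilization: 0.1},
	}
	keep, remove := trimToFactor(copies, 2)
	fmt.Println("keep:", keep)
	fmt.Println("remove:", remove) // the circuit-broken copy goes first
}
```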
The worker runs at the replication.worker_interval and shares the same batch_size and concurrency settings. It only runs when replication.factor > 1. Like the replicator, it uses a PostgreSQL advisory lock for multi-instance coordination.
Cleanup can also be triggered on demand via the admin API (POST /admin/api/over-replication), the CLI (s3-orchestrator admin over-replication --execute), or the web dashboard's Clean Excess button.
Cleanup Queue
When a backend S3 operation succeeds but the subsequent metadata update or cleanup deletion fails, an orphaned object is left on the backend — invisible to the system, consuming storage but not tracked by quotas. Rather than silently logging these failures, the orchestrator enqueues them in a persistent cleanup_queue table in PostgreSQL for automatic retry.
Orphan bytes tracking — each enqueued item records the object's `size_bytes`. When an item is enqueued, the corresponding backend's `orphan_bytes` counter in `backend_quotas` is incremented. All capacity checks (write routing, replication target selection) subtract `orphan_bytes` from available space, so the write path never overcommits storage on a backend with pending cleanups. When a cleanup succeeds, `orphan_bytes` is decremented. This prevents a sustained backend outage from silently allowing quota overcommitment: even if a backend is down for days and cleanup retries keep failing, the space consumed by orphaned objects remains reserved.
A background worker runs every minute, fetching pending items and attempting to delete them from their respective backends. Failed attempts are rescheduled with exponential backoff (1m × 2^attempts, capped at 24h). After 10 failed attempts, the row is graduated to the cleanup_dlq (dead-letter) table by core.MoveCleanupToDLQ in a single transaction. orphan_bytes is intentionally NOT decremented during the move because the backend object is still on disk — the bytes really are still occupying the backend's quota, and decrementing here would lie about reclaimed capacity. Operators monitor s3o_cleanup_dlq_depth and s3o_cleanup_dlq_enqueued_total{backend} to spot unrecoverable orphans, then resolve each entry deliberately (delete the object out-of-band, then write off the row + adjust orphan_bytes by its size).
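The retry schedule described above (1m × 2^attempts, capped at 24h) reduces to a small function like this (sketch only):

```go
package main

import (
	"fmt"
	"time"
)

// nextRetryDelay implements the exponential backoff described above:
// one minute doubled per attempt, capped at 24 hours.
func nextRetryDelay(attempts int) time.Duration {
	d := time.Minute << uint(attempts) // 1m * 2^attempts
	if d > 24*time.Hour || d <= 0 {    // cap, and guard against shift overflow
		return 24 * time.Hour
	}
	return d
}

func main() {
	for attempts := 0; attempts < 12; attempts++ {
		fmt.Printf("attempt %2d -> retry in %v\n", attempts, nextRetryDelay(attempts))
	}
}
```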
Enqueue points cover all failure sites across the codebase:
- PutObject / CopyObject / CompleteMultipartUpload — orphaned object when `RecordObject` fails and the immediate cleanup delete also fails
- PutObject / CopyObject (overwrite) — displaced copies on other backends when a key is overwritten; old copies that can't be immediately deleted are enqueued
- DeleteObject — metadata removed but backend delete fails (storage leak)
- UploadPart — part uploaded but `RecordPart` fails and cleanup delete fails
- CompleteMultipartUpload / AbortMultipartUpload — temporary `__multipart/part` objects not deleted
- Rebalancer — orphaned copy on destination when `MoveObjectLocation` fails, or stale source copy after a successful move
- Replicator — orphaned replica when `RecordReplica` fails or source is deleted during replication
Enqueue is best-effort: if the database is down (circuit breaker open), the failure is logged and the orphan is not enqueued. This avoids cascading failures — if the DB recovers, the next operation that fails will be enqueued normally.
Operators inspect exhausted items in the dead-letter table:
```sql
SELECT id, original_id, backend_name, object_key, reason, attempts, size_bytes,
       first_enqueued_at, moved_at, last_error
FROM cleanup_dlq
ORDER BY moved_at;

-- After confirming the object is gone (manual S3 delete, reconciler sweep, etc.):
BEGIN;
UPDATE backend_quotas
SET orphan_bytes = GREATEST(0, orphan_bytes - (SELECT size_bytes FROM cleanup_dlq WHERE id = 42))
WHERE backend_name = (SELECT backend_name FROM cleanup_dlq WHERE id = 42);
DELETE FROM cleanup_dlq WHERE id = 42;
COMMIT;

-- To push a DLQ entry back through automatic retry (e.g. after fixing the backend):
INSERT INTO cleanup_queue (backend_name, object_key, reason, size_bytes, next_retry, attempts, last_error)
SELECT backend_name, object_key, reason, size_bytes, NOW(), 0, last_error
FROM cleanup_dlq WHERE id = 42;
DELETE FROM cleanup_dlq WHERE id = 42;
```
PUT-before-COMMIT Pending Intents
The write path inserts a pending_objects row before sending the upload to
the backend, then deletes that row in the same transaction that records the
new object_locations row. The pattern guarantees that a DB outage between
the backend PUT and the metadata commit cannot silently destroy the prior
copy of an overwritten key.
If the metadata commit succeeds, the intent is gone within the same transaction and nothing else needs to happen. If the commit fails, the intent survives and a background pending reaper picks it up on the next tick:
- The reaper claims the intent by deleting it transactionally (so two concurrent reapers cannot resolve the same intent).
- It HEADs the destination backend.
- Backend has the object — the reaper promotes the intent into `object_locations` in the same transaction, taking the prior copy's place. Displaced copies on other backends are returned for cleanup.
- Backend does not have the object — the reaper drops the intent; the original write effectively never happened.
- Concurrent successful write — if `object_locations` already holds a newer row for the same key (created after the intent), the intent is provably stale and dropped without writing metadata.
Configurable via write_path.pending_pattern (default: enabled, 1-minute
reaper tick, 5-minute min_age so in-flight PUTs are not interrupted).
Setting enabled: false reverts to the legacy delete-on-record-failure
path, which trades data-loss safety for one fewer round-trip per PUT.
Lifecycle (Object Expiration)
Config-driven lifecycle rules automatically delete objects matching a key prefix after a configurable number of days. Useful for expiring temporary uploads, staging artifacts, or any objects with a known retention period.
```yaml
lifecycle:
  rules:
    - prefix: "tmp/"
      expiration_days: 7
    - prefix: "uploads/staging/"
      expiration_days: 1
```
A background worker runs hourly and evaluates each rule against created_at timestamps in the object_locations table (uses an existing index — no schema changes needed). Deletions go through the standard DeleteObject path, so all copies are removed, quotas are decremented, and failed backend deletes are enqueued to the cleanup queue.
Rules are hot-reloadable via SIGHUP. An empty rules list (or omitting the section entirely) disables lifecycle — no advisory lock is acquired and no DB queries are executed.
Orphan Reconciliation
Optional background service that periodically scans each backend's S3 bucket and reconciles it against the metadata database. For each backend, it walks both sides as ascending key streams — S3 paginated by ListObjects and the DB paginated by ListObjectsByBackendKeyAsc — and merges them in lockstep. Keys present only on the backend are imported; keys present only in the DB are removed. Memory is bounded by the page size on each side (1000 entries) regardless of object count, so backends holding millions of objects reconcile without OOM. Rows owned by sibling virtual buckets stored on the same backend are skipped so a per-bucket pass does not affect other buckets.
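The lockstep merge can be pictured like this (a simplified sketch over in-memory slices; the real service pages both sides 1000 keys at a time):

```go
package main

import "fmt"

// reconcile merges two ascending key streams: keys only on the backend are
// imported into the DB, keys only in the DB are removed. Memory stays bounded
// because each side is consumed strictly in order.
func reconcile(backendKeys, dbKeys []string) (toImport, toRemove []string) {
	i, j := 0, 0
	for i < len(backendKeys) && j < len(dbKeys) {
		switch {
		case backendKeys[i] == dbKeys[j]:
			i++
			j++ // present on both sides — nothing to do
		case backendKeys[i] < dbKeys[j]:
			toImport = append(toImport, backendKeys[i]) // on the backend only
			i++
		default:
			toRemove = append(toRemove, dbKeys[j]) // in the DB only
			j++
		}
	}
	toImport = append(toImport, backendKeys[i:]...)
	toRemove = append(toRemove, dbKeys[j:]...)
	return toImport, toRemove
}

func main() {
	backend := []string{"photos/a.jpg", "photos/b.jpg", "photos/d.jpg"}
	db := []string{"photos/b.jpg", "photos/c.jpg", "photos/d.jpg"}
	imp, rem := reconcile(backend, db)
	fmt.Println("import:", imp) // [photos/a.jpg]
	fmt.Println("remove:", rem) // [photos/c.jpg]
}
```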
```yaml
reconcile:
  enabled: true    # disabled by default
  interval: "24h"  # how often to run (default: 24h)
```
Disabled by default. Requires a restart to enable/disable (non-reloadable). Runs under advisory lock 1009 to prevent concurrent scans across instances.
On-demand reconciliation is available via the admin API — useful after backend data loss or token expiry events:
```sh
# Reconcile all backends
s3-orchestrator admin reconcile

# Reconcile a single backend
curl -X POST -H "X-Admin-Token: $TOKEN" \
  http://localhost:9000/admin/api/reconcile?backend=g3
```
Encryption
Optional server-side envelope encryption with AES-256-GCM. When enabled, every object is encrypted before it leaves the orchestrator — backends only ever see ciphertext. Each object gets a random 256-bit Data Encryption Key (DEK) that is wrapped by a master key before storage. The master key can come from an inline config value, a file on disk, or HashiCorp Vault Transit.
Objects are encrypted in fixed-size chunks (default 64 KB), so range requests (Range header) work without downloading the entire object — the orchestrator calculates which ciphertext chunks to fetch and decrypts only those. Clients see standard S3 behavior; encryption is fully transparent.
Key features:
- Chunked AES-256-GCM — each chunk has an independent nonce derived from a base nonce XORed with the chunk index, enabling random-access decryption
- Envelope encryption — per-object DEKs mean rotating the master key only requires re-wrapping DEKs, not re-encrypting data
- Key rotation — add the new master key, move the old one to `previous_keys`, and call the `rotate-encryption-key` admin API to re-wrap DEKs still using the old key
- Encrypt existing data — the `encrypt-existing` admin API encrypts all unencrypted objects in-place without downtime
- Decrypt existing data — the `decrypt-existing` admin API reverses encryption, restoring plaintext objects on backends (useful for disabling encryption or migrating away)
- Vault Transit support — delegate key management to HashiCorp Vault for HSM-backed key storage. The Vault token is automatically renewed in the background; for Nomad workload identity deployments, use `token_file` to point at the Nomad-managed token file instead of a static `token` string
- Unknown key ID detection — when a wrapped DEK references a key ID that isn't the current primary or any configured previous key, a warning is logged before falling back to the primary key (signals potential metadata corruption or missing rotation key)
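A rough sketch of the per-chunk nonce idea from the first bullet above. The exact nonce layout is an assumption here (index folded into the trailing bytes of a 12-byte GCM nonce); this is not the project's code, only an illustration of why each chunk can be decrypted independently:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// chunkNonce derives the nonce for chunk i by XORing the chunk index into the
// base nonce (assumed layout: index XORed into the last 8 bytes).
func chunkNonce(base []byte, i uint64) []byte {
	n := make([]byte, len(base))
	copy(n, base)
	var idx [8]byte
	binary.BigEndian.PutUint64(idx[:], i)
	for k := 0; k < 8; k++ {
		n[len(n)-8+k] ^= idx[k]
	}
	return n
}

func main() {
	key := make([]byte, 32) // per-object DEK (random, wrapped by the master key in practice)
	base := make([]byte, 12)
	rand.Read(key)
	rand.Read(base)

	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)

	plaintext := []byte("chunk 3 of some larger object")
	ct := gcm.Seal(nil, chunkNonce(base, 3), plaintext, nil)

	// A range request that only needs chunk 3 re-derives the same nonce and
	// decrypts that chunk without touching the rest of the object.
	pt, err := gcm.Open(nil, chunkNonce(base, 3), ct, nil)
	fmt.Println(string(pt), err)
}
```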
Compatibility with backend-side encryption: If your backend already has its own server-side encryption (e.g., AWS SSE-S3 or SSE-KMS), both layers work independently. The orchestrator encrypts before uploading and the backend encrypts the ciphertext again at rest. On read, the backend decrypts its layer and returns the orchestrator's ciphertext, which the orchestrator then decrypts. This is harmless but redundant — you can safely disable the backend's encryption to avoid unnecessary KMS costs.
See the Admin Guide for setup, key rotation, and encrypting existing data.
Object Data Cache
Optional in-memory LRU cache that stores full GET responses to reduce backend API calls and egress. When a cached object is requested, the response is served directly from memory without contacting the backend. Useful for read-heavy workloads where the same objects are fetched repeatedly.
Key behaviors:
- Full GET responses only — range requests bypass the cache on miss but are served from cache on hit
- Admission control — objects larger than `max_object_size` are never cached, preventing a single large object from evicting many smaller ones
- TTL-based expiry — entries expire after the configured TTL regardless of access, bounding staleness in multi-instance deployments where writes may happen on another instance
- Per-instance — each orchestrator instance maintains its own cache; caches are not shared across instances
- Post-decryption — when encryption is enabled, the cache stores decrypted plaintext (same security properties as any in-process data)
```yaml
cache:
  enabled: true
  max_size: "256MB"        # total cache capacity
  max_object_size: "10MB"  # largest object eligible for caching
  ttl: "5m"                # time-to-live per entry
```
Disabled by default. Requires a restart to enable/disable (non-reloadable).
Rate Limiting
Optional per-IP token bucket rate limiting. When enabled, requests exceeding the configured rate return 429 SlowDown with a Retry-After: 1 header. Stale IP entries are evicted by a background goroutine every cleanup_interval (default 1m); entries not seen within cleanup_max_age (default 5m) are removed. Under high source-IP cardinality (e.g., DDoS), the map can accumulate up to cleanup_max_age worth of unique IPs before eviction runs — tune these values if memory pressure is a concern.
When running behind a reverse proxy (e.g., Traefik, nginx), configure trusted_proxies with the proxy's CIDR ranges so the orchestrator extracts the real client IP from the X-Forwarded-For header using rightmost-untrusted extraction. Without trusted_proxies, X-Forwarded-For is ignored and the direct connection IP is always used.
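A minimal per-IP token-bucket sketch using golang.org/x/time/rate (illustrative only; the orchestrator additionally evicts stale entries on the cleanup interval described above):

```go
package main

import (
	"fmt"
	"sync"

	"golang.org/x/time/rate"
)

// ipLimiter hands out one token-bucket limiter per client IP.
type ipLimiter struct {
	mu       sync.Mutex
	visitors map[string]*rate.Limiter
	rps      rate.Limit
	burst    int
}

func newIPLimiter(rps float64, burst int) *ipLimiter {
	return &ipLimiter{visitors: map[string]*rate.Limiter{}, rps: rate.Limit(rps), burst: burst}
}

// allow reports whether a request from ip may proceed; a false result is where
// a handler would answer 429 SlowDown with a Retry-After: 1 header.
func (l *ipLimiter) allow(ip string) bool {
	l.mu.Lock()
	lim, ok := l.visitors[ip]
	if !ok {
		lim = rate.NewLimiter(l.rps, l.burst)
		l.visitors[ip] = lim
	}
	l.mu.Unlock()
	return lim.Allow()
}

func main() {
	l := newIPLimiter(100, 200) // requests_per_sec: 100, burst: 200
	for i := 0; i < 3; i++ {
		fmt.Println(l.allow("203.0.113.7"))
	}
}
```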
Usage Limits
Per-backend monthly limits for API requests, egress bytes, and ingress bytes. Set any limit to 0 (or omit it) for unlimited. Limits reset naturally each month — the usage tracking table is keyed by YYYY-MM period.
Enforcement behavior:
- Writes (PutObject, CopyObject, CreateMultipartUpload, UploadPart) — backends over their limits are excluded from selection; writes overflow to the next eligible backend. If all backends are over-limit, the orchestrator returns `507 InsufficientStorage`.
- Reads (GetObject, HeadObject) — over-limit backends are skipped; the orchestrator tries replicas. Returns `429 SlowDown` only when all copies of the object are on over-limit backends.
- Deletes (DeleteObject, DeleteObjects, AbortMultipartUpload) — always allowed regardless of limits.
Effective usage is computed as DB baseline + unflushed counters + proposed operation, so enforcement stays accurate between flush/refresh cycles without double-counting. The flush interval is configurable (default 30s) and can adaptively shorten when backends approach their limits. For multi-instance deployments, optional Redis shared counters eliminate the cross-instance blind spot between flushes.
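That arithmetic ("DB baseline + unflushed counters + proposed operation") reduces to a check like this (illustrative sketch):

```go
package main

import "fmt"

// withinLimit reports whether a proposed operation would keep a backend under
// its monthly limit. baseline is the cached DB total (refreshed each flush
// interval); unflushed is the local or Redis counter since the last flush.
func withinLimit(baseline, unflushed, proposed, limit int64) bool {
	if limit == 0 {
		return true // 0 (or omitted) means unlimited
	}
	return baseline+unflushed+proposed <= limit
}

func main() {
	// Example: 9.5 GB of egress already flushed, 400 MB since the last flush,
	// a 200 MB GET proposed, against a 10 GB monthly egress limit.
	fmt.Println(withinLimit(9_500_000_000, 400_000_000, 200_000_000, 10_000_000_000)) // false -> read fails over to a replica
}
```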
Configuration
YAML config file specified via -config flag (default: config.yaml). Supports ${ENV_VAR} expansion.
```yaml
server:
  listen_addr: "0.0.0.0:9000"
  max_object_size: 5368709120        # 5 GB (default)
  # max_concurrent_requests: 0       # total concurrent operations — HTTP + background services (0 = unlimited, default: 1000)
  # max_concurrent_reads: 0          # separate read concurrency limit (0 = use global)
  # max_concurrent_writes: 0         # separate write concurrency limit (0 = use global; background services share this budget)
  # load_shed_threshold: 0           # active shedding at this capacity ratio (0 = disabled)
  # admission_wait: "0s"             # brief wait before rejection (0 = instant)
  # backend_timeout: "30s"           # per-operation timeout for backend S3 calls (default: 30s; uses tighter of this or parent context deadline)
  # read_header_timeout: "10s"       # max time to read request headers (default: 10s)
  # read_timeout: "5m"               # max time to read entire request including body (default: 5m)
  # write_timeout: "5m"              # max time to write response (default: 5m)
  # idle_timeout: "120s"             # max time to wait for next request on keep-alive (default: 120s)
  # shutdown_delay: "0s"             # delay before toggling readiness off and draining HTTP (default: 0; LB continues routing during delay)
  # tls:
  #   cert_file: "/path/to/cert.pem" # hot-reloaded on SIGHUP; warns if cert expires within 24h
  #   key_file: "/path/to/key.pem"
  #   min_version: "1.2"             # "1.2" (default) or "1.3"
  #   client_ca_file: ""             # CA bundle for mTLS client verification

# Virtual buckets with per-bucket credentials
buckets:
  - name: "app1-files"
    # max_multipart_uploads: 100     # optional; limit active multipart uploads per bucket (0 = unlimited)
    credentials:
      - access_key_id: "APP1_ACCESS_KEY"
        secret_access_key: "APP1_SECRET_KEY"
  - name: "shared-files"
    credentials:                     # Multiple services can share a bucket with separate credentials
      - access_key_id: "WRITER_ACCESS_KEY"
        secret_access_key: "WRITER_SECRET_KEY"
      - access_key_id: "READER_ACCESS_KEY"
        secret_access_key: "READER_SECRET_KEY"
  # Legacy token auth (backward compatibility)
  # - name: "legacy-bucket"
  #   credentials:
  #     - token: "my-secret-token"

# SQLite (default) — zero-dependency, single-instance
database:
  driver: sqlite
  path: "s3-orchestrator.db"

# PostgreSQL — required for multi-instance deployments
# database:
#   driver: postgres
#   host: "localhost"
#   port: 5432
#   database: "s3proxy"
#   user: "s3proxy"
#   password: "secret"
#   ssl_mode: "require"
#   max_conns: 50                    # default: 50; size to 2-3x max_concurrent_requests
#   min_conns: 10
#   max_conn_lifetime: "5m"

routing_strategy: "pack"             # "pack" (fill in order) or "spread" (least utilized) (default: pack)

backends:
  - name: "oci"
    endpoint: "https://namespace.compat.objectstorage.region.oraclecloud.com"
    region: "us-phoenix-1"
    bucket: "my-bucket"
    access_key_id: "backend-access-key"
    secret_access_key: "backend-secret-key"
    force_path_style: true
    unsigned_payload: true           # stream uploads without buffering (auto-enabled for HTTPS, set explicitly for HTTP)
    disable_checksum: false          # disable SDK default checksums for GCS and other providers that reject them
    strip_sdk_headers: false         # strip AWS SDK v2 headers before signing for GCS compatibility
    quota_bytes: 21474836480         # 20 GB (0 or omit for unlimited)
    max_object_size: 52428800        # 50 MB per-object size limit (0 = unlimited)
    api_request_limit: 0             # monthly API request limit (0 = unlimited)
    egress_byte_limit: 0             # monthly egress byte limit (0 = unlimited)
    ingress_byte_limit: 0            # monthly ingress byte limit (0 = unlimited)

telemetry:
  metrics:
    enabled: true
    path: "/metrics"
    # listen: "127.0.0.1:9091"       # optional; serve metrics on a separate address (recommended for production)
  tracing:
    enabled: true
    endpoint: "localhost:4317"
    insecure: true
    sample_rate: 1.0                 # fraction of requests that generate OTel traces (use 0.01–0.1 in production)

circuit_breaker:
  failure_threshold: 3               # consecutive DB failures before opening (default: 3)
  open_timeout: "15s"                # delay before probing recovery (default: 15s)
  cache_ttl: "60s"                   # key→backend cache TTL during degraded reads (default: 60s)
  parallel_broadcast: false          # fan-out reads to all backends in parallel during degraded mode (default: false)

# backend_circuit_breaker:           # per-backend circuit breakers (disabled by default)
#   enabled: false
#   failure_threshold: 5             # consecutive failures before opening (default: 5)
#   open_timeout: "5m"               # delay before probing recovery (default: 5m)

rebalance:
  enabled: false
  strategy: "pack"                   # "pack" or "spread" (default: pack)
  interval: "6h"                     # run interval (default: 6h)
  batch_size: 100                    # max objects per run (default: 100)
  threshold: 0.1                     # min utilization spread to trigger (default: 0.1)
  concurrency: 5                     # parallel moves per run (default: 5)

replication:
  factor: 1                          # copies per object; 1 = no replication (default: 1)
  worker_interval: "5m"              # replication worker cycle (default: 5m)
  batch_size: 50                     # objects per cycle (default: 50)
  concurrency: 5                     # parallel replications per cycle (default: 5)
  unhealthy_threshold: "10m"         # grace period before replacing copies on circuit-broken backends (default: 10m)

cleanup_queue:
  concurrency: 10                    # parallel cleanup deletions per tick (default: 10)

rate_limit:
  enabled: false
  requests_per_sec: 100              # token refill rate (default: 100)
  burst: 200                         # max burst size (default: 200)
  cleanup_interval: "1m"             # stale entry eviction interval (default: 1m)
  cleanup_max_age: "5m"              # evict entries not seen within this window (default: 5m)
  # trusted_proxies:                 # CIDRs whose X-Forwarded-For is trusted
  #   - "10.0.0.0/8"                 # Uses rightmost-untrusted extraction
  #   - "172.16.0.0/12"

encryption:
  enabled: false
  # chunk_size: 65536                # plaintext bytes per chunk (default: 64KB, range: 4KB–1MB, power of 2)
  # master_key: "${ENCRYPTION_KEY}"  # base64-encoded 256-bit key (exactly one key source required)
  # master_key_file: "/path/to/key"  # alternative: raw 32-byte key file
  # vault:                           # alternative: Vault Transit
  #   address: "http://vault:8200"
  #   token: "${VAULT_TOKEN}"        # static token (auto-renewed via RenewSelf)
  #   # token_file: "/secrets/vault-token"  # OR file-based (for Nomad workload identity; re-read periodically)
  #   key_name: "s3-orchestrator"
  #   mount_path: "transit"          # default: transit
  #   # ca_cert: "/path/to/ca.pem"   # Vault CA certificate for TLS verification
  #   # renew_interval: "5m"         # token renewal check interval (default: 5m)
  # previous_keys:                   # old master keys for rotation (unwrap only)
  #   - "base64-encoded-old-key"

integrity:
  enabled: false                     # SHA-256 content hashing for data integrity verification
  # verify_on_read: false            # hash-check GET responses as they stream
  # verify_on_replicate: true        # verify hash when creating replicas (default: true when enabled)
  # scrubber_interval: "6h"          # background verification interval (0 = disabled)
  # scrubber_batch_size: 100         # objects per scrub cycle

# cache:                             # optional: in-memory LRU object data cache
#   enabled: false                   # disabled by default
#   max_size: "256MB"                # total cache capacity (default: 256MB)
#   max_object_size: "10MB"          # largest cacheable object (default: 10MB)
#   ttl: "5m"                        # per-entry time-to-live (default: 5m)

ui:
  enabled: false                     # enable the built-in web dashboard
  path: "/ui"                        # URL prefix (default: /ui)
  admin_key: "${UI_ADMIN_KEY}"       # access key for dashboard login
  admin_secret: "${UI_ADMIN_SECRET}" # secret key (plaintext or bcrypt hash)
  session_secret: "${UI_SESSION_SECRET}"  # required — HMAC key for session cookies (independent of admin_secret)
  # admin_token: ""                  # separate token for admin API (defaults to admin_key)
  # force_secure_cookies: false      # always set Secure flag on cookies (for behind TLS proxy)

usage_flush:
  interval: "30s"                    # base flush interval (default: 30s)
  adaptive_enabled: false            # shorten interval when near usage limits (default: false)
  adaptive_threshold: 0.8            # usage ratio to trigger fast flush (default: 0.8)
  fast_interval: "5s"                # interval when near limits (default: 5s)

# reconcile:                         # optional: periodic orphan reconciliation
#   enabled: false                   # scan backends for untracked objects (default: false)
#   interval: "24h"                  # how often to run (default: 24h)

# redis:                             # optional: shared usage counters for multi-instance deployments
#   address: "redis:6379"            # host:port (required when section is present)
#   password: ""                     # AUTH password (omit for no auth)
#   db: 0
#   tls: false
#   key_prefix: "s3orch"             # namespace for multi-tenant Redis (default: s3orch)
#   failure_threshold: 3             # consecutive failures before local fallback (default: 3)
#   open_timeout: "15s"              # delay before probing recovery (default: 15s)

lifecycle:
  rules:                             # empty or omitted = lifecycle disabled
    - prefix: "tmp/"                 # key prefix to match
      expiration_days: 7             # delete objects older than this
    - prefix: "uploads/staging/"
      expiration_days: 1
```
Provider quick reference — endpoint format and required flags for common S3-compatible providers:
| Provider | Endpoint | `force_path_style` | Notes |
|---|---|---|---|
| AWS S3 | `https://s3.<region>.amazonaws.com` | `false` (default) | |
| MinIO | `http://<host>:9000` | `true` | |
| OCI Object Storage | `https://<ns>.compat.objectstorage.<region>.oraclecloud.com` | `true` | |
| Backblaze B2 | `https://s3.<region>.backblazeb2.com` | `false` | |
| Cloudflare R2 | `https://<account-id>.r2.cloudflarestorage.com` | `false` | `region: auto` |
| Wasabi | `https://s3.<region>.wasabisys.com` | `false` | |
| Google Cloud Storage | `https://storage.googleapis.com` | `false` | Set `disable_checksum: true` and `strip_sdk_headers: true` |
See the Maximizing Free Tiers guide for detailed setup on each provider including where to find credentials.
Configuration Hot-Reload
The orchestrator supports hot-reloading a subset of configuration by sending SIGHUP to the running process. This lets you update credentials, quotas, rate limits, and other operational settings without restarting the service or dropping client connections.
kill -HUP $(pidof s3-orchestrator)
Reloadable vs non-reloadable settings
| Setting | Reloadable | Notes |
|---|---|---|
| `buckets` (credentials, limits) | Yes | Credentials and `max_multipart_uploads` take effect immediately |
| `rate_limit` | Yes | New visitors get updated rates; existing per-IP limiters expire naturally |
| `backends[].quota_bytes` | Yes | Synced to database on reload |
| `backends[].api_request_limit` | Yes | |
| `backends[].egress_byte_limit` | Yes | |
| `backends[].ingress_byte_limit` | Yes | |
| `rebalance` | Yes | Strategy, interval, threshold, concurrency, enabled/disabled |
| `replication` | Yes | Factor, worker interval, batch size |
| `usage_flush` | Yes | Interval, adaptive enabled/threshold/fast interval |
| `lifecycle` | Yes | Rules (prefix, expiration_days) |
| `integrity` | Yes | Enabled, verify_on_read, scrubber interval/batch size |
| `server.listen_addr` | No | Requires restart |
| `server.max_concurrent_requests` | No | Requires restart |
| `server.max_concurrent_reads` | No | Requires restart |
| `server.max_concurrent_writes` | No | Requires restart |
| `server.load_shed_threshold` | No | Requires restart |
| `server.admission_wait` | No | Requires restart |
| server timeouts | No | read_header_timeout, read_timeout, write_timeout, idle_timeout, shutdown_delay |
| `server.tls` | No | Requires restart |
| `database` | No | Requires restart |
| `telemetry` | No | Requires restart |
| `circuit_breaker` | No | Requires restart |
| `backend_circuit_breaker` | No | Requires restart |
| `ui` | No | Requires restart |
| `encryption` | No | Requires restart |
| `cache` | No | Requires restart |
| `redis` | No | Requires restart |
| `routing_strategy` | No | Requires restart |
| `reconcile` | No | Requires restart |
| `backends` (structural: endpoint, credentials, count) | No | Requires restart |
On a successful reload, the orchestrator logs each reloaded section:
```json
{"level":"INFO","msg":"SIGHUP received, reloading configuration","path":"config.yaml"}
{"level":"INFO","msg":"Reloaded bucket credentials","buckets":2}
{"level":"INFO","msg":"Reloaded rate limits","requests_per_sec":100,"burst":200}
{"level":"INFO","msg":"Reloaded backend quota limits"}
{"level":"INFO","msg":"Reloaded backend usage limits"}
{"level":"INFO","msg":"Reloaded rebalance/replication/usage-flush config"}
{"level":"INFO","msg":"Configuration reload complete"}
```
If the new config file is invalid, the orchestrator keeps the current configuration and logs the error:
```json
{"level":"ERROR","msg":"Config reload failed, keeping current config","error":"invalid config: ..."}
```
Non-reloadable field changes are logged as warnings but do not prevent the reload of other settings:
```json
{"level":"WARN","msg":"Config field changed but requires restart to take effect","field":"server.listen_addr"}
```
Database
The orchestrator supports two metadata-store engines:
- SQLite (default) — embedded, zero-dependency, single-instance. Schema is applied at startup from a single consolidated `schema.sql`.
- PostgreSQL — required for multi-instance deployments. Connects via pgx/v5 pools and auto-applies versioned migrations on startup using goose; migration files are embedded in the binary and tracked via a `goose_db_version` table so only unapplied migrations run.
Engine-agnostic orchestration lives in internal/store/core/ (transactional
business logic against a TxAdapter interface). Each engine package
(internal/store/postgres/, internal/store/sqlite/) is a thin adapter
that implements the same TxAdapter, so the same code drives both engines.
The schema currently provisions:
| Table | Purpose |
|---|---|
| `backend_quotas` | Per-backend byte limits, usage counters, and orphan bytes tracking |
| `object_locations` | Maps object keys to backends with size tracking |
| `multipart_uploads` | In-progress multipart upload metadata |
| `multipart_parts` | Individual parts for active multipart uploads |
| `backend_usage` | Monthly per-backend API request and data transfer counters |
| `cleanup_queue` | Retry queue for failed backend object deletions |
| `cleanup_dlq` | Dead-letter for cleanup_queue rows that exhausted retries; surfaces unrecoverable orphans for operator action |
| `pending_objects` | In-flight PUT intents recorded before the backend write so a DB outage can't silently destroy the prior copy |
| `notification_outbox` | Durable webhook event delivery queue |
Quota updates are transactional: object location inserts/deletes and quota counter changes happen atomically.
All Postgres SQL queries live in `internal/store/postgres/sqlc/queries/` as annotated `.sql` files. Type-safe Go code is generated by sqlc into `internal/store/postgres/sqlc/`; regenerate with sqlc after editing queries.
Telemetry
Prometheus Metrics
All metrics are prefixed with s3o_. Exposed at /metrics when enabled.
| Metric | Type | Labels | Description |
|---|---|---|---|
| `s3o_build_info` | Gauge | version, go_version | Build metadata |
| `s3o_requests_total` | Counter | method, status_code | HTTP request count |
| `s3o_request_duration_seconds` | Histogram | method | Request latency |
| `s3o_request_size_bytes` | Histogram | method | Upload sizes |
| `s3o_response_size_bytes` | Histogram | method | Download sizes |
| `s3o_inflight_requests` | Gauge | method | Currently processing |
| `s3o_backend_requests_total` | Counter | operation, backend, status | Backend S3 API calls |
| `s3o_backend_duration_seconds` | Histogram | operation, backend | Backend latency |
| `s3o_manager_requests_total` | Counter | operation, backend, status | Manager-level operations |
| `s3o_manager_duration_seconds` | Histogram | operation, backend | Manager latency |
| `s3o_quota_bytes_used` | Gauge | backend | Current bytes used |
| `s3o_quota_bytes_limit` | Gauge | backend | Quota limit |
| `s3o_quota_orphan_bytes` | Gauge | backend | Bytes reserved by pending cleanup items |
| `s3o_quota_bytes_available` | Gauge | backend | Remaining space (limit − used − orphan) |
| `s3o_objects_count` | Gauge | backend | Stored object count |
| `s3o_active_multipart_uploads` | Gauge | backend | In-progress uploads |
| `s3o_rebalance_objects_moved_total` | Counter | strategy, status | Objects moved by rebalancer |
| `s3o_rebalance_bytes_moved_total` | Counter | strategy | Bytes moved by rebalancer |
| `s3o_rebalance_runs_total` | Counter | strategy, status | Rebalancer executions |
| `s3o_rebalance_duration_seconds` | Histogram | strategy | Rebalancer execution time |
| `s3o_rebalance_skipped_total` | Counter | reason | Rebalancer runs skipped |
| `s3o_rebalance_pending` | Gauge | — | Objects planned for rebalance |
| `s3o_replication_pending` | Gauge | — | Objects below replication factor |
| `s3o_replication_copies_created_total` | Counter | — | Replica copies created |
| `s3o_replication_errors_total` | Counter | — | Replication errors |
| `s3o_replication_duration_seconds` | Histogram | — | Replication cycle time |
| `s3o_replication_runs_total` | Counter | status | Replication worker executions |
| `s3o_replication_health_copies_total` | Counter | — | Copies created to replace copies on circuit-broken backends |
| `s3o_over_replication_pending` | Gauge | — | Objects exceeding the replication factor |
| `s3o_over_replication_removed_total` | Counter | — | Excess copies removed |
| `s3o_over_replication_errors_total` | Counter | — | Over-replication cleanup errors |
| `s3o_over_replication_runs_total` | Counter | status | Over-replication worker executions |
| `s3o_over_replication_duration_seconds` | Histogram | — | Over-replication cleanup cycle time |
| `s3o_circuit_breaker_state` | Gauge | name | 0=closed, 1=open, 2=half-open (name: "database" or backend name) |
| `s3o_circuit_breaker_transitions_total` | Counter | name, from, to | State transitions per component |
| `s3o_degraded_reads_total` | Counter | operation | Broadcast reads in degraded mode |
| `s3o_degraded_cache_hits_total` | Counter | — | Cache hits during degraded reads |
| `s3o_degraded_write_rejections_total` | Counter | operation | Writes rejected in degraded mode |
| `s3o_usage_api_requests` | Gauge | backend | Current month API request count |
| `s3o_usage_egress_bytes` | Gauge | backend | Current month egress bytes |
| `s3o_usage_ingress_bytes` | Gauge | backend | Current month ingress bytes |
| `s3o_usage_limit_rejections_total` | Counter | operation, limit_type | Operations rejected by usage limits |
| `s3o_cleanup_queue_enqueued_total` | Counter | reason | Items added to the cleanup retry queue |
| `s3o_cleanup_queue_processed_total` | Counter | status | Items processed from the cleanup queue (success/retry/exhausted) |
| `s3o_cleanup_queue_depth` | Gauge | — | Current pending items in the cleanup queue |
| `s3o_cleanup_dlq_depth` | Gauge | — | Unrecoverable orphans waiting in the cleanup dead-letter table |
| `s3o_cleanup_dlq_enqueued_total` | Counter | backend | Cleanup rows graduated to the dead-letter after exhausting retries |
| `s3o_rate_limit_rejections_total` | Counter | — | Requests rejected by per-IP rate limiting |
| `s3o_admission_rejections_total` | Counter | — | Requests rejected by server-level admission control |
| `s3o_lifecycle_deleted_total` | Counter | — | Objects deleted by lifecycle expiration |
| `s3o_lifecycle_failed_total` | Counter | — | Objects that failed lifecycle deletion |
| `s3o_lifecycle_runs_total` | Counter | status | Lifecycle worker executions |
| `s3o_audit_events_total` | Counter | event | Audit log entries emitted |
| `s3o_drain_active` | Gauge | — | 1 while a backend drain is in progress |
| `s3o_drain_objects_moved_total` | Counter | — | Objects migrated during drain |
| `s3o_drain_bytes_moved_total` | Counter | — | Bytes migrated during drain |
| `s3o_encryption_operations_total` | Counter | op | Encrypt/decrypt operations (encrypt, decrypt, decrypt_range) |
| `s3o_encryption_errors_total` | Counter | op, error_type | Encryption/decryption failures |
| `s3o_encryption_unknown_key_id_total` | Counter | — | Decryption attempts with unknown keyID (primary key fallback) |
| `s3o_encrypt_existing_objects_total` | Counter | status | Objects processed by encrypt-existing (success/error) |
| `s3o_decrypt_existing_objects_total` | Counter | status | Objects processed by decrypt-existing (success/error) |
| `s3o_key_rotation_objects_total` | Counter | status | DEKs re-wrapped by key rotation (success/error) |
| `s3o_redis_operations_total` | Counter | operation, status | Redis command outcomes (incrby, get, getset, pipeline_add, pipeline_load) |
| `s3o_redis_fallback_active` | Gauge | — | 1 when Redis is unavailable and using local counters |
| `s3o_cache_hits_total` | Counter | — | Object data cache hits |
| `s3o_cache_misses_total` | Counter | — | Object data cache misses |
| `s3o_cache_evictions_total` | Counter | — | Object data cache evictions (LRU or TTL) |
| `s3o_cache_size_bytes` | Gauge | — | Current memory used by cached objects |
| `s3o_cache_entries` | Gauge | — | Current number of cached objects |
| `s3o_integrity_checks_total` | Counter | operation | Integrity hash verifications performed (read, scrub) |
| `s3o_integrity_errors_total` | Counter | operation | Hash mismatches detected (corrupted copies enqueued for cleanup) |
Quota metrics are refreshed from PostgreSQL every 30 seconds (no backend API calls).
A ready-to-import Grafana dashboard covering all metrics is included at grafana/s3-orchestrator.json.
OpenTelemetry Tracing
Spans are emitted for every HTTP request, manager operation, and backend S3 call. The service registers as s3-orchestrator (resource.service.name). Traces propagate via W3C traceparent headers. Configured to export via gRPC OTLP to Tempo or any OTLP-compatible collector.
Trace-to-log correlation — every JSON log line emitted within an active span automatically includes trace_id and span_id fields. Log aggregators (Loki, etc.) can use these fields to link logs to their corresponding traces in Tempo or any OpenTelemetry-compatible tracing backend. Only log calls that receive a context.Context with an active span include trace context; application-level logs without a span context are unaffected.
Audit Logging
Structured audit log entries are emitted as JSON via slog for every S3 API request and significant internal operation. Each entry includes an "audit": true marker for easy filtering in log pipelines.
Request ID tracing — every S3 API request gets a unique request ID, returned in the X-Amz-Request-Id response header. Clients can supply their own via the X-Request-Id request header. The same ID flows through context to all downstream operations, appearing in both the HTTP-level audit entry and the storage-level audit entry for full request correlation. The ID is also set as a s3o.request_id attribute on OpenTelemetry spans, linking audit logs to traces.
Two-level audit entries — each S3 request produces two audit log lines: one at the HTTP layer (s3.PutObject, s3.GetObject, etc.) with method, path, bucket, status, duration, and remote address, and one at the storage layer (storage.PutObject, storage.GetObject, etc.) with the backend name, object key, and size. Both share the same request_id.
Internal operation auditing — background operations generate their own correlation IDs:
| Operation | Events |
|---|---|
| Rebalancer | rebalance.start, rebalance.move, rebalance.complete |
| Replicator | replication.start, replication.copy, replication.complete |
| Over-replication cleaner | over_replication.start, over_replication.remove, over_replication.complete |
| Multipart cleanup | storage.MultipartCleanup |
| Overwrite (displaced) | storage.overwrite_displaced |
| Cleanup queue | cleanup_queue.processed, cleanup_queue.exhausted_to_dlq |
Example audit log entry:
```json
{"level":"INFO","msg":"audit","audit":true,"event":"s3.PutObject","request_id":"a1b2c3d4e5f6...","operation":"PutObject","method":"PUT","path":"/my-files/photo.jpg","bucket":"my-files","status":200,"duration":"45ms"}
```

Webhook Notifications
Optional outbound webhooks for object mutations and operational events.
Events are written to a durable notification_outbox table inside the
same transaction as the originating change, then a background drainer
POSTs them as CloudEvents-formatted JSON to each configured endpoint.
The outbox pattern means events are never lost on crash and never sent
twice for the same change.
Two event categories are supported:
- Data events — S3-style object mutations (`s3:ObjectCreated:Put`, `s3:ObjectRemoved:Delete`, etc.) carrying the bucket and key.
- Operational events — backend health (`backend.circuit.opened`, `backend.capacity.warning`), integrity (`integrity.corruption_detected`), cleanup (`cleanup.exhausted`), replication and lifecycle completions.
Each endpoint declares which event-type patterns it cares about and an optional HMAC-SHA256 signing key:
```yaml
notifications:
  endpoints:
    - url: "https://hooks.example.com/storage"
      events:
        - "s3:ObjectCreated:*"
        - "s3:ObjectRemoved:*"
      prefix: "uploads/"        # only deliver data events under this prefix
      secret: "${HOOK_SECRET}"
      timeout: 5s
      max_retries: 5
```
Failed deliveries retry with exponential backoff. After max_retries,
the row is dropped and an audit warning is emitted. See
web/content/guides/event-notifications.md
for the full event catalog and signature-verification recipe.
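A receiver-side sketch of HMAC-SHA256 verification (the signature header name and hex encoding here are assumptions for illustration; consult the event-notifications guide for the actual scheme):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"net/http"
)

// verify recomputes the HMAC-SHA256 of the raw body with the shared secret and
// compares it in constant time against the signature sent with the webhook.
// The "X-Signature" header name and hex encoding are assumptions in this sketch.
func verify(secret string, body []byte, signature string) bool {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	expected := hex.EncodeToString(mac.Sum(nil))
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	http.HandleFunc("/storage", func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		if !verify("my-hook-secret", body, r.Header.Get("X-Signature")) {
			http.Error(w, "bad signature", http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusNoContent) // process the CloudEvents JSON in body
	})
	http.ListenAndServe(":8080", nil)
}
```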
Web UI
A built-in web dashboard provides operational visibility and management without external tooling. When enabled, it renders a server-side HTML page at the configured path (default /ui/). All routes require authentication via HMAC-signed session cookies — users log in with an admin key/secret pair configured in the YAML config.
The dashboard shows:
- Storage Summary — total bytes used/capacity across all backends with a progress bar
- Backends — quota used/limit per backend with progress bars, object counts, active multipart uploads
- Monthly Usage — API requests, egress, and ingress per backend with limits
- Objects — interactive collapsible tree browser; buckets and directories expand on click to reveal contents, with rollup file counts and sizes
- Configuration — virtual buckets, write routing strategy, replication factor, rebalance strategy, rate limit status
- Logs — recent structured log output from an in-memory ring buffer (last 5,000 entries), filterable by severity level with client-side text search and optional auto-refresh
The dashboard also provides management actions:
- Upload — upload files to any virtual bucket via the browser
- Download — download individual objects from the file tree
- Delete — delete individual objects from the file tree
- Rebalance — trigger an on-demand rebalance across backends
- Clean Excess — remove over-replicated copies that exceed the replication factor
- Sync — import pre-existing objects from a backend's S3 bucket into the proxy database, scoped to a selected virtual bucket
The object tree uses JavaScript for lazy-loaded AJAX expansion — directories load their children on click via the /ui/api/tree endpoint. All dashboard responses include security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Content-Security-Policy). Enable it in the config:
```yaml
ui:
  enabled: true
  path: "/ui"                     # default
  admin_key: "${UI_ADMIN_KEY}"
  admin_secret: "${UI_ADMIN_SECRET}"
```
JSON APIs are available at {path}/api/dashboard, {path}/api/tree, and {path}/api/logs for programmatic access. The logs endpoint accepts optional query parameters: level, since, component, and limit. Management endpoints ({path}/api/delete, {path}/api/delete-prefix, {path}/api/upload, {path}/api/rebalance, {path}/api/clean-excess, {path}/api/sync) accept POST requests and return JSON responses. The download endpoint ({path}/api/download?key=...) accepts GET requests.
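For example, with a session cookie saved from the login form (the host, cookie file, and filter values below are illustrative):

```sh
# Recent log entries filtered by severity, as JSON
curl -b cookies.txt "https://s3o.example.com/ui/api/logs?level=warn&limit=100"

# Dashboard data and the lazy tree listing are plain GETs as well
curl -b cookies.txt "https://s3o.example.com/ui/api/dashboard"
```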
Endpoints
Public
| Path | Method | Purpose |
|---|---|---|
| `/{bucket}/{key}` | * | S3 API (PutObject, GetObject, etc.) |
| `/health` | GET | Liveness — always 200; body is `ok` or `degraded` (when DB circuit is open) |
| `/health/ready` | GET | Readiness — 200 once startup is complete; flips to 503 during shutdown drain |
| `/metrics` | GET | Prometheus metrics (when `telemetry.metrics.enabled`) |
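Both health endpoints can be probed with plain GETs, for example (host illustrative):

```sh
curl -s https://s3o.example.com/health                                          # "ok", or "degraded" while the DB circuit is open
curl -s -o /dev/null -w '%{http_code}\n' https://s3o.example.com/health/ready   # 200, or 503 during shutdown drain
```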
Admin API (X-Admin-Token required)
| Path | Method | Purpose |
|---|---|---|
| `/admin/api/status` | GET | Backend health, quota, circuit-breaker state |
| `/admin/api/object-locations?key=...` | GET | Per-backend ledger for one object key |
| `/admin/api/cleanup-queue` | GET | Cleanup queue depth and pending sample |
| `/admin/api/usage-flush` | POST | Force out-of-band flush of usage counters |
| `/admin/api/replicate` | POST | Trigger one replication cycle |
| `/admin/api/log-level` | GET / PUT | View or set the running instance's log level |
| `/admin/api/over-replication` | GET / POST | Show pending excess copies / trigger cleanup |
| `/admin/api/rotate-encryption-key` | POST | Re-wrap DEKs that still reference an old master key |
| `/admin/api/encrypt-existing` | POST | Encrypt all unencrypted objects in-place |
| `/admin/api/decrypt-existing` | POST | Decrypt all encrypted objects in-place |
| `/admin/api/scrub` | POST | Trigger one integrity-scrub pass |
| `/admin/api/backfill-checksums` | POST | Compute hashes for objects predating integrity |
| `/admin/api/reconcile` | POST | Trigger an out-of-band reconcile pass |
| `/admin/api/backends/{name}/drain` | POST / GET / DELETE | Start / inspect / cancel a backend drain |
| `/admin/api/backends/{name}` | DELETE | Remove backend metadata (use `?purge=true` + `?confirm=true` to also delete S3 objects) |
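A minimal sketch of calling the admin API with curl, assuming the admin token is exported in an environment variable (variable name and host are illustrative):

```sh
# Backend health, quota, and circuit-breaker state
curl -H "X-Admin-Token: $ADMIN_TOKEN" https://s3o.example.com/admin/api/status

# Trigger one replication cycle
curl -X POST -H "X-Admin-Token: $ADMIN_TOKEN" https://s3o.example.com/admin/api/replicate
```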
Web UI (X-Session-Cookie after login; enabled only when ui.enabled)
| Path | Method | Purpose |
|---|---|---|
| `/ui/` | GET | Dashboard HTML |
| `/ui/login` | GET / POST | Login page |
| `/ui/api/dashboard` | GET | Dashboard data as JSON |
| `/ui/api/tree` | GET | Lazy-loaded directory listing |
| `/ui/api/upload` | POST | Multipart-form file upload |
| `/ui/api/download` | GET | Object download (`?key=...`) |
| `/ui/api/delete` | POST | Delete one object |
| `/ui/api/delete-prefix` | POST | Delete every object under a prefix |
| `/ui/api/rebalance` (+ `/status`) | POST / GET | Trigger / poll rebalance |
| `/ui/api/clean-excess` (+ `/status`) | POST / GET | Trigger / poll over-replication cleanup |
| `/ui/api/replicate` (+ `/status`) | POST / GET | Trigger / poll replicate |
| `/ui/api/scrub` (+ `/status`) | POST / GET | Trigger / poll integrity scrub |
| `/ui/api/backfill-checksums` (+ `/status`) | POST / GET | Trigger / poll checksum backfill |
| `/ui/api/encrypt-existing` (+ `/status`) | POST / GET | Trigger / poll encrypt-existing |
| `/ui/api/sync` | POST | Import objects from a backend's S3 bucket |
| `/ui/api/logs` | GET | Buffered log entries (`level`, `since`, `component`, `limit`) |
Background Tasks
All locked background tasks apply a random startup jitter of up to half the tick interval before the first tick, preventing thundering herd on the advisory lock when multiple instances start simultaneously.
| Task | Interval | Advisory Lock | Description |
|---|---|---|---|
| Usage flush + metrics | configurable (default 30s) | When Redis configured | Flushes usage counters to PostgreSQL, then refreshes quota stats, usage baselines, object counts, and multipart counts. Updates Prometheus gauges. Adaptive mode shortens interval near limits. Advisory lock is acquired whenever Redis is configured (regardless of health) to prevent double-counting during recovery. |
| Stale multipart cleanup | 1h | Yes | Aborts multipart uploads older than 24h and deletes their temporary part objects. |
| Cleanup queue | 1m | Yes | Retries failed backend object deletions with exponential backoff (1m to 24h, max 10 attempts). On the tenth consecutive failure the row graduates to cleanup_dlq for operator action; orphan_bytes stays incremented because the bytes are still on disk. |
| Rebalancer | configurable (default 6h) | Yes | Moves objects between backends per strategy. Only runs when enabled. |
| Replicator | configurable (default 5m) | Yes | Creates copies of under-replicated objects. Only runs when factor > 1. Runs once at startup. |
| Over-replication cleaner | configurable (default 5m) | Yes | Removes excess copies of objects that exceed the replication factor. Only runs when factor > 1. |
| Lifecycle | 1h | Yes | Deletes objects matching lifecycle rules whose created_at exceeds expiration_days. Only runs when rules are configured. |
| Reconciler | configurable (default 24h) | Yes | Scans each backend for untracked objects and imports them into the metadata database via SyncBackend. Only runs when reconcile.enabled: true. |
| Pending reaper | configurable (default 1m) | Yes | Resolves PUT-before-COMMIT intents that survived a failed metadata commit. HEADs the destination backend and either promotes the row into object_locations (object present) or drops the intent (object absent). Skips intents younger than min_age so in-flight PUTs are not interrupted. |
| Scrubber | configurable (default 6h) | Yes | Random-samples objects, fetches and re-hashes them, and enqueues a cleanup if the stored content_hash does not match. Only runs when integrity.enabled: true and scrubber_interval > 0. |
| Notification drainer | 5s | No | Drains notification_outbox rows by POSTing CloudEvents JSON to configured webhook endpoints. Optional HMAC signing per endpoint. |
| CB watchdog | 1m | No | Checks all circuit breakers for stale half-open probes. If a probe has been in flight longer than 2 minutes, resets the circuit to open so a new probe can be dispatched. Prevents circuits from getting stuck half-open when traffic stops. |
Background services (rebalancer, replicator, over-replication cleaner, cleanup queue) share the admission semaphore with HTTP requests, so max_concurrent_requests is the total budget for both HTTP and background backend operations.
Multi-Instance Deployment
Multiple orchestrator instances can safely share the same PostgreSQL database. Background tasks (rebalancer, replicator, cleanup queue, multipart cleanup) use PostgreSQL advisory locks to prevent concurrent execution across instances — if one instance holds the lock for a task, other instances skip that tick silently.
Request-serving paths (PutObject, GetObject, etc.) are stateless and work correctly with any number of instances behind a load balancer. The per-instance location cache is TTL-bounded and self-correcting. Rate limiting remains per-instance.
Usage Counters
Without Redis, each instance tracks usage counters independently in memory and flushes to PostgreSQL at the configured interval (default 30s). Between flushes, instances cannot see each other's accumulated usage, which can allow quota overshoot under high throughput.
With Redis configured, all instances share the same usage counters via Redis INCRBY/GET operations. The baseline+delta formula stays the same (DB baseline + counter + proposed), but the counter lives in Redis instead of local memory, eliminating the cross-instance blind spot. When Redis is active, only one instance flushes counters to PostgreSQL (coordinated via advisory lock) since GETSET is a destructive read.
A circuit breaker monitors Redis health. If Redis becomes unavailable, the backend falls back to local in-memory counters automatically — same behavior as running without Redis. A background health probe PINGs Redis periodically and, on recovery, syncs local deltas back to Redis via an additive INCRBY pipeline before resuming shared operation. The entire local counter map is swapped atomically (single pointer swap) so no concurrent Add calls can lose deltas between the snapshot and the pipeline. Stale Redis keys from before the outage expire via TTL. Local counters are zeroed only after the pipeline commits, so a crash mid-recovery cannot lose deltas. The recovery is safe for concurrent execution by multiple instances since INCRBY is additive.
```yaml
redis:
  address: "redis.example.com:6379"
  password: "${REDIS_PASSWORD}"
  key_prefix: "s3orch"       # namespace for multi-tenant Redis
  failure_threshold: 3       # consecutive failures before fallback
  open_timeout: "15s"        # delay before probing recovery
```
Redis is optional. Without it, adaptive flushing still shortens the flush interval when any backend approaches a usage limit, improving enforcement accuracy.
CLI Subcommands
Running s3-orchestrator with no subcommand starts the daemon (-config and -mode flags). The subcommands below are all dispatched by the same binary; pass -h after any of them for usage.
version
Prints the binary version, Go version, and platform:
```sh
s3-orchestrator version
# s3-orchestrator vX.Y.Z go1.26.X linux/amd64
```

init
Generates a configuration file interactively. Prompts for database driver (SQLite or PostgreSQL), one or more storage backends, and one or more virtual buckets, then writes a validated config.yaml:
```sh
s3-orchestrator init                          # writes ./config.yaml
s3-orchestrator init -config /etc/s3o.yaml    # custom path
```
The generated config is round-tripped through the loader before being written, so the file the user ends up with is guaranteed to validate.
help
Prints the subcommand summary:
```sh
s3-orchestrator help
s3-orchestrator -h
```

validate
Validates a configuration file without starting the server. Exits 0 on success with a brief summary, or exits 1 with error details:
```sh
s3-orchestrator validate -config config.yaml
# config config.yaml: valid
# backends: 2
# buckets: 1
# routing: spread
```
sync
Imports pre-existing objects from a backend S3 bucket into the orchestrator's metadata database. Useful when bringing an existing bucket under orchestrator management. The --bucket flag specifies which virtual bucket the imported objects belong to — keys are stored with a {bucket}/ prefix for namespace isolation.
```sh
# Import all objects from a backend into the "unified" virtual bucket
s3-orchestrator sync --config config.yaml --backend oci --bucket unified

# Preview what would be imported
s3-orchestrator sync --config config.yaml --backend oci --bucket unified --dry-run

# Import only objects under a prefix
s3-orchestrator sync --config config.yaml --backend oci --bucket unified --prefix photos/
```
| Flag | Default | Description |
|---|---|---|
| `--config` | `config.yaml` | Path to configuration file |
| `--backend` | (required) | Backend name to sync |
| `--bucket` | (required) | Virtual bucket name to prefix imported keys with |
| `--prefix` | `""` | Only sync objects with this key prefix |
| `--dry-run` | `false` | Preview what would be imported without writing |
Objects already tracked in the database for that backend are skipped. The command logs per-page progress and a final summary with imported count, skipped count, and total bytes imported.
admin
Operational CLI for a running instance. Reads config.yaml to discover the server address and admin token. See the Admin Guide for full details.
```sh
s3-orchestrator admin status                                        # backend health and usage
s3-orchestrator admin object-locations -key "..."                   # find all copies of an object
s3-orchestrator admin cleanup-queue                                 # cleanup queue depth
s3-orchestrator admin usage-flush                                   # force flush usage counters
s3-orchestrator admin replicate                                     # trigger replication cycle
s3-orchestrator admin over-replication                              # show over-replicated object count
s3-orchestrator admin over-replication --execute                    # clean excess copies
s3-orchestrator admin over-replication --execute --batch-size 200   # with custom batch
s3-orchestrator admin log-level                                     # view current log level
s3-orchestrator admin log-level -set debug                          # change log level at runtime
s3-orchestrator admin drain <backend>                               # start draining a backend
s3-orchestrator admin drain-status <backend>                        # check drain progress
s3-orchestrator admin drain-cancel <backend>                        # cancel an active drain
s3-orchestrator admin remove-backend <backend>                      # remove backend DB records (S3 objects preserved)
s3-orchestrator admin remove-backend <backend> --purge              # preview: shows what would be destroyed
s3-orchestrator admin remove-backend <backend> --purge --confirm    # delete S3 objects + DB records
s3-orchestrator admin reconcile                                     # reconcile DB against all backends
s3-orchestrator admin reconcile -backend g3                         # reconcile a single backend
s3-orchestrator admin scrub                                         # trigger an integrity scrub cycle
s3-orchestrator admin backfill-checksums                            # compute hashes for unhashed objects
```
Development
```sh
# Install build and packaging dependencies
make tools

# Regenerate sqlc query code (after editing .sql files)
make generate

# Run locally (starts MinIO + PostgreSQL via Docker, then runs the server)
make run

# Lint
make lint

# Static analysis
make vet

# Scan Go dependencies for known vulnerabilities
make govulncheck

# Run unit tests
make test

# Run integration tests (requires Docker)
make integration-test

# Build local Docker image
make build

# Create a new database migration
make migration

# Build multi-arch and push to registry
make push VERSION=vX.Y.Z

# Build a .deb package for the host architecture
make deb VERSION=X.Y.Z

# Build .deb packages for both amd64 and arm64
make deb-all VERSION=X.Y.Z

# Build and run lintian validation
make deb-lint VERSION=X.Y.Z

# Publish .deb packages to an Aptly repository
make publish-deb

# Dry-run GoReleaser locally (builds everything without publishing)
make release-local
```
Deployment
The orchestrator can run as a Docker container, a native systemd service, or on container orchestration platforms. Production-ready manifests for Nomad and Kubernetes are in deploy/, with local demo scripts that stand up a complete environment in one command.
Container Orchestration (Nomad / Kubernetes)
Example manifests in deploy/ demonstrate a three-backend setup with replication factor 2, spread routing, and full observability. Local demo scripts build from source and deploy against docker-compose backing services:
```sh
# Kubernetes via k3d (requires: docker, k3d, kubectl)
make kubernetes-demo

# Nomad in dev mode (requires: docker, nomad)
make nomad-demo
```
See deploy/README.md for production deployment instructions and customization options (TLS, mTLS, Vault integration, Ingress).
Prerequisites
- PostgreSQL database (schema auto-applied on startup)
- At least one S3-compatible storage backend
- Configuration file with credentials
- Redis (optional — for shared usage counters in multi-instance deployments)
- TLS termination — either via the built-in `server.tls` config or a reverse proxy (Traefik, nginx, Ingress). Plain HTTP exposes SigV4 signatures and object data on the wire, and the `UNSIGNED-PAYLOAD` streaming mode means body integrity depends entirely on transport security. See the Security Hardening guide for TLS and mTLS setup.
Docker
Build and push a multi-arch image with a version tag:
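The `make push` target listed under Development covers this:

```sh
make push VERSION=vX.Y.Z
```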
The VERSION is baked into the binary via -ldflags and displayed in the web UI header and /health endpoint. Defaults to the value in .version if omitted.
Debian Package
Build a .deb package for bare-metal or VM deployments:
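Per the Makefile targets listed under Development:

```sh
make deb VERSION=X.Y.Z        # host architecture
make deb-all VERSION=X.Y.Z    # both amd64 and arm64
```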
Install and configure:
```sh
sudo dpkg -i s3-orchestrator_X.Y.Z_amd64.deb
sudo vim /etc/s3-orchestrator/config.yaml
sudo vim /etc/default/s3-orchestrator   # set DB_PASSWORD, backend keys, etc.
sudo systemctl start s3-orchestrator
```

The package installs:
| Path | Purpose |
|---|---|
| `/usr/bin/s3-orchestrator` | Binary |
| `/etc/s3-orchestrator/config.yaml` | Configuration (conffile, preserved on upgrade) |
| `/etc/default/s3-orchestrator` | Environment variables for `${VAR}` expansion |
| `/usr/lib/systemd/system/s3-orchestrator.service` | Systemd unit |
| `/var/lib/s3-orchestrator/` | Data directory |
The systemd unit runs as a dedicated s3-orchestrator user with filesystem hardening (ProtectSystem=strict, ProtectHome=yes, NoNewPrivileges=yes). Config reload via systemctl reload s3-orchestrator sends SIGHUP.
Releasing
Tag a version and push to trigger an automated GitHub Release via GoReleaser:
This regenerates CHANGELOG.md via git-cliff, tags the current .version value, and pushes the tag. The tag triggers GoReleaser to build Linux binaries (amd64 + arm64), Debian packages, and SHA256 checksums — all attached to the GitHub Release.
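As a rough illustration only (the repository's actual release target may be named differently), the flow amounts to:

```sh
git-cliff -o CHANGELOG.md            # regenerate the changelog
git tag "v$(cat .version)"           # tag the value recorded in .version
git push origin "v$(cat .version)"   # the pushed tag triggers GoReleaser
```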
To regenerate the changelog without releasing:
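The `make changelog` target referenced in the project structure handles this:

```sh
make changelog
```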
Commit categorization is configured in cliff.toml. Commit messages starting with Add, Fix, Harden, Refactor, Improve, docs:, test:, or chore(deps): are automatically grouped into the appropriate section.
Docker images are still built manually since the private registry isn't reachable from GitHub Actions:
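This is the same `make push` target shown under Development:

```sh
make push VERSION=vX.Y.Z
```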
To dry-run the release locally (builds everything without publishing):
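Per the Development targets:

```sh
make release-local
```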
Project Structure
cmd/s3-orchestrator/ Binary entry: subcommand dispatch + thin shims
main.go Entry point, subcommand dispatch
admin.go / init_cmd.go / sync.go Shim into internal/cli/{adminctl,initcmd,synccmd}
validate.go / version.go Validate-config and version subcommands
internal/
cli/ CLI-side dispatch and bootstrap
serve/ Daemon lifecycle: build the DI injector, start HTTP, SIGHUP reload, shutdown
adminctl/ Admin operational CLI (HTTP client wrapping the admin API)
initcmd/ Interactive config-file generator
synccmd/ Pre-existing bucket import CLI
di/ Single wiring point for samber/do/v2
di.go Every Provide<X> for stores, workers, handlers, backends
services.go Lifecycle-managed background services
transport/ HTTP interface layer (no business logic)
s3api/ S3-compatible XML/REST API
server.go HTTP router, bucket resolution, key prefixing, metrics
buckets.go HeadBucket, GetBucketLocation, ListBuckets, versioning stubs
objects.go PUT, GET, HEAD, DELETE, COPY, DeleteObjects handlers
list.go ListObjectsV1 / V2 handlers
multipart.go Multipart upload handlers
helpers.go Path parsing, header guards, S3 XML error responses
ratelimit.go Per-IP token bucket
admission.go Concurrency limit + load shedding
admin/handler.go Admin API: status, drain, replicate, scrub, encrypt-existing, etc.
auth/auth.go BucketRegistry, SigV4 verification, legacy token auth
ui/ Web dashboard
handler.go HTTP handler + session auth + JSON APIs
admin_actions.go Async-trigger endpoints (rebalance, scrub, encrypt-existing, ...)
async.go Shared async-job result store consumed by /status endpoints
templates.go Embedded template loader + formatting helpers
templates/ Dashboard and login HTML
static/ CSS, JS (directory tree, log viewer)
httputil/
clientip.go X-Forwarded-For + X-Forwarded-Proto with trusted-proxy CIDRs
loginthrottle.go Per-IP brute-force protection
certreloader.go TLS certificate hot-reload + expiry warning
observe/ Observability layer
audit/audit.go Request-id context plumbing + structured audit logger
telemetry/ Per-domain Prometheus metric files (metrics_*.go) + OTel helpers
event/event.go Notification event types + Emit hook
config/ YAML loader split by domain (server, database, backends, ...)
breaker/
breaker.go Three-state CircuitBreaker state machine
registry.go Watchdog-swept registry of all breakers (DB + per-backend)
backend/
s3.go ObjectBackend interface + S3Backend (AWS SDK v2)
circuitbreaker.go Per-backend CircuitBreaker wrapper
backendtest/ Failure-injectable wrapper used by tests
store/ Metadata store
circuitbreaker.go Database CircuitBreaker wrapper
cb_*.go Per-role CB decorators (Object, Pending, Cleanup, Quota, ...)
core/ Engine-agnostic orchestration
types.go Domain types (ObjectLocation, PendingObject, CleanupQueueRow, ...)
errors.go Sentinel errors and structured S3Error
interfaces.go Narrow per-role store interfaces
adapter.go TxAdapter (the per-engine seam) + Reader
runner.go Runner interface + generic WithTxVal[T] helper
objects.go RecordObject, DeleteObject, MoveObjectLocation, ImportObject
pending.go PromotePending orchestration
cleanup.go SweepStaleCleanupQueueRows + MoveCleanupToDLQ
replication.go RecordReplica
helpers.go Engine-agnostic helpers (intentSuperseded, applyQuotaDeltas, ...)
postgres/ Postgres engine adapter
store.go *Store satisfies core.Runner via WithTx
adapter.go pgTxAdapter satisfies core.TxAdapter against sqlc.Queries
objects.go / quota.go / multipart.go / replication.go / cleanup_queue.go / pending.go
admin.go / advisory_lock.go / integrity.go / notifications.go / usage.go
migrations/ Versioned goose migrations (embedded)
sqlc/ Generated type-safe query code (do not edit)
sqlite/ SQLite engine adapter
store.go / adapter.go / objects.go / quota.go / multipart.go / pending.go
cleanup.go / replication.go / admin.go / directory.go / migrations.go
schema.sql Consolidated schema (translates Postgres migrations)
counter/ Per-backend usage counters
counter.go CounterBackend interface + field constants
local.go In-memory atomic backend (default)
redis.go Redis shared backend with CB fallback
tracker.go Usage limit enforcement, baseline management, flush
proxy/ Manager layer
manager.go BackendManager: composition root, routing, config accessors
manager_writepath.go PUT-before-COMMIT pending-row write path
objects.go ObjectManager type, constructor, shared helpers
objects_read.go Read failover, broadcast reads, GetObject, HeadObject, ListObjects
objects_write.go PutObject, CopyObject, DeleteObject, DeleteObjects
multipart.go Multipart lifecycle
reconcile.go Bounded-memory sorted-merge reconciliation engine
core.go Shared infrastructure (timeout, admission, routing helpers)
lifecycle.go Lifecycle expiration rule processing
integrity.go Integrity-aware GET wrapper (read-time hash verification)
encryption_helpers.go On-write encrypt + on-read decrypt adapters
cache.go LocationCache (key -> backend) with TTL + background eviction
stores.go Stores struct bundling the narrow per-role interfaces
drain/ Backend-drain coordinator
dashboard/ DashboardData aggregation + lazy directory listing
metrics/ Manager-level Collector (per-op record + periodic gauge refresh)
proxytest/ Test-only helper: AttachWorkers, StoresFromMock
worker/ Background services
ops_runtime.go Runtime-side ops interfaces (admission, timeout, usage, backend access)
ops_store.go Per-worker store-role interfaces
rebalancer.go Object rebalancing across backends
replicator.go Cross-backend object replication
overreplication.go Over-replication detection + excess-copy cleanup
cleanup.go Cleanup queue retry worker (graduates to DLQ on exhaustion)
pending.go PendingReaper (PUT-before-COMMIT intent resolver)
scrubber.go Integrity scrubber + content-hash backfill
reconciler.go Orphan reconciler driver (consumes proxy/reconcile engine)
notify/notifier.go Webhook notification drainer (notification_outbox pattern)
encryption/ Envelope encryption (AES-256-GCM, key providers, Vault Transit)
cache/ Object-data LRU cache with TTL
lifecycle/ Generic supervisor for long-lived services
internalkey/ Internal key prefix helpers shared by transport + store
testutil/ Shared test fakes (MockStore, builders)
integration/ End-to-end tests against MinIO + Postgres testcontainers
(gated by `//go:build integration`)
grafana/
s3-orchestrator.json Grafana dashboard (all Prometheus metrics)
sqlc.yaml sqlc configuration
Dockerfile Multi-stage build
Makefile Build, test, lint, generate, push, deb targets
nfpm.yaml Debian package definition (nfpm)
packaging/
s3-orchestrator.service Systemd unit file
config.yaml Sample config installed to /etc/s3-orchestrator/
s3-orchestrator.default Default env file installed to /etc/default/
preinstall.sh Creates system user and directories
postinstall.sh Enables systemd service
postremove.sh Purge cleanup (removes user and data)
changelog Debian changelog
copyright Debian copyright file
lintian-overrides Lintian override rules
cliff.toml git-cliff changelog generation config
CHANGELOG.md Auto-generated changelog (make changelog)
config.example.yaml Configuration reference
deploy/
nomad/
s3-orchestrator.nomad.hcl Production Nomad job (Vault integration)
local/
s3-orchestrator.nomad.hcl Local dev job (docker-compose backing services)
demo.sh One-command Nomad dev demo
helm/
s3-orchestrator/
Chart.yaml Helm chart metadata
values.yaml Default production values
templates/ Deployment, Service, ConfigMap, Secret, Ingress, etc.
kubernetes/
local/
values.yaml Local dev Helm values (docker-compose backing services)
demo.sh One-command k3d demo
Additional Documentation
| Guide | Description |
|---|---|
| Quickstart | Get running in under a minute |
| User Guide | S3 client configuration and usage |
| Admin Guide | Configuration, operations, monitoring, deployment |
| API Reference | UI and Admin API JSON endpoint documentation |
| Security Hardening | TLS, mTLS, config security, network segmentation |
| Performance Tuning | Connection pools, timeouts, routing, rebalancer tuning |
| Disaster Recovery | Failure scenarios and recovery procedures |
| Version Migration | Upgrade guide, config changes by version |
| Style Guide | Coding conventions for contributors |
| Contributing | How to build, test, and submit changes |
