Mission Impossible: Kubernetes Operator Failure Story (Kafka)
Yesterday, a Kubernetes version upgrade automatically initiated by DigitalOcean broke our Kafka instance. I solved it with a somewhat atypical, some might say insane, approach: reading the operator code, understanding it, patching it, and deploying the patched operator manually. (Keep in mind, I had never looked at this codebase before.)
https://github.com/strimzi/strimzi-kafka-operator/issues/6136#issuecomment-1828476027
What seemed impossible became possible. No matter the layer of complexity, whether it's your Kubernetes operator or the Kubernetes codebase itself, you can DOWNLOAD IT, MODIFY IT and PUSH IT. This is NOT the first time I've cloned code that seemed complicated to understand and fix, and fixed it anyway.
The lesson is basically this:
Give yourself no limits, and you will have no limits. Dive deeper!
Of course, there are other, more obvious lessons:
1. Make sure you keep your software upgraded.
2. If your backups are disk snapshots, you may not actually be able to recover from them if the software running on them cannot resume.