Why Kubernetes is so complex
cloudplane.org

I think the answer is hidden in the last paragraph. It was created by Google engineers to solve Google-scale problems, where they run maybe hundreds of thousands of services with a dedicated SRE team and massive tooling. For a small company, give me a way to run one container in an autoscalable way and we can go from there.
Which is why kubernetes got adopted - everyone pretends they are Google, having Google problems, when they, in fact, don't.
This is a heretical idea in some circles. Utter it and expect to be ostracized. In my case it's cost me a few job offers when I've suggested that a few python processes running under supervisor (e.g.) would be more than sufficient. It has its place, but that point lies at a stage of development and sophistication far beyond where most cargo-cult users currently are.
This is a heretical idea in *most* circles. Same experience with job offers!
I think it’s also probably because it gives people a “predictable” way to do things.
That being said I sometimes ask myself why we can’t constantly think KISS and YAGNI. Like, do we really need this level of abstraction and complexity? I’ve been “working” with k8s and I would probably fail any interview on it because I feel like I’m always googling my way through issues. I don’t even care anymore because I know for my own purposes outside of work, I keep my code and systems stupid simple.
And maybe this sounds cringey to some but I’m happy to write a few scripts on my own to handle deployments without needing to break my software into a thousand pieces. Single responsibility code using a few languages that are best suited for the task at hand (in my case it’s mostly node, elixir, go) that’s easy to break apart and ship separately is so nice. Why can’t we do the same at work?
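To give a rough sense of the scale I'm talking about, a deploy "pipeline" for a setup like that can be a sketch as small as this; the host, paths, and systemd unit name are placeholders for whatever your own setup looks like:

```python
#!/usr/bin/env python3
"""Minimal deploy sketch: rsync the build to the host, then restart the service.

Host, paths, and unit name below are placeholders -- adjust for your own setup.
"""
import subprocess
import sys

HOST = "deploy@app1.example.com"   # placeholder host
BUILD_DIR = "./build/"             # local artifact directory
REMOTE_DIR = "/srv/myapp/"         # where the app lives on the server
SERVICE = "myapp.service"          # systemd unit that runs the app

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def main():
    # Ship the new build (rsync only transfers what changed).
    run(["rsync", "-az", "--delete", BUILD_DIR, f"{HOST}:{REMOTE_DIR}"])
    # Restart the service and fail loudly if it doesn't come back up.
    run(["ssh", HOST, f"sudo systemctl restart {SERVICE}"])
    run(["ssh", HOST, f"systemctl is-active {SERVICE}"])

if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as e:
        sys.exit(e.returncode)
```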
Oh well, I’ll collect my check 2x a month thanks.
Large companies are done with huge k8s clusters. Now every app team and their SRE team rolls out their own k8s cluster. I am not sure whether these companies are solving Google scale problems.
Maybe they weren't solving Google-scale problems before but with Kubernetes they sure are now!
> For a small company, give me a way to run one container in an autoscalable way and we can go from there.
What's the best way you've found to do that?
ECS/Fargate for long lived things. Lambdas for short lived things.
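Roughly, "one container, autoscaled" on Fargate comes down to a task definition, a service, and a scaling policy. A hedged boto3 sketch of that; the cluster, subnets, image, and role ARN are placeholders, and it assumes the ECS cluster and IAM execution role already exist:

```python
"""Sketch: run one container on ECS Fargate with CPU-based autoscaling.

Assumes an existing cluster, VPC subnet, and execution role -- all names below are placeholders.
"""
import boto3

ecs = boto3.client("ecs")
autoscaling = boto3.client("application-autoscaling")

# 1. Describe the container once.
task_def = ecs.register_task_definition(
    family="myapp",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[{
        "name": "myapp",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",  # placeholder
        "portMappings": [{"containerPort": 8080}],
        "essential": True,
    }],
)

# 2. Run it as a long-lived service.
ecs.create_service(
    cluster="my-cluster",  # placeholder
    serviceName="myapp",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=1,
    launchType="FARGATE",
    networkConfiguration={"awsvpcConfiguration": {
        "subnets": ["subnet-abc123"],  # placeholder
        "assignPublicIp": "ENABLED",
    }},
)

# 3. Let AWS scale the task count on average CPU utilization.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/myapp",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=4,
)
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/myapp",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"},
    },
)
```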
+1 for Fargate and Lambda. If you do not like Fargate, you can also run containers in EC2 for a bit more control.
Nah, scaling will be hard. I came to the realisation that I don't want any clusters; let them manage it :)
We used to do this at my current job, but we've recently started transitioning to Kubernetes. There are certainly things I like a lot about it (specifically, the k9s CLI), but I definitely miss the simplicity of ECS/Fargate.
That plus a fleet of EC2s for really heavy loads.
Google App Engine https://cloud.google.com/appengine
Azure is really simple if you use dotnet. Just create an app and then click scale-out (or scale-up).
They of course support Docker and k8s too. And Azure Functions, which are like Lambda in AWS.
I really like Azure Container Apps for this. It's a great way to dip into the world of containers.
Could someone explain to me in what way Kubernetes is complex and what alternatives are simpler? I've worked on non-k8s systems before, and in my experience they all hang together with custom bash/python code which, although line-for-line "simpler", makes it harder to onboard new people and is less robust (excluding very simple deployments).
K8s is very modular in my experience so if you don’t need something you can easily ignore it and not pay a complexity cost. Nomad does not seem much simpler to me (especially because you basically have to pair it with Consul and Vault)
I am genuinely curious.
Mostly agree, I prefer k8s to learning the custom duct tape for every project.
Observation: a side effect of being extensible is that people deploy extensions.
There is some kind of law of complexity budgets, where if you make the simple things easy, people will tend to ratchet up complexity by adding more stuff until the system "just" fits in their heads again.
Bare k8s with a simple ingress path and workload is predictable and nice to admin.
Cluster with lots of extra bits (custom autoscalers, cert-manager, complex ci systems, serverless stuff, custom operators, service meshes) can have lots of "non-local" interactions and seems to lead to environments that are scary to upgrade.
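For what it's worth, the "bare" version really is small. A sketch of a whole workload using the official Python client (names and image are placeholders, and it assumes a working kubeconfig); an ingress on top is the same idea, just one more object:

```python
"""Sketch: a 'bare' Kubernetes workload -- one Deployment, nothing fancy.

Assumes a working kubeconfig; names and image below are placeholders.
"""
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod

container = client.V1Container(
    name="myapp",
    image="ghcr.io/example/myapp:latest",  # placeholder image
    ports=[client.V1ContainerPort(container_port=8080)],
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="myapp", labels={"app": "myapp"}),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "myapp"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "myapp"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

# Create the Deployment; the cluster's controllers keep it at 2 replicas from here on.
client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```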
You kinda need to still learn the custom duct tape though. Kubernetes is something you need to learn on top of all the details. It doesn't replace it. (It may however seem that way until you run into any actual problems with your software)
Yeah, it's more like a network of duct tape webbed together within a clear plastic box with a "kubectl" button on it, along with a paper feeder that takes a stack of generated config files as input for each button press.
And this controls how the duct tape sticks everything together, so it's not like any two boxes are the same.
I guess it depends on what you're comparing Kubernetes to. If what's being compared is a large codebase with lots of loosely connected CloudFormation, then Kubernetes can make your life easier. If what's being compared is a small application that can run on a collection of EC2s behind an ASG and easily deployed via Terraform, then it's probably complex for that use case.
It's all relative.
> Nomad does not seem much simpler to me (especially because you basically have to pair it with Consul and Vault)
Hi, Nomad PM here - We've gotten this feedback a lot and have been taking steps to respond to it. We added simple service discovery in Nomad 1.3 and health checks and load balancing shortly after. So you shouldn't need Consul until you want a full service mesh. And then in Nomad 1.4, which just launched, we added Nomad Variables. These can be used for basic secrets & config management. It isn't a full on replacement for Vault, but it should give people the basic functionality to get off the ground.
So going forward we won't have a de facto dependency on these other tools, and hopefully we can live up to the promise of simplicity.
AWS ECS
What about Nomad? I hear it isn't as complex as K8s but still offers the same capabilities.
Also, in what way is Swarm abandoned? If it works fine and is still supported in Docker CE, then it's still OK to use, at least for small businesses and hobbyist use cases where Swarm's simplicity is attractive.
I recently looked at Nomad, but ironically, for a "less complex" piece of software, I couldn't be sure it would scale down to my use case. I ended up with K3s, as it would run on the small hardware nodes I needed it to run on.
I would have preferred Nomad but the resource requirements are pretty high for the "control server" component. https://www.nomadproject.io/docs/install/production/requirem...
Obviously this is not going to fit on a group of Raspberry Pis or other SBC compute nodes you can solar power out in a field.

> Nomad servers may need to be run on large machine instances. We suggest having between 4-8+ cores, 16-32 GB+ of memory, 40-80 GB+ of fast disk and significant network bandwidth. The core count and network recommendations are to ensure high throughput as Nomad heavily relies on network communication and as the Servers are managing all the nodes in the region and performing scheduling. The memory and disk requirements are due to the fact that Nomad stores all state in memory and will store two snapshots of this data onto disk, which causes high IO in busy clusters with lots of writes.

I too was initially put off by those requirements. Now we run our Nomad server on a single t3.medium instance that sits at a 15-minute load average of 0.1 and has ~500MB of RAM used.
This manages about 100 client nodes. No need for a cluster since we don't need high availability on our control plane, and there's no actual state stored there that isn't created from our CI pipeline.
Glad you went for it despite our poor documentation cb22! That sounds like a great setup. I think the only defensible way to describe our "Requirements" page is that we wanted to make the safest suggestion for the widest range of users. Obviously it's wildly inaccurate for a wide range of use cases, and we should fix that.
Nomad servers could start with 300 MHz, 100 MB of memory, and eMMC storage and run a Raspberry Pi cluster just fine. Our most important resource guidance is all the way over in our Monitoring docs!
> Nomad servers' memory, CPU, disk, and network usage all scales linearly with cluster size and scheduling throughput.
https://developer.hashicorp.com/nomad/docs/operations/monito...
Any cluster can start with 300 MHz and 100 MB as long as they monitor usage and scale appropriately.
We're going to try to update our Requirements docs to add this nuance and guidance on how to calculate requirements based on projected load. We recently spent some time improving our heartbeat docs, and I think the approach we took there will serve us well for system requirements: https://developer.hashicorp.com/nomad/docs/configuration/ser...
Thank you for the follow up! This makes it much more likely that I’ll remember to give it a shot when the next applicable project comes up… Because I’ll probably have forgotten these comments and gone to check the docs again :-)
See, this is the sort of information that they should have posted on the requirements page, not some arbitrarily sized high-water mark needing gigs of disk and memory. I'll have to give Nomad (and Consul) another go next time and test it on physical hardware to see.
Thanks for the info! and for doing what Hashicorp seemed too busy to do themselves.
After reading TFA I don't feel enlightened. I don't think the author answered or even began to seriously address the question posed in the title; the discussion is approximately a beginner-level introduction to K8s Controllers.
A bit disappointed with cloudplane here.
Edit: @dollar - good one! Quite plausibly the case.
Kubernetes is so complicated, the author got started describing it, then gave up and wandered off topic.
Instead of your 2c you gave a whole dollar here.
The article did feel half finished. Maybe it could be a series of articles, I suppose.
But jumping right to "Let's look at how Kubernetes works behind the scenes, and why the complexity may be a tradeoff worth making." this would be more useful to me. Specifically, like "why all the pieces" and comparing them to other solutions, which may be challenging.
The piece was meant as a brief behind-the-scenes look for people hesitant to adopt K8s, sorry if the title is misleading. I'm not the best writer, but working to improve.
I really miss the simplicity of Rancher and their Cattle orchestrator circa 2016 or so.
Kubernetes is way, way too much for many teams to be able to operate properly. It can be done the right way, and it absolutely has its use cases, but I see so many people using it that really shouldn't be.
People who understand how infrastructure works know that k8s is not complex.
Everyone else gotta complain about it just because.
Not sure how you could deny that it's complex. It adds additional moving parts that weren't there before, and takes control of things that once operated autonomously.
Whether or not that extra complexity is necessary or beneficial is what's debatable.
It's the same reason someone thinks putting an entire OS behind the "doIt" function simplified everything.
Because complexity sells.
It really is interesting.
What was shown was the ability of systemd to have restart policies for units and the ability to load secrets over some sort of Unix socket primitive. Plus, it does not even try to do topological sorts; it restarts every time and accepts that its preconditions are false.
And basically, since it can do that for any more or less untyped pile of resources, it is "flexible". Sure, void* + a tag is flexible.
K8s is tiring. Reconciliation is not exclusive to K8s, it's not the best system we have, not even close.
It is a particularly popular system with very specific choices, which has a nice property of assuming that state drifts therefore reconciliation is a must.
The annoying part is that, to show everyone else that K8s is complex, you have to build a reconciliation-based piece of software that composes well with the rest of the world and prove that you don't need K8s to achieve the same features most people actually use, unless you are $bigcorp. Alas, people have finite time, and I do think it is quite clear how to build this from more fundamental pieces such as systemd and more.
That makes this kind of article even more frustrating, because I get the good intent of convincing people that K8s is not frightening and complicated. I really feel there is a lack of theory and research definitions in this area of computer science. Rigor is missing.
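To make the point concrete: the reconciliation pattern itself fits in a handful of lines. A toy sketch, where the helper functions are stand-ins for whatever you are actually converging (systemd units, containers, cloud resources), not any real API:

```python
"""Toy reconciliation loop: converge observed state toward desired state.

The three helpers are stand-ins for a real system, not any actual API.
"""
import time

def desired_state():
    # e.g. parsed from a spec file checked into git
    return {"web": {"replicas": 3}, "worker": {"replicas": 1}}

def observed_state():
    # e.g. queried from systemd, the docker socket, or a cloud API
    return {"web": {"replicas": 2}}

def apply_change(name, spec):
    # create, resize, or tear down the real resource so it matches `spec`
    print(f"converging {name} -> {spec}")

def reconcile():
    desired, observed = desired_state(), observed_state()
    for name, spec in desired.items():
        if observed.get(name) != spec:   # missing or drifted
            apply_change(name, spec)
    for name in observed.keys() - desired.keys():
        apply_change(name, None)         # no longer wanted: tear down

if __name__ == "__main__":
    while True:        # level-triggered: re-check everything each pass,
        reconcile()    # so crashes and drift are handled the same way
        time.sleep(10)
```

Everything else K8s layers on top (scheduling, networking, API machinery) is separate from this core loop.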
What hurts me the most is when I was told "it's not that simple" because I wanted a container-enabled machine to put a container on, sitting behind the public web server, which forwards requests to it (a reverse proxy).
What I was told is that it doesn't scale, and that k8s is simpler because otherwise how does it talk to the database? Oddly enough, I'm not sure this person has ever _just_ worked with containers without k8s, so it all falls into a black box.
Which is odd, but all of this is to take roughly 100 servers and get them into the cloud.
At some point I have to wonder if it's even possible for many of these same people to work in a way that's simple.
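For reference, the whole setup I was proposing is roughly this sketch using the Docker SDK for Python (the image name, port, and DATABASE_URL are placeholders); nginx on the host just proxies to 127.0.0.1:8080:

```python
"""Sketch of the 'just a container behind a reverse proxy' setup.

Image name, port, and DATABASE_URL are placeholders; nginx (or whatever
public web server) proxies requests to 127.0.0.1:8080.
"""
import docker

client = docker.from_env()

client.containers.run(
    "registry.example.com/myapp:latest",          # placeholder image
    name="myapp",
    detach=True,
    ports={"8080/tcp": ("127.0.0.1", 8080)},      # only reachable through the proxy
    restart_policy={"Name": "always"},            # come back after crashes and reboots
    environment={"DATABASE_URL": "postgres://db.internal/myapp"},  # placeholder; the DB is just an env var away
)
```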
What's the advantage of using K8s over Google App Engine? GAE pretty much has out-of-the-box scaling, security, containers, etc. You just deploy your app and let Google handle it.
Unless you hit the front page of Hacker News, are asked by Google to upgrade your package due to increased traffic, click yes, and then it immediately, without warning, takes your app offline until a human can review the new agreement and details in a few days' time, ruining your launch. Mention it in the comments section and suddenly three sketchy people say it can't have happened and ask for more information, giving you the impression some of them are desperately trying to suppress the comment, of all things, instead of saying nothing or apologising.
The worst part wasn't that the system was literally designed in a broken manner. It was the rudeness of anyone seemingly involved in the project.
But hey maybe that’s changed now.
- Why is this thing so complex?
- Let me explain it to you with something even more complex because, hell, at least I understood this and you still haven't. :)