Do you really need Kubernetes? (blog.ekern.me)
There is a corollary to this: Do you really need cloud infrastructure?
Cattle not pets, right?
Well, no. Have you seen Amazon's AWS margins? It's 30%.
After Amazon buys the hardware and pays people to run it, it still makes 30%. Not having hardware is someone else's profit.
That isn't cattle, it's contract poultry farming.
Learn capacity planning. Learn to write cacheable, scalable apps. Track your hardware spend per customer. Learn about assets vs. liabilities (hang out with the accountants, they are nerds too). Do some engineering, don't just be a feature factory. And if you are going to build features, make fuckin' sure that you build tracking into them and hold the product teams' feet to the fire when the numbers don't add up (see: friends with accountants, and tracking money).
> Not having hardware is someone else's profit.
But it's also someone else's economies of scale. The chances of me getting datacenter space, hardware, bandwidth, and expert 24/7 staffing at the same volume discounts they do is... slim. Particularly for the small amounts I'd need.
>> Particularly for the small amounts I'd need.
How small? OVH and Hetzner are both a thing and cheap as chips if you're VPS-sized. You just have to plan ahead for what you're doing and how you want to scale it. It requires a bit more upfront thought... but if you're on a budget (personal-project scale), knowing that you have a FIXED COST is pretty good for sleeping easy at night.
There's a middle ground - there are plenty of providers that will give you a turnkey "insert credit card here, get root SSH access" experience similar to AWS EC2 but on physical hardware and at much smaller margins. See OVH, Equinix Metal, Hetzner, Packet, etc.
In most cases where your total needs are less than a dedicated server, you're probably correct. Once you hit that level, issues like network traffic, EBS volumes, etc, really start to creep that cloud bill up fast.
They may be enjoying economy of scale, but they aren't passing that along as savings to their customers. Why would you when people seem happy to pay high prices?
There is more to this: who is "you"?
If you are the owner who foots the bill and has the capability to run your own infra, then nothing beats it. If you don't have the capability, then cloud throws you a lifeline, at a price of course. Pay for it and be happy.
If you are the guy who runs/manages the infra for someone else: then there is no point in saving dollars. You peddle Kubernetes, go to KubeCon, post all about it on LinkedIn, and establish yourself as a Kubernetes expert. When your current gig goes under, you will have a bunch of job offers to pick from.
Besides, Kubernetes solves a problem very elegantly that most companies do not have. Not everyone is Google, running apps at web scale with an expectation of 99.99% uptime...
Cloud is an abstraction over hardware. Like any good abstraction, it makes certain tradeoffs.
Sometimes it makes sense to move to a lower level of abstraction, for performance, cost, or compatibility reasons. In this case, diving under the cloud and running your own servers could save 30% of your server costs.
Cloud adoption has shown that many (most?) companies prefer the convenience over cost savings. Maybe optimizing hardware spend is not the best way to optimize a business.
AWS spends tons of engineering time on features you don't need. There is a lot more than 30% on the table.
I fully agree with your last paragraph, but not sure what it has to do with cloud infrastructure specifically. AWS also leverages huge economies of scale into profit, it's not like you're going to realize equivalent margins by running your own little server in a colo somewhere. You certainly won't realize equivalent availability, scalability, security, support ecosystem, etc with it either. Cloud infra can make a lot of sense even with "pets" - you just gotta make sure to understand requirements, limitations, and use the right tools for the problem. For me personally, I'll reach for cloud infra these days as a reasonable default (similar to reaching for Postgres as a reasonable database default), especially if it's managed and affordable.
> it's not like you're going to realize equivalent margins by running your own little server in a colo somewhere
There are a lot of projects where the bandwidth charges alone would make self-managed on-prem come out ahead, and I'm not even talking the instant 2-3x performance boost you get by moving to real hardware instead of overprovisioned cloud VM hosts.
>> it's not like you're going to realize equivalent margins by running your own little server in a colo somewhere
At this scale OVH and Hetzner are the solutions. If you're that small, fixed-fee pricing is a HUGE win.
> it's not like you're going to realize equivalent margins by running your own little server
I get even better than their margins. In the neighborhood of 100x cheaper. It's not rocket science.
> equivalent availability
AWS availability isn't all that super great. Most of their services are rated for only 99.95% before they offer pittance credits. That's not difficult to meet with a single computer...
> scalability
A service only needs good enough scalability. Auto scaling is also a bug, not just a feature. Remember that it is also tied to auto billing. I can't afford to have my wallet DDOSed.
> security
Security is always a problem that needs to be solved. You don't get an auto pass on security needs because you signed up for AWS.
> support
You have to pay for that.... You can get it from other vendors too if you have budget for it.
> ecosystem
This is flat out wrong. Compare AWS offerings to the breadth and depth of open source offerings and the latter comes out far ahead.
> Learn to write cacheable, scalable apps.
I'm expected to write a service like S3?
That's not what they meant. And, S3-compatible object stores already exist. Ceph, for one.
Cacheable, scalable apps that you write as part of a web-centric business are ones that use application-level knowledge to coordinate with the web framework and use caching at every depth of the stack, whether that's third-party Cloudflare or DIY CDNs (Varnish/Squid), memcached, NoSQL, or database-level caching.
Depends on S3's role in your application. If you actually need unlimited, worldwide-replicated object storage with nearly limitless bandwidth, sure, use S3 and pay for it.
Do you just need a place to stash files so your multiple application servers can read them? An NFS export over a private network can very well be all you need, and you can have that file server back itself up to S3/etc at regular intervals.
Well, you could deploy Minio.
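For anyone curious what that looks like in practice, here's a minimal single-node sketch via Docker Compose (the credentials, ports, and data path are placeholders; a real deployment needs TLS and proper secret handling):

```yaml
# docker-compose.yml - single-node MinIO exposing an S3-compatible API
services:
  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: change-me          # placeholder credential
      MINIO_ROOT_PASSWORD: change-me-too  # placeholder credential
    volumes:
      - ./minio-data:/data
    restart: unless-stopped
```

That covers the "stash files somewhere my app servers can reach" case; real durability and replication requirements are a different conversation.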
I see Kubernetes as an enabler. Grab some commodity hardware - your own or Linode - and have a cloud experience using open-source components.
I just wish Kubernetes wasn’t so darn complicated.
Or use GKE/AKS/EKS and save on the salaries needed to operate clusters and hardware
The biggest thing "the cloud" has gotten me is that I no longer have to call a sales rep and wait a week if I need a server. Yes, this is a process problem, but for whatever reason it seems to stop being a problem (as much) when companies move to the cloud. I have seen this 3+ times now. As an engineer, I'd much rather move off prem than fix bad management.
If you mean buying an actual server, I order them regularly. If you mean a dedicated server in a DC, then there are plenty of providers who can auto-provision a bare metal instance for you. That was possible before AWS even opened up to public sales.
You’re not wrong, but generally the political friction of deploying a new service to k8s is lower than what you propose as well.
Well yes, but I run container orchestration on owned servers (Swarm, not Kubernetes), so it's really easy to add a new service at a whim, and only quite rarely do we need to add capacity, for which we can also add in a cloud provider. But I'm using off-lease servers, which are sort of bonkers as to how much capacity you get for the price. I think they cost less than a month's spend on an equivalent cloud instance.
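For a sense of how low-friction that is, here's a minimal sketch of the kind of Swarm stack file involved (the service name and image are placeholders), deployed with `docker stack deploy -c stack.yml myapp`:

```yaml
# stack.yml - a single replicated service on an owned Swarm cluster
version: "3.8"
services:
  api:
    image: registry.example.com/api:1.4.2   # placeholder image
    ports:
      - "8080:8080"
    deploy:
      replicas: 3              # spread across the owned servers
      update_config:
        parallelism: 1         # rolling updates, one task at a time
      restart_policy:
        condition: on-failure
```

Adding another service is just another stanza in the same file.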
That’s a good point. Preemptible instances can be insanely cheap. I would guess it’s more efficient for the data center overall to shuttle services around to different nodes as needed.
Fully agree. Cloud can be an easy way to get started since you don't have to pay as much up front, and even if you need extremely elastic scaling, you'll save a fortune in the long term by investing in at least some on-prem hardware to handle the off-peak workloads. If you have predictable, stable load, you can save even more!
If you need servers for the majority of your application, don't use AWS or any other major cloud provider. Their benefits come from economies of scale, so if you cannot be part of that economy, do something else.
And you can be part of it really only by doing cloud native stuff like Lambda, DynamoDB et al.
That's like recommending people to buy a trailer to haul their groceries because transportation companies make a profit on hauling cargo.
Maybe some people do need a trailer, maybe some do not. As long as you don't blindly follow the cloud you can also profit from them.
Your first full-time sysadmin is an expensive hire. So is your first DBA. And even if your database backups are working now, there's a good chance they'll silently break in the next several years.
The simplest thing you could do is to build a single-container application and deploy it to a Heroku-like system with a fully managed database. If this actually works for your use case, then definitely avoid Kubernetes.
But eventually you'll reach a point where you need to run a dozen different things, spread out across a bunch of servers. You'll need cron jobs and Grafana and maybe some centralized way to manage secrets. You'll need a bunch of other things. At this point, a managed Kubernetes cluster is no worse than any other option. It's lighter weight than 50 pages of Terraform. You won't need to worry about how to get customized init scripts into an autoscaling group.
The price is that you'll need to read an O'Reilly book, you'll need to write a moderate amount of YAML (see the sketch below), and you'll need to pay attention to the signs reading Here There Be Dragons.
Kubernetes isn't the only way to tackle problems at this scale. But I've used Terraform and ECS and Chef and even a custom RPM package repo. And none of these approaches were significantly simpler than Kubernetes once you deployed a full, working system for a medium-sized organization.
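For a rough sense of what "a moderate amount of YAML" means, here's a minimal sketch of a web Deployment plus a cron job on a managed cluster (the images, the `web-secrets` Secret, and the schedule are all hypothetical):

```yaml
# A stateless web service; secrets come from a pre-created Secret.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
          envFrom:
            - secretRef:
                name: web-secrets                 # hypothetical Secret
---
# A nightly batch job, replacing a crontab entry on a pet server.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 3 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: registry.example.com/report:1.0.0   # placeholder image
```

That's roughly the shape of it; multiply by the number of services and add Service/Ingress objects as needed.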
> At this point, a managed Kubernetes cluster is no worse than any other option
Except in terms of pricing...?
K3s adds minor overhead to my single node $8/mo vps.
Edit: no, not managed. Quite easy to get started with though.
Edit2: it took ~440MB of RAM & 3% CPU (most of the CPU is the local storage provisioner, weirdly?). Why? Because it was much easier than continuing to maintain and run the handcrafted Ansible scripts I've written over the years, it lets me easily manage DNS, certs, metrics, and two DBs, and it's easier to reuse these on my other infra than with any other option.
I just ran the k3s install script on a vm, and then ran 'systemctl status k3s', and apparently it's using 1.4GB of memory (I haven't started any pods yet). I understand why you might opt for Kubernetes in a multi node cluster, but what does this provide in a single node cluster that you can't achieve with Docker Compose (or Podman Compose)? Not trying to be snarky, I'm legitimately curious.
EDIT: After playing around with this, it appears to scale its memory usage relative to the machine's total available memory. If you run it on a vm with 1GB or less of memory it'll use up roughly half the machine's memory.
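(For reference, the kind of single-node Compose setup I'm comparing against is roughly this — a minimal sketch with placeholder images:)

```yaml
# docker-compose.yml - app plus database on one box
services:
  app:
    image: ghcr.io/example/app:latest    # placeholder image
    ports:
      - "8080:8080"
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me       # placeholder credential
    volumes:
      - db-data:/var/lib/postgresql/data
    restart: unless-stopped
volumes:
  db-data:
```

It handles restarts and startup order; DNS, certs, and metrics are on you.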
Is this "managed Kubernetes" then?
I’m sceptical of this article. I’m an indy dev using K8s at vultr (VKS) and it’s absolutely simplified my life.
The article suggests just using EC2 instead of K8s, but if I do that, I now have to manage an entire operating system. I have to make sure the OS is up to date and handle all the nuances this entails, especially balancing downtimes and recovery from upgrades. Major OS upgrades are hard and pretty much guarantee downtime unless you’re running multiple instances, in which case how are you managing them?
Contrast to VKS where, with much less effort, OS upgrades are rolled out to nodes with no downtime to my app. Yes, getting to this point takes a little bit of effort, but not much. And yes, I have multiple redundant VPS, which is more expensive, but that’s a feature.
K8s is perhaps overly verbose, and like all technologies it has a learning curve, but I’m gonna go out on a limb here and say that I’ve found running a managed K8s service like VKS is way easier than managing even a single Debian server, and it provides a pile of functionality that is difficult or impossible to achieve with a single VPS.
And the moment you have more than one VPS, it needs to be managed, so you’re back at needing some kind of orchestration.
The complexity of maintaining a unix system should not be underestimated just because you already know how to do it. K8s makes my life easier because it does not just abstract away the underlying node operating system, it obviates it. In doing so, it brings its own complexities, but there’s nothing I miss about managing operating systems. Nothing.
(Author here)
The main focus of the post is to highlight some of the long-term risks and consequences of standardizing around Kubernetes in an org. If you've done a proper evaluation, and still think Kubernetes makes sense for you, then it's probably a sound decision. But I think many skip the evaluation step or do it hastily. The post is more targeted towards organizations with at least a handful of employees. What works for an indy dev does not necessarily scale and work for SMBs or larger orgs - those are very different contexts.
> The article suggests just using EC2 instead of K8s
Not quite. I suggest strongly considering using managed services when it makes sense for your organization. The equivalent of k8s in terms of managed services would be Amazon Elastic Container Service (ECS) as the control plane, perhaps with AWS Fargate as the compute runtime.
(I wouldn't really call EC2 a managed service - it's more in the territory of Infrastructure as a Service)
I may have misread ECS as EC2 and I apologise for that.
But the argument you make should certainly be applied to other managed services. AWS generally has opaque pricing and significant hidden complexity - are you really going to just subscribe to ECS and Fargate? Or are you subscribing to a bunch of other complexities like CloudWatch, IAM, EBS, etc., etc.? If I want to control costs, then do I also need some third-party service? How many IOPS does my database need, anyway?
I’m not an AWS user, because every time I’ve looked at it I’ve come away shaking my head at how complex everything is, and how much vendor specific technology I need to learn just to do something simple.
And, having run organisations with more than a handful of employees, if there’s anything I’ve learned it’s that simplicity is a virtue.
In fact, the last company I was involved with went all-in on AWS, which involved formal training for everyone, very high costs, and multiple dedicated administrators. My part of the business predated that decision, and we did well over 10x the throughput with a single dedicated ops expert, using our own gear, orchestrated with docker-swarm. Our costs were literally 10% of the cost of AWS for the other part of the business, including amortisation of the hardware, and that’s before all the extra training and operational costs of AWS.
Today, it’s far easier to run K8s than it was to run Swarm back then. So quite honestly, if you’re an indy developer like me, K8s is almost a no-brainer, and if you’re a mid-sized SaaS shop, AWS is just a really great way to spend tens of thousands of dollars a month to say you’re running in AWS.
There are some legit notions here, but overwhelmingly it uses insinuation & suggestion to sow Fear, Uncertainty, and Doubt.
> Despite its portability, Kubernetes also introduces a form of lock-in – not to a specific vendor, but to a paradigm that may have implications on your architecture and organizational structure. It can lead to tunnel vision where all solutions are made to fit into Kubernetes instead of using the right tool for the job.
This seems a bit absurd on a number of fronts. It doesn't shape architecture that much, in my view; it runs your stuff. Leading to tunnel vision, preventing the right tool for the job? That doesn't seem to be a particularly real issue; most big services have some kind of Kubernetes operator that seems to work just fine.
Kubernetes seems to do a pretty fine job of exposing platform, in a flexible and consistent fashion. If it was highly opinionated or specific, it probably wouldn't have gotten where it is.
I think the larger issue is how Kubernetes often is implemented in organizations - as part of internal developer platforms owned by central teams which on purpose or by accident can end up dictating how development teams should work. I think it's easy for such central teams to fall into the trap of trying to build smart, custom abstractions on top of Kubernetes to simplify things, but over time I believe these types of abstractions run a high risk of slowing down the rest of the org (good abstractions are really hard to come by!) and creating fuzzy responsibility boundaries between central and development teams. As an example, this can affect an organizational structure by (re-)introducing functional silos between development and operations. Can a development team really be fully responsible for what they build if they rely on high-level, custom abstractions that only someone else in the org really understands?
Furthermore, if everything in an org is containerized and runs on Kubernetes, it's really easy to have a strong bias towards containerized workloads, which in turn can affect the kind of systems you build and their architecture.
Can you name any systems that are only made possible by being non-containerized? What do you see as the advantage here?
It seems like a legacy view of the world that containers are at all worse. Today they seem to offer minimal overhead & access to all the same hardware capabilities as native apps.
>>> This seems a bit absurd on a number of fronts. It doesn't shape architecture that much, in my view; it runs your stuff.
I mean, let's be candid.
There are plenty of times where "containers" are bags of shit software that we're pushing into production and throwing hardware at to keep things going. There are containers out there with out-of-date libraries that aren't getting updated because they "work" and no one gives a shit.
If you can get away with that, what is the incentive to do highly integrated engineering that produces diagonal scalability? Why be WhatsApp when you can just throw money at bad software?
Containers can be a crutch for poor software maintenance, oh sure! Lots and lots of companies skate by with shoddy container infrastructure (often to relatively little ill impact imo). But I don't see much opportunity that is opened up by getting away from containers. One can do "highly integrated engineering that produces diagonal scalability" just as easily on containers as not, in my view. Containers don't inhibit much.
WhatsApp remains the glorious one example of ultra-efficient software. Way to go taking ejabberd+Erlang and going far!! But does that path exclude Kubernetes? I doubt it. That architecture would have run fine as containers and/or on Kubernetes. It wouldn't have made a material difference to what they were doing. It just would have been a different way to manage the underlying platform. Who knows, maybe it would have provided some patterns/templates/apiserver-ing that the team would have found useful to develop atop, to build forward on, that they instead had to build themselves?
I'd say 99% of companies using Kubernetes can't really explain why they chose it over Nomad.
The entire reason is popularity/marketshare, and that's a really valid reason to choose tech like this.
Yep. Plenty of people on the market with experience, and your employees know it’s a skill with market value, so they won’t resent your choice and start eyeing the door.
Makes sense. And with LLMs training on what is publicly available, this becomes self-fulfilling, because you'll get better answers from AI about what is popular - not what is technically superior.
That's already true on forums where answers depend on humans. It's possible that LLMs will make it even more prevalent but they won't have started the phenomenon.
I don't think most companies can say why they're not at least isolating workloads with something like Kata Containers instead of only glorified cgroup jails, whether they have an inventory of all the services they're running and why, which machines hold the authoritative copies of data, how they back it up without replicating corruption, or how they'd do disaster recovery/BCP on it.
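For what it's worth, opting a workload into that kind of stronger isolation on Kubernetes isn't much YAML — a minimal sketch, assuming the Kata runtime is already installed on the nodes and registered with containerd under the handler name `kata`:

```yaml
# RuntimeClass pointing at a Kata Containers runtime (handler name assumed).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
# Pod opting into VM-backed isolation via that RuntimeClass.
apiVersion: v1
kind: Pod
metadata:
  name: isolated-workload                      # hypothetical name
spec:
  runtimeClassName: kata
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
```

The hard parts are the node-level install and knowing which workloads warrant it, which loops back to the inventory question.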
It's easy. When I look at the resumes that come in, it's a sea of people listing experience with k8s and rarely a mention of Nomad.
When your team has familiarity with something, it's a bit harder to suggest an alternative unless it's quite a lot better
Nomad isn't FOSS anymore, for one
Related: https://doineedkubernetes.com/
Does this site return something different if it detects a Google/Facebook/Amazon corporate IP visiting?
It wouldn't be hard to check by BGP AS.
Stack Overflow routinely didn't work at one MAANG because egress from a narrow range of IPs appeared to be bot-like behavior, but it was really an enormous amount of traffic from ~10k's of users.
Why would it? They have their own fleet management systems; Borg/Tupperware/ECS. I guess for the EKS team you'd say yeah.
For small teams I also think Kubernetes often greatly complicates the per-service operational overhead by making it much more difficult for most engineers to manage their own deployments. You will inevitably reach a point that engineers need to collaborate with infra folks, but in my experience that point gets moved up a lot by using Kubernetes.
Hey now, I made a killing in AWS consulting to convince megacorps to get rid of their own hardware and avoid going the OpenStack route.
The problems of pre-IaaS and pre-K8s were manageability, flexibility, and capacity utilization. These problems still haven't really been solved in a standardized, interoperable, and uniform manner because stacks continue to mushroom in complexity. Oxide appears to be on the right track, but there is much that can be done to reduce the amount of tinkering and redundant abstractions, and to stop avoiding the conventional lifecycle management and cross-cutting concerns that people don't want to think about whenever another new way comes along.
I found that just using Cloud Run and similar technologies is simpler and easier to manage than Kubernetes. You get auto scaling, fast startup, a limit on the number of concurrent connections to each instance, and scale-to-zero functionality.
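A minimal sketch of that in declarative form, deployable with `gcloud run services replace service.yaml` (the service name, image, and limits are placeholders):

```yaml
# service.yaml - scale to zero, bounded instance count, capped concurrency
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service                              # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # scale to zero when idle
        autoscaling.knative.dev/maxScale: "10"  # cap the bill
    spec:
      containerConcurrency: 80                  # max concurrent requests per instance
      containers:
        - image: gcr.io/my-project/my-app       # placeholder image
```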
I agree that Cloud Run greatly simplifies deployments.
Unfortunately, it only auto-scales based on requests and, eventually, CPU. We are in the process of moving our Temporal workers from Cloud Run to GKE Autopilot, which is ~30% cheaper given we can use arm64 Scale-Out nodes.
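If I'm reading the Autopilot docs right, targeting those arm64 Scale-Out nodes comes down to a nodeSelector on the workload — a minimal sketch with placeholder names and images:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-worker                             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: temporal-worker
  template:
    metadata:
      labels:
        app: temporal-worker
    spec:
      nodeSelector:
        cloud.google.com/compute-class: Scale-Out   # Autopilot compute class
        kubernetes.io/arch: arm64                   # arm64 (T2A) nodes
      containers:
        - name: worker
          image: gcr.io/my-project/worker:latest    # placeholder image
          resources:
            requests:
              cpu: "1"        # Scale-Out wants whole vCPUs, as I understand it
              memory: 2Gi
```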
No. We chose ECS instead :-)
That said, we are planning on doing a cloud exit in the future. I don't feel we need Kubernetes, but we do need to orchestrate containers. In our case, it's less scale, and more isolation.
It’s funny that for exactly the same workload on AWS, raw EC2 ends up cheaper than the same cluster size on EKS…
That is true of every managed service on AWS.
Related: "Dilbert on Kubernetes"