“You don't need this overengineered goo for your project.”
> “You don't need this overengineered goo for your project.”
k8s is probably a great excuse to think about how to compose your infrastructure and software in a declarative way. I'm still fascinated by https://demo.kubevious.io/ - it just "clicked" when I played with that demo. It's not goo, it's a different operating system and a different mindset.
You can do 80% of that with docker-compose / swarm for small projects (see the compose sketch below), but:
If you read HN you are in a huge bubble - gruesomely patched Tomcat 7 apps on Java 8 with 20 properties/ini/xml config files are still popular, and hosting things in Docker or doing CI/CD is still not mainstream. At least in the European public-sector projects where I was involved.
Sure, you can mock it - but the declarative approach is powerful. If you can pull it off across all your infrastructure and code, with CI/CD and tests, you are fast.
This alone, correctly implemented, solves so many problems: https://github.com/adobe/rules_gitops. I can't count the useless meetings we had over any of these bullet points; Bazel alone would have solved most major pain points in that project, just by being explicit and declarative.
Don't believe the hype but it's a powerful weapon.
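To make the docker-compose point above concrete, here's a minimal sketch of the kind of small-project setup I mean - the image name, credentials and ports are invented for illustration:

    # docker-compose.yml - minimal sketch of a small-project stack
    # (image name, credentials and ports are hypothetical)
    version: "3.8"
    services:
      web:
        image: registry.example.com/myapp:1.4.2   # your app image
        ports:
          - "80:8080"
        environment:
          DATABASE_URL: postgres://app:secret@db:5432/app
        depends_on:
          - db
        restart: unless-stopped
      db:
        image: postgres:13
        environment:
          POSTGRES_USER: app
          POSTGRES_PASSWORD: secret
          POSTGRES_DB: app
        volumes:
          - db-data:/var/lib/postgresql/data   # data survives container restarts
    volumes:
      db-data:

That covers declarative service definitions, restarts and persistence; what it doesn't give you is the rolling updates, RBAC and multi-node scheduling that k8s (or swarm, partially) layers on top.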
I love these Twitter takes. They say something to get attention, but if you look at it for more than a second it's... it's just nothing. No data.
A troubleshooting guide and running a site on a couple of servers are a bit too different to compare for me. Compare it to a troubleshooting guide for those two servers and let's see how they stack up. No using any "ask {specific person}" either.
Don't get me wrong, kubernetes is overkill for most side project level things. I don't disagree, I just like to see things knocked down a peg fairly!
Also as mentioned by this tweeter they use more than 2 servers anyway
https://twitter.com/shadowmanos/status/1434980544740306947
They could probably save on resources and maintenance effort if they switched to containers, assuming the server count is still the same or higher.
> This is #1 in a very long series of posts on Stack Overflow’s architecture. Welcome.
Here are the stats on Stack Overflow:
https://stackexchange.com/performance
1.3 billion page views per month, 9 web servers, 4 sql servers
Stack Overflow is notable because they went down the C#/MVC/SQL Server route from the start, which meant much better performance per server. That's why they make an interesting counterexample to the usual way...
You left off 10 servers: 2 Redis servers, 3 tag engine servers, 3 Elasticsearch servers, and 2 HAProxy servers. So, that's 23 servers in total, which is not a trivial amount, but also not a huge number, either.
What is really interesting to me is that they just have a very small number of each kind. It's not like they have 50 Redis servers, it's just 2. Two looks more like a smaller project at first, but it's great proof that a lot can be achieved with solid engineering.
They are probably able to do it with little because they understand what they are doing.
There are too many developers and teams that don't understand the underlying principles and limitations of the technology they use. In the good case those developers get schooled by experienced engineers, but frequently they end up in important positions at some big enterprise software company and make life miserable for many people.
I agree. This also ties back to the whole "a junior developer could clone Stack Overflow," or "I could do that in a weekend" mentality.
Sure, maybe a junior dev can make a simple forum, or you could make a web app that duplicates the functionality of SO at low loads in a weekend. But, when you're serving an average of 500 requests per second, you need to know what you're doing. And, IMO, this "knowing what you're doing" is the difference between knowing how to engineer a system and how to write an application.
> Stackoverflow is notable because they went down the C#/MVC/SQL Server route from the start, which meant much better performance per server
And also notable because everything is under an expensive license, so big performant servers is the cheaper option.
Edit: everything = Windows servers for their .NET app (apparently in the process of migrating to .NET Core) and SQL Server.
License costs are usually per-core, not per-server. It's the same whether you have a single large server or multiple small ones.
It used to be per-socket for Windows Server before 2016, and StackOverflow's infrastructure and architecture predate that.
what do you mean by saying "everything"? just SQL Server?
Technically Windows licences, too, but maybe they migrated to Linux with their switch to .NET (Core).
Indeed, AFAIK they started migrating a long time ago.
2018:
>This is the query pattern that caused StackOverflow.com to go offline yesterday:
https://github.com/dotnet/efcore/issues/13524
but on the other hand their tech stack site says: C# + ASP.NET MVC
instead of ASP.NET Core, but I don't think it is proof of anything.
They are definitely using ASP.NET Core, most likely version 3.1 LTS.
EF Core (like ASP.NET Core) ran on .NET Framework until version 2.1. Everything after that requires .NET (Core).
Updated my comment for clarity - yes, SQL Server and Windows for their .NET app. Maybe they've moved to Linux if they have finished their migration to .NET Core, but even if they did, it's a fairly recent thing.
> Maybe they've moved to Linux if they have finished their migration to .NET Core,
As far as I know they were already using .NET Core (or just EF Core? it would be weird to go EF Core + .NET Framework, I guess?) in prod in 2018:
>This is the query pattern that caused StackOverflow.com to go offline yesterday:
https://github.com/dotnet/efcore/issues/13524
So maybe they ran some things on Linux and some things on Windows - those that couldn't be ported to .NET Core back in the day.
They're using .NET Core, and I think even .NET 5 in the meantime, though I couldn't find a clear confirmation of that. But they're still on Windows, looking at Linux and containers as a possibility for the future.
License costs are peanuts in traditional enterprise budgets when mapped against the overall costs of developer salaries, contracting, design agencies, travel expenses, ...
Is StackOverflow a traditional enterprise for you? I don't think they have huge travel expenses...
Yes, it's a typical large software/SaaS company.
It's not about the travel specifically, it's that servers and licenses are the cheapest cost of any company when compared to salaries, especially for engineering and sales.
> No using any "ask {specific person}" either
It's worth mentioning that the diagram is explicitly incomplete. The yellow endpoints are fixes but except for "END" the other endpoints are all either "unknown state" (i.e. "I have no idea what's broken") or problems that aren't addressed in further detail like "The issue could be with Kube Proxy" or even "Consult StackOverflow".
I'm not sure what a complete diagram would even look like but I don't think there's any way to infer complexity by looking at them in comparison.
People do tend to cherry pick, don't they? Most of Stack Overflow's workload is returning a blob of html for a given url, to a not-logged-in user. Where that html doesn't even have to be the most recently saved copy.
It's taking one source - look at how these people solved it! - and trying to apply it to others.
SO is relatively simple; it's basically customized forum software which is a solved problem that has been around for decades. A junior dev can build an alternative, and it can be built using tried and true solutions like MySQL + PHP, which are horizontally scalable with database sharding, read replicas, and maybe stuff like memcached to accumulate votes before updating the database or a CDN for caching static files.
Google has different problems and different workloads, and they have hundreds of times more applications with thousands of times more load. Apples and oranges.
> it can be built using tried and true solutions like MySQL + PHP, which are horizontally scalable with database sharding, read replicas, and maybe stuff like memcached to accumulate votes before updating the database or a CDN for caching static files.
> Google has different problems and different workloads
Which of these do you think most organisations most closely resemble?
I don’t think anybody would disagree if you said that you should use Kubernetes for organisations that resemble Google. But most organisations don’t look anything like Google. They look a lot more like Stack Overflow. So the “You don’t need this…” statement holds true for almost everyone.
Stack Overflow is also a) an old system, built before a lot of current management tech was openly available, and b) very single-application centered.
EDIT: To expand on it - they had to manually build a lot of what could be handled much more easily today, and thus had no incentive to change later. Interestingly enough, SO has a lot of moving parts distributed over multiple servers, even if all of those servers used to fit into 1-3 racks per DC.
The single vs multiple application centric thing is a point that's not made enough. Google is basically now a company all about spinning up new products, rather than just periodically adding a new piece of tech to an existing product. As a layperson, Kubernetes _appears_ to be about making it easier to spin up (and run) new products, so it feels more appropriate for Google-like org types.
There are also many more applications one might need to run, especially when scaling beyond one or two people doing all development on their laptops/desktops without a company yet - and especially when one doesn't have valley-style funding.
Also these days you might not want to implement login functionality from scratch, or have better logging, monitoring, etc. and that might involve running more applications than just your LAMP stack.
>A junior dev can build an alternative
Of course a junior dev can do it, the same way a junior dev can make YouTube:
it'll work as long as there are fewer than 100 concurrent users on SO and fewer than 50 4K 20-minute videos on YouTube.
Sometimes I question whether the people posting these "a junior dev can build this product" comments have ever worked with systems at scale and all of the myriad issues that scale brings, or whether they're just being the usual hyper-optimistic dev who isn't considering much past the proof-of-concept point of a product.
Either way, it's quite baffling how common this kind of comment is; after almost a decade reading Hacker News, it still pops up constantly.
It also doesn't matter if a junior dev could do it if they just don't do it
If it's so trivial, go be rich!
I remember this post from gregdoesit [1] as a good example of how much complexity is hidden under seemingly trivial interfaces. It's about Uber's supposedly bloated app. I believe anyone who has worked at a scale where they had to support different national legal systems in their code knows that maintaining a large-scale product is nothing trivial.
> a solved problem
Except 99% of forums run terrible software that doesn't perform, is not easily usable and won't work right on phones. That tells me it's not a solved problem at all.
Recently I've been facing the dilemma of choosing between k8s and something more basic.
The features that seemed to argue for k8s were not server provisioning, but instead:
log management, easy setup of blue/green & canary deployments, not having to restart a VM upon every new code deployment, etc...
How would you do those things as easily with other techs ?
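For comparison, the zero-downtime-deploy part in k8s boils down to something like the sketch below - the app name, image tag and health endpoint are invented:

    # deployment.yaml - rolling update, so a new release never requires restarting a VM
    # (name, image tag and /healthz endpoint are hypothetical)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: myapp
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0   # keep full capacity during the rollout
          maxSurge: 1         # bring up one new pod at a time
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
            - name: myapp
              image: registry.example.com/myapp:1.4.3
              ports:
                - containerPort: 8080
              readinessProbe:        # traffic only goes to pods that answer
                httpGet:
                  path: /healthz
                  port: 8080

Pushing a new image tag and re-applying the manifest swaps pods one by one; blue/green and canary need a bit more on top (two Deployments plus traffic splitting at the ingress, or a tool like Argo Rollouts), and log aggregation is a separate concern whichever orchestrator you pick.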
This is the real reason why so many orgs are considering/using something as complex as Kubernetes. It is not as simple a comparison as 10 servers running K8s versus 6 servers running KVM; there is more to it. Once you add config management, safe deployment mechanisms, observability setup, network management, secrets, RBAC, identity management, etc., the "just a few servers running Linux" setup looks almost as complicated as Kubernetes, and you've created a bespoke setup that only you know how to operate. If you go the Kubernetes route, sure, there are bells and whistles that are not needed for your use case, but it standardises operations such that you can hire a new team member, hand them the K8s documentation, and expect them to get things done in your infra setup.
It is a choice. I have personally moved on from the "Kubernetes is never a good choice over running things yourselves" camp.
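To illustrate the standardisation point: secrets and RBAC end up as the same kind of declarative objects as everything else. A rough sketch - the names, connection string and the default namespace are all assumed:

    # sketch: a secret plus a role that only lets the app's service account read it
    # (names, namespace and connection string are hypothetical)
    apiVersion: v1
    kind: Secret
    metadata:
      name: myapp-db-credentials
    type: Opaque
    stringData:
      DATABASE_URL: postgres://app:secret@db.internal:5432/app
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: read-myapp-secret
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        resourceNames: ["myapp-db-credentials"]
        verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-myapp-secret
    subjects:
      - kind: ServiceAccount
        name: myapp
        namespace: default
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: read-myapp-secret

Whether that is worth the extra objects for a small shop is exactly the trade-off being debated in this thread.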
Nomad all the way. It's much easier and lighter than Kubernetes, and it does 3/4 of what Kubernetes does. The ecosystem is much smaller, but depending on your needs it could be entirely sufficient.
I've written about Nomad vs k8s on my blog if that might interest you:
https://atodorov.me/2021/02/27/why-you-should-take-a-look-at...
And I've also written about some common things, like Traefik for ingress, Loki for logs, etc. to supplement the pretty complete Hashicorp tutorials.
It's all trade-offs. k8s is feature-rich, flexible and configurable but that also means high complexity. Simpler tools might not tick all the feature boxes.
As long as your applications follow 12-factor principles it shouldn't be too hard to move between different orchestration tools and you can pick the one that best suits your needs.
Why don't you consider Nomad [0] in your evaluation? I think it should fit your requirements.
Can second this recommendation. We're running OpenShift and Nomad clusters - the former makes my eyes bleed, and the latter I can mostly get my head around. (Note that I'm not involved in operating either of those platforms.)
You'll still get layer-upon-layer of abstraction - for example Consul for key-value storage and service discovery, Traefik for load balancing, Terraform to build up the service discovery rules, etc. - but it feels somewhat more intentional, with less boilerplate.
AWS Elastic Beanstalk.
Everything a modern web app needs out of the box, with 1 day to learn instead of years... and the best part? You don't have to modify your code to work on it, i.e. environment and code stay separate.
And if they don't like your politics, they simply pull the plug.
You mean morally indefensible politics and misinformation propaganda? Oh no. Anyway.
I mean, these politics are pushed by rich Americans and foreign interests; surely they have the means to start their own hosting platform. It's the same crowd that advocates in favor of businesses rejecting customers in the name of the free market.
This doesn't seem to be as easily defined as you say. The new AWS group is more or less targeted at "avoid bad press" rather than "indefensible politics", whatever that is. Twitter mobs seem to be quite fickle to me and can just as easily eat their own as their usual fare. News outlets aren't a lot better, often following big enough gripe fests and piling on like the rest.
It's not just based on particular politics. It can be any behavioral screw-up done by you or any employee, at any point in your life, real or perceived, that crosses the current cultural expectations. The surface area of risk is ridiculously huge.
It gives activists another angle of attack on your business. If they can create enough of a scandal, Amazon might drop you to avoid bad press.
I don't use any cloud service that isn't available at other vendors. This isn't my primary reason, but it is on the radar.
I'm not saying Amazon shouldn't be free to do that. Just that you should think twice about trusting them with your business.
No further comment on the specific political content - those sensibilities can change. Can your business adapt quickly enough?
That’s not an architecture diagram though, so it doesn’t represent the complexity at all.
I'm sure a troubleshooting map for a bare Linux server wouldn't be any less complicated than that.
> I'm sure a troubleshooting map for a bare Linux server wouldn't be any less complicated than that.
Except your k8s runs on a Linux server, so this is just an addition. (Unless you're using a fully managed k8s cloud offering, but then you have an even bigger troubleshooting flowchart to navigate the provider's management interface: at least that's my experience with GKE; maybe Amazon and others are better.)
> Except your k8s runs on a Linux server,
Wouldn't it be more likely in this case that the server is built from configs? Ansible or whatever
The troubleshooting for the Linux server side is "spin up a new one and delete the old one"
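A rough sketch of the "built from configs" side, assuming Ansible - the host group, package and template path are invented:

    # site.yml - minimal Ansible sketch of a rebuildable web host
    # (the 'webservers' group, nginx setup and template are hypothetical)
    - hosts: webservers
      become: true
      tasks:
        - name: Install nginx
          ansible.builtin.apt:
            name: nginx
            state: present
            update_cache: true

        - name: Deploy site config
          ansible.builtin.template:
            src: templates/myapp.conf.j2
            dest: /etc/nginx/conf.d/myapp.conf
          notify: Reload nginx

      handlers:
        - name: Reload nginx
          ansible.builtin.service:
            name: nginx
            state: reloaded

When a box misbehaves you point the playbook at a fresh VM instead of debugging the old one, which is the "spin up a new one and delete the old one" approach in practice.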
100% and one of the great things about k8s is that this diagram applies to essentially any application. Standardisation is awesome.
Proper standardization is awesome. De facto, corp-owned standardization, not so much.
Unfortunately as a k8s user in the real world every container is slightly different and has numerous hacks in it to make it compatible with k8s in some way or another. So no.
This is an underappreciated point. I love k8s. I hate the shit people decide to do in k8s.
At a previous employer, we had a k8s cluster with a bunch of machines that were designed to a) load a filesystem kernel module inside the container (yes inside, not outside), b) mount /dev from the host in the container with Docker, and c) mount hard drives from the host /dev inside the container using the "mount" command.
In a twist that should surprise no one, those containers didn't work well. And they failed in crazy, confusing ways for which there is no documentation to troubleshoot, because who in their right mind would do something like that?
I've had better luck in places that have a Platform as a Service team that owns the k8s infra. They generally have a lot more pushback to say "no, you're not going to do that on our cluster" which helps to tamp down some of the crazier ideas.
To be fair, the steps seem to map pretty well to the number of kubernetes resources you would need to create to do basic things like add a persistent disk, or get traffic to your application.
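For reference, "add a persistent disk" and "get traffic to your application" each map to at least one extra object - a rough sketch, with names and sizes invented:

    # persistent storage for the app: one PersistentVolumeClaim per disk
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: myapp-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
    ---
    # traffic to the pods: a Service (an Ingress would be yet another object on top)
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector:
        app: myapp
      ports:
        - port: 80
          targetPort: 8080

The Deployment then has to mount the claim, so even a small app with a disk and a URL ends up as three or four YAML documents, which is roughly what the branching in the diagram reflects.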
When I first saw it, K8s reminded me a lot of systemd. I wouldn't be surprised if, over the next few years, each grows the features of the other.
That sounds logical as they kind of perform the same tasks with the difference being that systemd manages workloads on a single system and k8s manages workloads on a cluster.
Interestingly enough, SO is apparently going with k8s a lot...
https://stackoverflow.blog/2021/07/21/why-you-should-build-o...
> StackOverflow runs on a couple of servers.
Does it?
Yes it does: https://stackexchange.com/performance
Pretty impressive I think.
No it doesn’t. From your link:
• 9 web servers
• 4 SQL servers
• 2 Redis servers
• 3 tag engine servers
• 3 Elasticsearch servers
• 2 HAProxy servers
That comes to 23. I know “a couple” is sometimes used to mean more than two, but… not that much more than two.
“A couple” is just flat-out wrong; I'd guess that he's misinterpreting ancient figures, taking the figures from no later than about 2013 on how many web servers (ignoring other types, which are presently more than half) they needed to cope with the load, and ignoring the many more servers that they have for headroom, redundancy and future-readiness.
One interesting aspect is that the number of servers is much higher than what would actually be needed to run the site, most servers run at something like 10% CPU or lower. Most of the duplication is for redundancy. As far as I remember they could run SO and the entire network on two web servers and one DB server (and I assume 1 each of the other ones as well).
If someone says SO runs on a couple servers this might be about the number actually necessary to run it with full traffic, not the number of servers they use in production. This is a more useful comparison if the question is only about performance, but not that useful if you're comparing operating the entire thing.
IIRC, without emergency redeploying, they might have issues running on fewer than 4 - not sure whether the tag server can coexist with a web server anymore, for example; Redis is still a dependency, so is HAProxy, plus separate SQL and IIS, etc.
Then there are the support services (IIRC, all of Elasticsearch was non-functional-requirements stuff and the site could technically run without it?) and HA.
23 is not a lot of servers.
That is still doable with mid-90s era hand management of servers (all named after characters in lord of the rings).
Not that you should, but you could.
And the growth rate must be very low, making it pretty easy to plan out your OS and hardware upgrade tempo.
And it was actually possible to manage tens of thousands of servers before containers. The only thing you really need is what they now call a "cattle not pets" mentality.
What you lose is the flexibility of shoving around software programmatically to other bits of hardware to scale/failover and you'll need to overprovision some, but even if half of SOs infrastructure is "wasted" that isn't a lot of money.
And if they're running that hardware lean in racks in a datacenter that they lease and they're not writing large checks to VMware/EMC/NetApp for anything, then they'd probably spend 10x the money microservicing everything and shoving it all into someone's kubernetes cloud.
In most places though this will fail due to resume-driven design and you'll wind up with a lot of sprawl because managers don't say no to overengineering. So at SO there must be at least one person in management with a cheap vision of how to engineer software and hardware. Once they leave or that culture changes the footprint will eventually start to explode.
Most of that is extra unused capacity. They've shared their load graphs and past anecdotes where it's clear the entire site runs very lean.
Also 23 is very much a couple for a company and application of that size. It's not uncommon to see several hundred or thousands of nodes deployed by similar sites.
Two of their servers have 1.5 TB of RAM each. Just one of those nodes is probably as powerful and expensive as 100 nodes in a thousand node setup.
They aren't magically more efficient than other sites. They just chose to scale vertically instead of horizontally.
> "They aren't magically more efficient than other sites"
It's certainly not magic but good architecture decisions and solid engineering. This includes choosing SQL Server over other databases (especially when they started), using ASP.NET server-side as a monolithic app with a focus on fast rendering, and yes, scaling vertically on their own colo hardware. The overall footprint for the scale they serve is very small.
It's the sum of all these factors together, and it absolutely makes them more efficient than many other sites.
Exactly. That twitter thread is just pure rage based on no data. Sum up resources from that page - we are talking around 6500GB* of RAM worth of servers. That is no homelab.
* Maybe a bit more/less, because it's not clear to me whether the DB RAM figure is per server or per cluster. Likely per server, as on the other servers. There's also no data on how big their HAProxy boxes are.
And yet the main point stands: they don't need K8s to manage this application running on 23 servers.
No one needs k8s. Bringing up their infrastructure in a k8s troubleshooting how-to was a weird thing to do in the first place. It's comparing apples and chandelier - makes no sense.
They have a typical vertically scaled infrastructure, most services have just two nodes, one active. The biggest ones are databases which in many companies are handled in "the classic way" anyway. Clearly it's not designed as microservices and doesn't need dynamic automation at all. Why on earth would they even bring k8s up in their plans?
And yet it wouldn't be out of place either.
Nevertheless, it is true that Stack Overflow has focused on backend performance and scaled vertically a long way, further than is fashionable. Just not so far as only using two servers for everything.
Because they're constrained by Windows and Microsoft licensing, scaling out was never an easy option for them.
I'm curious. I saw a similar comment earlier; surely the Windows licensing is just a drop in the bucket compared to the rest of the infrastructure costs?
I've not really looked at hosting anything on Windows before; do they have unusual licensing terms such that it would be a significant cost?
What constraints? Windows Server licenses are bought per-core and the company can easily afford plenty more. This is a non-issue.
But adding a new server means buying new licenses, which is a consideration you don't have with OSes that aren't under commercial licensing. It costs extra money, and it used to be per socket when their infrastructure was conceived.
So what? Licenses are not expensive, especially compared to all of their other costs like the dozens of staff, and paying an invoice isn't complicated. They maintain their own hardware in colocation facilities so they'll get a new license way before they even get the hardware shipped out.
Why does this make scaling out "never an easy option for them"?
>Because they're constrained by Windows
How? didn't they migrate to .NET Core?
Did they ? I must have missed it, but seems so :
https://www.infoq.com/news/2020/04/Stack-Overflow-New-Archit...
That doesn't mean they've moved away from Windows servers hosting it though.
It is impressive, but it's not a Raspberry Pi kind of setup. Just two of those "couple" are the hot and standby DB servers with 1.5 TB of RAM each. That infrastructure is scaled A LOT vertically.
This seems like something you could implement in a workflow tool like Netflix Conductor to automate the debugging process with visuals.
Just the pic, archived: https://web.archive.org/web/20210907091921/https://pbs.twimg...
Nomad ftw
This from a guy who sells overengineered ORM goo (LLBLGen).
StackOverflow didn't use that either and instead chose to invent their own query builder/mapper known as Dapper.
> StackOverflow runs on a couple of servers.
K8s can as well.
The difference is a bunch of servers running k8s or a bunch of servers running custom code to duplicate parts of k8s.
Or a couple of servers running IIS with a handful of web apps, maybe a reverse proxy.
And FTP for putting your PHP scripts into production.
I hate it when Stack Overflow is held up as an example of how we can run any system on "a few servers" - Stack Overflow has like 3 features and an engineering focus on the single goal of performance and keeping things running on that small set of servers.
Every other project has different constraints.
I think the number of features has little to do with the situation.
Nowadays, everybody insists on putting stuff on K8s regardless of how large or small it is.
An application is an application for the purpose of running it on a server. It doesn't really matter how much functionality it has.
It is the microservices "revolution" (quotes intentional) that caused larger applications to be split into a lot of small ones and complicated the execution environment to the point that a lot of people spend a lot of time just trying to figure out how to run their applications reliably.
That is not necessary.
If you can have multiple microservices, more likely than not you can have them as separate libraries used by a single application, or as separate modules of a single application. Just make sure to put the same thought into modularizing it as you would into designing microservice APIs, and you can have the same thing, only much easier and with much better performance (no serialization/deserialization, no network hops, no HTTP stack, no auth, etc.).
"Nowadays, everybody insists on putting stuff on K8s regardless of how large or small it is."
But if you already have the tooling, experience and support for k8s, why wouldn't you use it?
I can fire up a k8s cluster on any major cloud provider in minutes (or bare metal in slightly longer), and deploy almost any web app using a standard interface that has extremely wide adoption.
K8s supports a ton of things, but you don't have to use them. It can be complicated, but usually it's not.
It feels a bit like saying why use Linux for your simple app when it could run just fine on DOS or QNX. How many years of my life have I wasted debugging networking issues on Linux or Windows that turned out to be caused by a feature I wasn't even (intentionally) using...
> It can be complicated, but usually it's not.
The usual story. Everything works, until it doesn't.
If you are a huge corp with good engineering you can have people dedicated to understanding k8s and then it kinda makes sense. They can spend time to learn it really well so that they have necessary chops to deal with problems when they happen.
On the other hand, if you are a smaller company, you are more likely to embrace this new idea of developers running everything, including k8s - and then you are in for trouble.
They will know how to make it work but that's about it.
Because if you need to learn everything you actually learn nothing very well. And there certainly isn't enough time in the world to learn everything in development.
My philosophy is applications must be built for when it breaks and it is unacceptable to run an application with a team that will not be able to fix it if it breaks.
**
A couple of years ago I joined a small group of developer teams (about 40 devs in total) who together maintained a collection of 140 services, all with the same or a very similar stack (Java, Spring, REST, RabbitMQ).
They had trouble delivering anything to prod because of complex dependencies and requirements, complex networking, and a complex process to find out where stuff broke across the 7 layers of services between the original user call and the actual execution of their action.
I rolled up my sleeves and put everything in a single repo: single build, single set of dependencies, single version, single deployment process, single layer of static app servers.
I left after the team was reduced from 40 to 5. There was no problem delivering anything (we had 60 successful deployments in a row when I left), and the guys who stayed admitted they were bored and underutilized because of how easy it had become to make changes to the system.
These were still the same guys that were advocating for microservices. From what I heard they are not advocating for microservices anymore.
Can microservices be done well? Sure they can. But it takes additional effort and experience to do it well. Why make your life difficult when it is not needed?
"The usual story. Everything works, until it doesn't."
But the same is true of all the other orchestration tools isn't it?
I've had similarly complicated problems with Terraform, Ansible, Chef, Puppet, and just plain Linux as I have had with Kubernetes. Meanwhile K8S saves a lot of time when things do work properly - which is nearly always.
A while ago, we had an issue with dotnet where the JIT was sometimes emitting bad instructions and crashing the process. That was an absolute bloody nightmare to debug and reproduce, it took weeks - but nobody would say running a high level language is bad because the compiler might have a bug, right?
We are a small company (under 20 developers), we have one dedicated ops person and one devops, and have never had any issues with k8s that couldn't be resolved by one of them within a few hours. We run a monorepo with 6 app services as part of our core product, 10 beta/research services, then a handful more infrastructure services (redis etc.), and honestly it's been the smoothest environment I've ever worked with. All the infrastructure is defined in code, can be released onto a blank AWS account (or k3s instance) in minutes, all scales up and down dynamically with load, and most of the time something goes wrong it's a bug in our code.
Maybe the problem with your system was more about the excessive use of microservices and general system architecture rather than Kubernetes itself?
> But the same is true of all the other orchestration tools isn't it?
Of course. The difference being how complicated it is to deal with problems.
For example I find it is way easier to deal with problems with Ansible compared to Chef.
So, assuming that both get me what I need, I prefer Ansible because it is less drag when I have the least time available to babysit it (which usually happens at the least opportune moment).
What I am trying to say is that just because it works for you now doesn't mean it will not end in a disaster at some point in the future. It is not my position to tell you whether the risk is acceptable for you. But I personally try to avoid situations I cannot back out of easily.
If I have a script that starts an application on a regular VM I KNOW I can fix it whatever may happen to it. Not that I advocate running your services with a script, I just mean there is a spectrum of possible solutions with tradeoffs and it is good to understand those tradeoffs.
Some of those tradeoffs are not easily visible because they may only show themselves in special situations or, conversely, be so spread over time and over your domain that you just don't perceive the little drag you get on everything you do.
I find that if there is any overarching principle to build better solutions it is simplicity.
Presented with two solutions to the problem, the simpler solution is almost always better (the issue being the exact definition what it means to be simpler).
For example, I have joined many teams in the past that had huge problems with their applications. I met teams that were very advanced (they liked overcomplicating their code) and I met teams that could barely develop (they had trouble even writing code, did not know or use more advanced patterns).
I found that it is easier to help teams that had huge problems but wrote stupid code because it is easier to refactor stupid simple code that beginner developers write than it is to try to unpack extremely convoluted structures that "advanced" developers make.
I think similar applies to infrastructure.
For example, when faced with an outage I would usually prefer simpler infrastructure that I know I understand every element of.
I don't think this really does justice to what Stack Overflow does. They're probably the most visited engineering-related site on any given day. They have their widely known Q&A, live updates and notifications, chat rooms, a blog, job listings with email updates, review queues, moderator tools, and so forth. Perhaps you only use three features, but the site does a tremendous amount, and that's not even thinking about all the cross Stack Exchange network functionalities.
Perhaps I was being a little dismissive, but it doesn't have the same features as something like Facebook or LinkedIn or Google.
Mind you, those companies probably shouldn't have as many features as they have.
> but it doesn't have the same features as something like Facebook or LinkedIn or Google
But it does seem to have most of the features of something like Reddit.
I hate it when people act like every other project has different constraints. For most web projects that is simply not the case. Maybe in some cases there are small differences, which won't be noticed by any run-of-the-mill setup. And if your constraints are different but you have 100 members and never more (like most startups), your constraints generally don't matter even if they are radically different. Most people here are overarchitecting for the "chance of becoming Facebook"; personally I'd rather make a lot more profit (from day 1) by using cheap stuff and simple setups, and if we accidentally make it to Facebook size, we will revisit. We can move it to Amazon to keep it running and rearchitect in the meanwhile. But chances are against us, and all of you here, ever reaching anything close to that. Even with investor money (we just did 2 raises) we have better stuff to do.
Or maybe we're running more complex infra not because of scaling, but because a declarative setup, free from the expense of cloud services (esp. important outside of the SV/rUSA bubble), can mean that hosting the services needed to run the project is cheaper, both in time and materiel.
Just like there are people that love to set up a Hadoop cluster for data that fits on a USB stick, there are others that love to ramp up a Kubernetes cluster for what is a typical three-tiered application.
This argument is nonsense
Attention! Someone who knows nothing about your project is offering free advice!!