Software Infrastructure 2.0: A Wishlist (2021)

erikbern.com

102 points by whoiskatrin 2 years ago · 84 comments

015a 2 years ago

Here's something very specific I've been thinking about recently.

I think Google Cloud Run is obscenely ahead of its time. It's a product that's adjacent to so many competitors, yet has no direct competitor, and it has staked out that niche in a way that makes it such a valuable product.

It's serverless, but not "Lambda serverless" or "Vercel serverless", which force you to adopt an entirely different programming model. It's just Docker containers. But it's also not serverless in the way Fargate or ACS is "serverless"; it still scales to zero.

There's a lot of competition in the managed infrastructure space right now (Railway, Render, Fly, Vercel, etc). But I haven't seen anyone trying to do what Cloud Run does. Cloud Run has its disadvantages (cold starts are bad; it also could be a great fit for background workers/queue consumers/etc, but Google hasn't added any way to scale replicas beyond incoming HTTP requests yet).

But the model is so perfect that I wish more companies would explore that space, rather than retreating to "how things have always been done" ("pay us $X/mo to run a process") or to the much more boring "custom serverless runtime" ("your app is now an 'AWS Lambda app' and can't run anywhere else, congrats").

  • mrkurt 2 years ago

    (I am biased because I work on Fly.io)

    Fly Machines are more powerful than Google Cloud Run IMO. You can treat them like Cloud Run, or manage them directly and implement your own serverless model.

    Our PaaS orchestration is implemented entirely in the client CLI, and it manages Fly Machines directly: https://fly.io/docs/machines/

  • ajcp 2 years ago

    At the risk of being on the outside here, I'd have to agree.

    To go a bit further, I'm honestly quite interested in how GCP has sought to differentiate itself from the other two providers by offering this kind of "plug and play" feel to the cloud. Certainly there is value to be gained from the absolutely granular service offerings of AWS/Azure, but there's a point when it starts to feel like all I'm doing is building control towers for island landing strips.

    I just want my cloud provider's ML service to talk to the data lake on the same cloud tenant without having to architect my way through 15 network NICs, 30 service accounts, and 4 VDIs...

  • latchkey 2 years ago

    Cloud Run is great, but IMHO Cloud Functions are even better. It is just a simple HTTP handler.

    The entire deployment can be defined in GitHub Actions. Combine that with Cloud Tasks and a Cloud SQL Postgres instance and you have a near-infinitely scalable solution.
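
    A rough sketch of what such a workflow step can boil down to once auth is set up (the function name, entry point, and region below are placeholders, not from the original comment):

        # deploy an HTTP-triggered 2nd-gen Cloud Function; Cloud Tasks can then
        # target the resulting HTTPS URL
        gcloud functions deploy my-handler \
          --gen2 \
          --runtime=python311 \
          --region=us-central1 \
          --entry-point=handle_request \
          --trigger-http \
          --allow-unauthenticated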

    I ran a system like this where over 30k servers across 7 different data centers all over the US were hitting Cloud Function endpoints 24/7 at 30-50+ RPS, and I never had a single failure or outage over multiple years. Even better, the whole thing never cost more than about $100/month.

  • maccard 2 years ago

    Azure has container _instances_ - https://azure.microsoft.com/en-gb/products/container-instanc...

    DigitalOcean is not wildly far off it either.

    ECS + Fargate is the closest AWS has to it, but you need to set up IAM and networking to use it. If you're in AWS already, it's pretty good, albeit with some frustrating limits.

    • 015a 2 years ago

      Yup my bad, I meant ACI, not ACS.

      Correct me if I'm wrong, but these are actually not close to Cloud Run. Cloud Run's differentiator is its scaling metric; it scales with incoming requests, and has strict configuration to assert that each replica only handles N concurrent requests. You could maybe get something like this set up on ACI or Fargate, but it'd require stringing together five or six different products. You can also definitely wire those up to autoscale on CPU%, but (1) that is not scale-to-zero, and (2) CPU% kinda sucks as a scaling metric, right? Idk, I've never been happy with systems that autoscale on CPU%.
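
      A minimal sketch of that configuration, for reference (the service name, image, and numbers are placeholders):

          # request-based autoscaling: at most 80 in-flight requests per replica,
          # scale to zero when idle
          gcloud run deploy my-service \
            --image=gcr.io/my-project/my-image \
            --concurrency=80 \
            --min-instances=0 \
            --max-instances=50 \
            --region=us-central1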

      • jiggawatts 2 years ago

        Azure has mostly implemented this now. ACI was a single instance, but they have scalable Container Apps now. These are just a dumbed-down abstraction over a managed Kubernetes cluster that you never interact with directly.

    • kastden 2 years ago

      Container Instances is bad though, and you'll regret using it. There is Azure Container Apps but it requires some more setup in advance.

  • lijok 2 years ago

    Maybe I’m misunderstanding something but what you’re describing is what AWS Lambda has been able to do for a long time now. You can run an api in a docker container with no Lambda-specific code.

    • dmattia 2 years ago

      My understanding is that your Docker image must have the Lambda Runtime Interface Client installed in order to work.

      It's usually not a huge step to add the RIC, but it's a bit more tied to AWS than Cloud Run, which can run arbitrary Docker images, if I understand correctly.

      • lijok 2 years ago

        That's right - you have to package awslabs/aws-lambda-web-adapter into your Docker image, which proxies the API Gateway/ALB requests through.
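
        For reference, the documented pattern is a single extra Dockerfile line (the image tag here is illustrative; check the project for current releases):

            # add the adapter as a Lambda extension; by default it forwards
            # invocations to the web server listening on port 8080 in the image
            COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.4 /lambda-adapter /opt/extensions/lambda-adapter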

  • meowtastic 2 years ago

    Isn't AWS App Runner similar?

throwawaaarrgh 2 years ago

The one thing I want, that doesn't exist, and won't for at least 10 years: immutable infrastructure.

Oh, the concept exists. I can make some infrastructure mostly-immutable, myself. But the cloud doesn't give me it out of the box. What the cloud gives me are APIs. If I write software to call those APIs, predict what the allowed values are, predict the failures I might see, write about 5,000 lines of code to handle the failures, attempt to reconcile differences, retry, store my artifacts, reference them, after implementing a build system, etc, I can get one or two things to be immutable. But for the vast majority of services it's actually impossible.

Take an S3 bucket. Can you make an S3 bucket immutable? The objects inside it might be versioned, sure. Can you roll back all the objects in the bucket to Version 123? Can you roll the bucket policy back to revision 22? Can you make it also roll back the CORS rules? Can you diff all these changes and see a log of them? Can you tell the bucket to fix itself back to the correct expected version of itself? Can you tell it to instead adopt 3 new changes, as part of a version of the S3 bucket you tested somewhere else? The answer is "no".
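
To make that concrete: even the versioned-objects part only gives you a per-object, do-it-yourself rollback. Something like the following (bucket, key, and version ID are placeholders) has to be repeated for every key, and it does nothing for the bucket policy, CORS rules, lifecycle rules, and so on:

    # "roll back" a single object by copying an old version over the current one
    aws s3api copy-object \
      --bucket my-bucket \
      --key config/settings.json \
      --copy-source "my-bucket/config/settings.json?versionId=EXAMPLE_VERSION_ID"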

You can fake it, with a configuration management tool like Terraform. But that's as immutable as a file on your filesystem. Any program can overwrite your files at any time; you have to have Puppet configured to monitor your files, and constantly fix the files when they get changed, track the Puppet code in Git, keep your own log of changes, etc. That filesystem isn't immutable, it's mutable! If it was immutable you wouldn't have to use Puppet (or Terraform). And the sad thing is we're all stuck on Terraform, which is actually terrible for a configuration management tool, because it mostly refuses to reconcile inconsistencies (the way every other configuration management tool in history has). It just bombs out and says "Oh shit, that wasn't a change I planned, and you didn't write this HCL code to handle this weird condition, so I'm just gonna bail and not fix this. Good luck getting production working again." Puppet wouldn't stop working if something other than Puppet updated a file. But nobody seems to mind that we literally regressed in functionality, because a company made up new marketing terms for their tools.

Sadly this desired built-in immutability, and the declarative nature of it, won't be built into S3 or other tools for at least a decade or two. They would need to effectively build something akin to K8s just to manage their own components immutably and expose an entirely new API. So we are doomed to do Configuration Management in the cloud, until the cloud starts implementing immutability out of the box.

  • pdimitar 2 years ago

    Yeah, sadly true. While I am not a platform engineer I've witnessed their plight many times and I truly sympathize.

    Now more than ever, because I've started making an effort to self-host much more than before... the number of scripts I have to write just to achieve idempotency, never mind immutability, is staggering, and I am already questioning my approach. I will likely start making use of ZFS or BTRFS snapshots, or, I don't know, I'll just start snapshotting the entire filesystem on my Linux machines manually (storing all dir/file paths with their sizes and modification dates; it's a start, and you can diff against such "snapshots").
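
    A minimal sketch of that manual approach (assuming GNU find; the paths and filenames are arbitrary):

        # record path, size and mtime for every file, then diff two snapshots
        find /etc /srv -xdev -printf '%p\t%s\t%T@\n' | sort > /root/snap-$(date +%F).tsv
        diff /root/snap-2024-01-01.tsv /root/snap-2024-01-02.tsv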

    I am just not comfortable running commands and not having any idea what changed and where. It's insane that everyone just accepts this! I am not okay with it; I want to see an exact breakdown of what changed, where, and how.

    IMO working on this and bringing it to the mainstream is loooong overdue.

    • throwawaaarrgh 2 years ago

      I think it's that few people can see its potential. When I first started using immutable infra like 10 years ago, and saw how many problems it solved, my mind was blown. Until I saw the difference myself, it just looked like some trivial CS concept.

      It's not apparent that problems X, Y and Z will be solved by immutability. Once it's applied everywhere, whole classes of problems just disappear. But until people see the problems disappear, they won't implement it. Catch-22.

      • pdimitar 2 years ago

        True, plus not many devs are directly exposed to the problems and thus the will to fix the problem never has a chance to materialize.

        One of the best-oiled teams I was on had devs and sysadmins working together closely. If Jim made a huge Python mess out of his small throwaway project (which the CEO needed because he wanted a nice chart for an investor meeting) that required several virtual environments and a particular (older) version of something, then the sysadmin had the power to call him out and question his methods. While not many programmers appreciate that, those who do make for a more positive workplace IMO.

        RE: idempotency / immutability in general, I've heard about Nix many times but I have been put off every time I tried it: cutesy (and rather dumb) terminology like pills and flakes and such, a Haskell-like dialect the world really did not need, tight binding between things (forgot which at this point, sorry), and the list kept growing until I just gave up. With all their quirkiness and edge cases, my scripts still beat the pants off of Nix for my own goals. I mean, pacman/yay have a flag that says "only install this package if not already installed", so... ¯\_(ツ)_/¯
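
        The flag in question is presumably pacman's --needed, which skips targets that are already up to date (the package name here is just an example):

            # install only if missing or out of date
            pacman -S --needed postgresql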

        But I really do want something like Nix (and no, not Guix either). Not only for packages -- for the entire system. I want to be able to plug in a USB drive and issue a command that says "show me new devices plugged in during the last 5 minutes, or since I last checked".

        We don't have stuff like that. Or if we do, I am blissfully unaware of it. Can't we just start writing them and push their adoption? Every sysadmin team invents magic from scratch. Surely we can and should collectively do better...

socketcluster 2 years ago

I built a serverless SaaS no-code/low-code platform which could be of interest: https://saasufy.com/

You can build your entire app inside a plain HTML file which can be deployed online with something like GitHub pages.

I've built a few apps with it including a real-time chat app which supports both group chat, private 1-on-1 chat with an account system (with access control), OAuth via GitHub... The entire app is only 260 lines of HTML markup and fully serverless (no custom back end code). Access controls are defined via the control panel. All the app's code is in this file: https://github.com/Saasufy/chat-app/blob/main/index.html

You can try the app here (use the 'Log in with GitHub' link): https://saasufy.github.io/chat-app/index.html

Saasufy comes with around 20 generic declarative HTML components which can be assembled in complex ways: https://github.com/Saasufy/saasufy-components?tab=readme-ov-...

There is a bit of a learning curve to figure out how the components work but once you understand it, you can build apps very quickly. The chat app only took me a few hours to build.

I've also been helping a friend to build an application related to HR with Saasufy and I managed to get the basic search functionality working with only 160 lines of HTML markup.

fhuici 2 years ago

> The speed that's not there is setting up infrastructure. If I make a change in the AWS console, or if I add a new pod to Kubernetes, or whatever, I want that to happen in seconds. I'm not asking for milliseconds!

Milliseconds are now possible: https://kraft.cloud/ (e.g., an NGINX web server starts in under 20 milliseconds).

  • thundergolfer 2 years ago

    Cool looking website :) Small nit feedback, you say "less servers to operate" when it should be "fewer servers to operate" because servers are countable.

  • fbergen 2 years ago

    But you still have clusters, why not everywhere… ?

mike_hearn 2 years ago

It's a sub-component but Oracle Labs has a project to develop something like the FaaS platform he's asking for, called GraalOS.

The basic idea is that FaaS is a leaky abstraction because (a) lots of runtimes are slow to start up and (b) isolation tech isn't good enough. So FaaS services start up VMs and containers and then the user's function, which might have to do a lot of init work (like loading reference data), and because that takes too long you have to keep idle capacity around. At that point the abstraction is broken.

So there's a two-part fix:

1. For Java users, the GraalVM native-image tool can pre-initialize and pre-compile a JVM app so that it starts up instantly, including with pre-loaded reference data (see the sketch after this list).

2. Change the isolation model so VMs and containers don't need to be started up anymore. Containers alone can take hundreds of milliseconds to start.
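
For point 1, a rough sketch of what that looks like (the class and jar names are made up; --initialize-at-build-time is the relevant flag):

    # compile ahead of time; the listed class runs its static initializer at
    # image build time, so its reference data is baked into the image heap
    native-image --initialize-at-build-time=com.example.ReferenceData \
        -jar app.jar app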

There's also some interesting stuff there that takes advantage of Oracle Cloud's more "edgy" nature compared to other clouds: it has more datacenters than the others, but smaller ones.

The new isolation model works by exploiting new hardware features in CPUs that allow for intra-process memory isolation (Intel MPK), combined with hardware-enforced control flow integrity. This requires compiler support, but GraalVM knows about these features, and so the cloud can just compile JVM apps to native for you. And what about other apps? Well, many languages run on GraalVM via Truffle, so those are covered (e.g. JavaScript), and for native code you can use a modified LLVM to compile and then do a static verification of any user-supplied binaries, like NaCl used to do.

If you put those things together then starting user code that's already available locally becomes just mmapping a shared library into a process, which is extremely fast. It can only exit the hardware/software enforced isolate by going via a trampoline that's equivalent to a syscall, but without needing an actual syscall. The Linux kernel isn't reachable at all.

With that you can have functions that start and stop in milliseconds.

sethkim 2 years ago

What's cool is that Erik actually acted on these complaints. Modal is, by far, my favorite developer tool ever and makes me hopeful not just for the future of software engineering but the entire tech industry.

If you're a naysayer in the comments, I would encourage you to go give it an honest try, and consider again why you think infra has to be done in harder ways.

friedrich_zip 2 years ago

I get where he is going with this... but idk. Feels like a somewhat mid take. Strong abstractions always mean strong vendor lock-in and more power to infrastructure providers. But AWS, Netlify and whoever runs your apps are not your friends. Vertically integrating your infrastructure can be a pretty good source of cost reduction and can create interesting assets if you have good talent in-house. So idk... sometimes the fact that building something takes time and you have to think about how you are going to set it up actually is a good thing, because you take the time to build it right and you end up understanding how everything works together.

cheptsov 2 years ago

I have a lot of respect for Erik and his work on Modal, which I've heard a lot of good feedback about. What Erik says about serverless and code over configuration can benefit many users and companies. However, I strongly disagree on the main points and certainly have a different wishlist for infrastructure. The top item on my list would be open source and vendor agnosticism.

Finally, I believe simple configuration can coexist with code.

P.S.: At dstack, we are building an open-source platform to manage AI infra – a more lightweight and AI-friendly alternative to Kubernetes.

mdaniel 2 years ago

(2021) and at the time: https://news.ycombinator.com/item?id=26869050

pnathan 2 years ago

One of my basic design philosophies is to learn key things deeply and fit them together, without layers of "make it easy" tools that introduce incessant XY problems and integration issues.

If something is "magically" easy, it either is a meaningful design/algo revolution or it overpromises the production case while showing off the trivial. Most of the time it's #2. Docker was #1.

samsquire 2 years ago

I enjoyed this post, thank you.

I'm encouraged by the same ideas.

Sometimes you just want something that stays running and doesn't go down and can scale to zero and scale upwards, ideally with revenue.

I kind of want a special mega HTTP form endpoint from which I can define a pipeline: one that can write to a database, trigger background jobs, and feed into a mega API automatically.

fbergen 2 years ago

I would love to have what we were sold: "truly" serverless (even though the name doesn't mean there's no server).

- Cloud Run did a good job, but the autoscaling is too slow to avoid paying for idle

- Lambda is great, but I want to run way more complex workloads than simple functions

  • pdimitar 2 years ago

    Somebody mentioned Google Cloud Functions and I instantly bookmarked the service to check it later. Looks to be a pretty solid deal.

  • fbergen 2 years ago

    Am I asking too much? =P

shayarma 2 years ago

So true. Why are we still paying for idle resources in 2024?

swyx 2 years ago

we recently interviewed Erik and touched on this list: https://www.latent.space/p/modal

and how Modal exemplifies a lot of the ideas he's been looking for. check it out incl our show notes!

thecleaner 2 years ago

These are bad ideas. They are software wishes that no enterprise will pay for. Infra is set up once, so optimising for setup time doesn't do the trick. Rollouts should take time deliberately so that faulty software doesn't lead to an outage in seconds. No infra provider will bother turning off infra, as again it can have an impact on availability. AWS is optimising resource usage anyway, barring a few services like CloudWatch.

PaulDavisThe1st 2 years ago

Within a few lines of each other in TFA:

> We are, like what, 10 years into the cloud adoption? Most companies (at least the ones I talk to) run their stuff in the cloud. So why is software still acting as if the cloud doesn't exist?

> As in, I don't want to think about future resource needs, I just want things to magically handle it.

'nuff said.

  • ljm 2 years ago

    The cloud is so expensive for most companies that I think that a solution architect's insistence on setting up in the cloud by default is actually a corporate welfare program where VC funds are redirected to Amazon and Google.

    That said, while it's not as trivial as using managed SaaS, it's easier than ever to basically spin up your own cloud of sorts, using the wealth of open-source tech out there. K3s on Hetzner can do a pretty solid job for cheap. In that sense, the ecosystem around running your own cloud is only improving.

crabbone 2 years ago

If I didn't know better, I'd think I'm reading one of those cheesy LinkedIn advertorials... To someone who has dedicated their professional life to infrastructure, all of these wishes read as mostly irrelevant, with a strong proprietary-advertising flavor. At every turn of a sentence I expected to find a mention of some commercial product this article was going to promote. Well, at least it doesn't seem to do that, not openly anyway.

So, here are some thoughts on what seems to be the key points of the article:

* I want to go fast.

Well... yeah, sure, why not... but it's not very important. Lots of other goals will overshadow this one. Also, if we are talking in the context of whatever-as-a-service, there's very little incentive to work on the speed aspect as long as it's not taking ages.

Also, reducing infrastructure to whatever-as-a-service seriously hollows out the definition. I've been in ops / infra for over a decade, and I've barely even touched the as-a-service aspect. And whenever I do come in contact with it, it's always awful, and I want to get away from it as fast as possible. Making it go faster won't help that, though. The disappointing parts are poor documentation, poor support, proprietary tech, overly narrow scope, etc.

* Testing in production

Why is this even a relevant issue?.. Anyways. OP needs to take a trip to the QA department. They obviously don't know why they have one. But it's also possible their QA department is worthless (ours is...) But having a worthless QA department isn't really something to wish for in Infrastructure 2.0. I don't see how this is a good goal.

So, the reason a QA department is necessary, and why CI can cover only a fraction of what can and should be done with testing, is that QA, besides other things, needs to simulate plenty of different possible conditions in a controlled environment to be able to investigate and diagnose problems. Most of QA's work is spent on RCA, and then on figuring out how to present the problem, stripped of all unnecessary components, to the development team so it can be fixed. It's not possible to do good QA without the ability to isolate components, which calls for creating fake / artificial environments that are not like production.

* Calls to unleash the next order of developer productivity

This is such an MBA b/s... Just give it a break.

  • pdimitar 2 years ago

    > Well... yeah, sure, why not... but it's not very important. Lots of other goals will overshadow this one.

    For you. For me, having to tinker with a repo full of YAML files just to have a Kafka topic provisioned (as just happened to me this week) can and has killed my motivation, to the point of not working at all afterwards for a day or two.

    This stuff should be blindingly obvious, to the point a trained monkey should be able to do it.

    I have the feeling that many agents are working against such a goal though. Vested interests and all.

    You even kinda sorta agree with me by qualifying your statement with this, right after the previous quote:

    > Also, if we are talking in the context of whatever-as-a-service, there's very little incentive to work on the speed aspect as long as it not taking ages.

    Maybe to me time_it_should_take == X and to you X times 3 is fine, but in the end the brain schemata is the same: have it take LongEnough™ (subjective value) and the person responsible simply checks out mentally.

    If I were a CTO or an IT manager I'd be very worried about stuff like this.

    > But having a worthless QA department isn't really something to wish for in Infrastructure 2.0. I don't see how this is a good goal.

    This is IMO not at all related to the article; nowadays QA departments are removed either because leadership wants to save money or because iteration would grind to a crawl, and many businesses need the next feature by next Wednesday. Nothing to do with infra management, I'd think.

    Though don't get me wrong, QA is hugely important per se. But I wonder if proper end-to-end automated frontend testing (e.g. with Playwright) won't eventually make them truly extinct. Who knows. I don't.

    > This is such an MBA b/s... Just give it a break.

    I'll always despise MBA speak but the point of programmer productivity is important. I have no problem churning out features and fixing bugs but give me a slow bureaucratic process and you'll find out what it's like to pay a salary to somebody who pushes to the GitHub repo 5 times a month with diffs like +30-20.

    • crabbone 2 years ago

      > For you. For me having to tinker with a repo full of YAML files just to have a Kafka topic provisioned

      This is understandable, but this isn't about speed. Many YAML files may result in high or low provisioning speed; after all, they only give instructions to the program doing the provisioning.

      You could legitimately complain about the choice of YAML as a platform for infrastructure configuration for several reasons, like:

      1. Not having a built-in ability to describe templates. Lots of infrastructure wants some sort of polymorphic configuration, and when the infra developers chose YAML to configure it, they didn't account for that. So, instead they use various template engines that strap this polymorphism onto YAML. This was also indirectly mentioned by OP.

      2. Poorly structured, especially when it comes to large configuration size. It's easy to accidentally write something you didn't intend. It's hard to search.

      3. Being JSON in disguise, it inherits a lot of problems from JSON. Marshaling richer types and structures of data in and out of the program is severely impacted by the format's primitive and inflexible type system.

      But, again, this isn't speed. This is just a different set of problems.

      > If I were a CTO or an IT manager I'd be very worried about stuff like this.

      Practice shows this is mostly irrelevant. It's hard to reach the point where provisioning speed starts to hurt so much that it impacts business decisions. For instance, provisioning in MS Azure is on average twice as slow as it is in AWS. (And deprovisioning is probably four times as slow.) And nobody cares. So many other concerns will overshadow this particular aspect that you'd feel uncomfortable even bringing it up if you had to choose between two service providers. The primary driver is the cost of running the infrastructure for a long time, overall as a system. Startup time does contribute to the total, but unless your business requires very frequent allocation and deallocation of resources, this won't make a difference. Also, cloud vendors don't bill you for the time the infrastructure is being brought up, so it's really hard to make a compelling case to choose the fast-to-provision infra over the slow one based on that aspect alone.

      • pdimitar 2 years ago

        > Practice shows this is mostly irrelevant.

        I'd dispute this, though I don't have data. To me the problem of people just phoning it in and collecting FAANG salaries is pretty nasty and seems like it's not solvable.

        But yes, I do agree that if the economic analysis SEEMS TO point at the idea that X times 3 effort for provisioning is irrelevant to the bigger bottom line then yes, it seems that the need for action does not exist.

phrotoma 2 years ago

> You know how crappy software is crappy in ways that are so blatantly obvious to the user that you wonder why it was released?

It has crossed my mind several times recently that I want a word to describe this exact state of affairs. Where a thing has a defect so blatant that it is evident to any user that the creator of the thing has never tried using it.

Eg. an airbnb with no towels in it.

What's the word for this situation?

  • JSR_FDED 2 years ago

    Microsoft Teams

  • fbergen 2 years ago

    Yet still people are using it?

    Otherwise it’s called an MVP and a promise of plugging the holes

    • crabbone 2 years ago

      It's overly naive to think that people who use such a tool choose to use it.

      In many cases it's "you are hired into this job, this is the tool we give you, if you don't like the tool, take a hike".

      Even more so, a lot of software is developed not to be competitive, but to be exclusive. It's a lot easier to be the only choice for doing something than to try to compete with a different tool. I've seen countless examples of tools developed in exactly this paradigm, where the decision to use the tool wasn't made by anyone anywhere close to the users of the tool (e.g. a hospital procurement department buying a PACS, or a large avionics company ordering a custom-made budget-management program).

    • pdimitar 2 years ago

      Come on now, you were never told in your career, not once, "we use Microsoft Teams for communication here"? Ever?

      Most crappy software exists because of inertia and corporate policies. If people truly had a choice stuff like MS Teams could be phased out by the end of the next quarter.

  • dkasper 2 years ago

    Fugazi is my favorite word for it. Also snafu.

  • PaulDavisThe1st 2 years ago

    A tool with more than one way to use it?

  • crabbone 2 years ago

    I want to expand on this :)

    When I have to describe, to people who don't work with me, my interactions with developers (especially the authors of crappy code like that) from the standpoint of someone who represents the QA side of things... I describe my interactions with my five-year-old son:

        Me: How was school?
        Son: Goooood!
        Me: Did you behave?
        Son: Yes!
        Me: Did the teacher send you into timeout?
        Son: Yes...
        Me: So how come?  You told me you behaved...  What did you do?
        Son: Played with Ryan!
        Me: That doesn't seem like a good reason to send you into timeout.
    
    And we go like this until I either discover that he was yelling in class or I never find out why he was in timeout. This is also the pattern of denial I very frequently face when talking to the programmers who wrote the crappy code. Somewhere in the back of their minds they understand that they screwed up, but they will come up with all sorts of concocted reasoning to pretend that they either don't understand why the product sucks, or they will claim that it cannot be made any better, or they'll attack me for not understanding how the product is supposed to work, etc. The most recent example would be (in slight adaptation):

        Me: I discovered that we set PYTHONPATH variable when loading a (Tcl) module.
        Dev: I see no problems with that.
        Me: The new feature we are releasing to the users is conda support.  Conda will not work (well) when this variable is set.
        Dev: Did the documentation tell users to load this module?
        Me: No, but it's obvious that users would like the functionality provided by the module in addition to using conda.  They are made to complement each other.  Besides, documentation doesn't say they shouldn't.
        Dev: (summons PM)
    
    And then the PM continues in the same spirit as the developer. And my guess is that the reason for it is that nobody really wants to work too hard. There's no reward in making a better-quality product if that quality isn't immediately appreciated. Features like latency, throughput, size, etc. are immediately visible to the user and are an easy sell. Features like internal consistency in the face of more sophisticated usage: that usage might never happen, and the user might never know that they were protected from their system collapsing on them by a substantial development effort. So commercial companies de-prioritize quality. And that's how we get crappy programs.

    • pdimitar 2 years ago

      > And, my guess is that the reason for it is that nobody really wants to work too hard.

      There is certainly a lot of that, but it gets even worse: in many companies you get actively punished for doing good work. You end up making other people work: asking managers for product requirements (which are of course barely written down anywhere, if at all), or reminding that sysadmin that they half-arsed the deployment and now must add another k8s resource, or asking another dev why they did X with the Y library... You want to make sure not to screw something up, but you just end up annoying them.

      And sadly these things get brought up in meetings. And many over-zealous managers will scold you because they don't like the boat being rocked (even if they would actually welcome your initiative; but that assumes they'd have made an effort to understand the situation, which is not a given).

      It's no surprise that many talented people just end up checking in, doing the bare minimum, and clocking out. The equation is extremely easy to solve: "work X*3, get scolded, don't get promotions, accumulate hostility in colleagues" vs. "work X and have peace and quiet".

      • crabbone 2 years ago

        Haha. Yeah. I almost got fired in my first month because I asked another developer something I thought was really innocent: they had mixed some code from pytest with unittest (two competing Python unit-testing libraries) where either one or the other could do the job perfectly fine. So I naturally asked why they did it. Not even being mean. They, of course, interpreted this as me being snarky... complained to management, and I had to look for another department to house me.

        Now I write "sorry" and "excuse me" when I get assigned to review someone's code and I mostly fix typos in the comments. But, even so, I don't get assigned to code reviews all that often :)

        • pdimitar 2 years ago

          Yep, I've had several similar occurrences and my mind is boggled every time. I personally welcome any opportunity to learn and improve, but many people apparently don't.

    • Kon-Peki 2 years ago

      Not knowing anything other than what you wrote, it sounds like your organization has leadership problems. People don't know why their job exists, they don't know what your organization is actually trying to accomplish, how any individual person fits into it, why the day-to-day things someone does helps, etc.

      Nothing anyone does with software will help.

      • pdimitar 2 years ago

        > it sounds like your organization has leadership problems

        That's like saying "sounds like the Sun is going to rise tomorrow again". Most companies have leadership problems, it's kind of ingrained in Homo Sapiens to fight for a cozy position and then become a gatekeeper of their own mediocrity, to the detriment of their rulers.

      • crabbone 2 years ago

        Yeah, but the most disappointing part is that you cannot say it out loud. Even acknowledging the problem will have all the swords pointing at you. So, everyone quietly hates their pointless tasks, but completes them somehow anyways. And things only get worse :) But, there's no reality check because there's no competition and hasn't been for such a long time that users are essentially led to believe that what they are getting is the best they can get.

stavros 2 years ago

This isn't substantive, but it bugged me:

> I'm not asking for milliseconds! Just please at least get it to less than a second.

What do we measure "less than a second" times in?

zsoltkacsandi 2 years ago

The author apparently does not have any experience in building systems/infrastructure.

> I can set up a static website in AWS, but it takes 45 steps in the console and 12 of them are highly confusing if you never did it before

Anything can be confusing or take time if you've never done it before. Getting productive takes time and practice. If your goal is only to set up a static site, AWS is overkill for it.

> It's sad this is the current state of infrastructure.

It’s sad that some people still haven’t learned to pick the right tool for a problem.

> I could go on, but I won't. I'm dreaming of a world where things are truly serverless.

I don’t even understand what the author wants here. There is no such thing as “truly serverless”. Your code will be executed by a server. Period. Serverless is just a fancy marketing term for ephemeral lightweight VMs.

> If I make a change in the AWS console, or if I add a new pod to Kubernetes, or whatever, I want that to happen in seconds

The author obviously doesn’t have any knowledge about distributed systems.

> My deep desire is to make it easy to create ephemeral resources. Do you need a database for your test suite? Create it in the cloud in a way so that it gets garbage collected once your test suite is done.

Fortunately we have Terraform, which has made this possible for a decade(?).

> Code not configuration

Terraform, Pulumi, countless of client libraries for all of the cloud providers.

  • dang 2 years ago

    Can you please not post in the flamewar style to HN, as you did here and elsewhere in this thread? You can make your substantive points without that. We're trying for a different kind of discussion.

    If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

  • samuell 2 years ago

    > The author apparently does not have any experience in building systems/infrastructure.

    Well, he built https://modal.com , one of the coolest things since sliced mangoes, and before that https://github.com/spotify/luigi

    • zsoltkacsandi 2 years ago

      I don’t care what he built if he justifies his arguments with distorted facts and complains about lack of things that have been around for a decade.

      • phillipcarter 2 years ago

        When you make statements like this:

        > There is no such thing “truly serverless”. Your code will be executed by a server. Period.

        It indicates that maybe you are the one who's missing the point. The author is not saying anything about wanting code that magically runs on a server without running on a server.

  • kitd 2 years ago

    > There is no such thing “truly serverless”. Your code will be executed by a server.

    This is nit-picky. "Serverless" refers to the "dev", not the "ops", and has done for a while.

    > Fortunately we have Terraform that’s made this possible for a decade(?).

    Setting up production-grade DBs in Terraform is easy?

    • szszrk 2 years ago

      If done perpetually - yes.

      The author does make some weird arguments and seems to be creating an emotional setting for something - like his own product you guys mentioned.

      My pods ARE ready in seconds. Wondering why his are not.

      • cassianoleal 2 years ago

        > My pods ARE ready in seconds. Wondering why his are not.

        That's what I was thinking too. What kind of underpowered, crappy k8s cluster is this person running where pods take minutes to spin up?

    • zsoltkacsandi 2 years ago

      > Setting up production-grade DBs in Terraform is easy?

      Oh, yes, it is. Setting up the resources is actually the easiest part; most of the problems originate from the fact that as developers start to use more and more "serverless" things, they know less about how the underlying technology works - how to use indexes, how to structure the database, how replication or transactions work. Production readiness is not just about how a resource is configured. It is about how the application uses the resource efficiently.

      > This is nit-picky. "Serverless" refers to the "dev", not the "ops", and has done for a while.

      There is no "dev" and "ops" serverless. Your application will run on one or multiple CPUs, will use the memory, the disk, the network. When you write the application all of these matter, memory management, network communication, CPU caches, parallel execution, concurrency, disk access. It does not matter if you call it serverless, cloud, bare metal, etc. The basics are the same.

      • jasode 2 years ago

        >There is no such thing “truly serverless”. Your code will be executed by a server. Period.

        >Your application will run on one or multiple CPUs, will use the memory, the disk, the network.

        But the term "serverless" has never meant "serverless does not run on cpu, does not use any RAM, and does not use disk or network."

        You're attempting a clarification for "serverless" that nobody needs because reasonable people didn't actually think serverless/LambdaFunctions/CloudWorkers/etc defied the laws of physics.

        "Serverless" from the beginning has always meant not having to do "os management/operations" type of tasks in a vm such as:

          sudo apt-get update
          sudo apt-get install <package>
          [...]
        
        Instead, the cloud vendors created the ability to run stateless functions which are executed in a "cloud runtime". The "dev" focuses their effort on coding the stateless functions instead of on Linux OS housekeeping tasks.

        And yes -- to pre-empt the discussion from going around in circles... the "cloud's runtime" for stateless functions do ultimately run on a "server" which runs on cpu/memory/disk. And yes, "the cloud is just somebody else's computer". I think we all know that.

        • zsoltkacsandi 2 years ago

          > "Serverless" from the beginning has always meant not having to do "os management/operations" type of tasks in a vm such as

          So you mean that serverless is when someone else types in the commands to install your software's dependencies.

          I am genuinely curious: how difficult/expensive is learning and issuing these commands on a VM, or putting them into a Packer file, Dockerfile or Ansible playbook, considering the whole software development lifecycle?

          In your interpretation, serverless is when the person who runs these “Linux housekeeping” commands works at AWS (or insert any other provider here) and not at your company.

          • mike_hearn 2 years ago

            Serverless/FaaS takes care of the following things that you otherwise need to do yourself:

            1. Provisioning VMs and copying the right files up to them.

            2. Linking them together behind an HTTP load balancer, which itself needs to be on one or more VMs and possibly DNS balancing.

            3. Configuring that load balancer to respond on HTTPS endpoints and health check backends.

            4. Collecting logs etc to a central place.

            5. Making sure servers restart if they need to for versioning or crash reasons.

            6. Shutting it all down and cleaning it up if you stop using them.

            That's pretty much it. People like it because doing UNIX sysadmin work sucks. The usability just isn't very good.

  • evantbyrne 2 years ago

    I built a CD for AWS (beakerstudio.com). The author is correct about everything being super complicated. Tools like Terraform help automate changes, but you still have to _learn_ all of the strange ways AWS works and juggle configuration requirements that are oftentimes so bizarre it makes you wonder if they are trying to funnel developers into support plans.

    Honestly, the experience of building Beaker Studio made me bearish on AWS. They price gouge and the DX is so bad teams pretty much need CDs. Once I get the time I want to update Beaker Studio so people can deploy to any old Linux box instead. Teams deserve so much better than AWS/Google/Azure.

    • abi 2 years ago

      Been looking at a few solutions similar to yours! I'm currently on Render and looking to move elsewhere so I can have more control and particularly insight into system metrics. Do you support zero downtime deploys? It wasn't clear to me from your home page.

      • evantbyrne 2 years ago

        Tasks are run on ECS with Fargate. If you set up your server with a load balancer, which is required on ECS to point DNS at the server, then the load balancer will wait for health checks to pass before switching over to the newly deployed tasks. ECS with Fargate is reliable in my experience, and Beaker Studio uses an alternate installation of itself to deploy itself, so everything is dogfooded. A big drawback IMO is that AWS is expensive and Beaker Studio does not attempt to hack its way around their pricing. Right now I'm not billing users (within reason) who provide feedback, so please feel free to sign up and email me your notes.

        • abi 2 years ago

          Thanks for clarifying. I'll send you an email once I try it.

  • nkohari 2 years ago

    Just because you disagree with someone doesn't mean they don't know what they're talking about.

  • sciurus 2 years ago

    > I don’t even understand what the author wants here. There is no such thing “truly serverless”

    The author says what they want. It's literally their next sentence:

    "As in, I don't want to think about future resource needs, I just want things to magically handle it."

    and they have four bullet points with examples of what this means to them earlier.

    I think it's fair to argue about the desirability, achievability, etc of this. I don't think it's fair to act as if the author is just spewing buzzwords without explanation.

    • zsoltkacsandi 2 years ago

      Let's see:

      - Why do I have to think about the underlying pool of resources? Just maintain it for me.

      - I don't ever want to provision anything in advance of load.

      - I don't want to pay for idle resources. Just let me pay for whatever resources I'm actually using.

      - Serverless doesn't mean it's a burstable VM that saves its instance state to disk during periods of idle.

      This article was written in 2021.

      AWS Lambda, introduced in 2014, fulfilled all of the requirements in those bullet points you mentioned. Google App Engine is the same; it was introduced in 2008.

      So again, this article tells only one thing: that the author does not know what he is talking about.

  • whoiskatrinOP 2 years ago

    I think it's important to understand that this was his opinion in 2021. Things have changed since, and hopefully, all these solutions are available now.

    • zsoltkacsandi 2 years ago

      TBH, in 2018 I already used the things he was complaining about, so my opinion still stands.

  • lostmsu 2 years ago

    I would not agree. The author basically describes Google App Engine.

  • opentokix 2 years ago

    He is from Spotify, he doesn't have any experience, full stop.
