I got pwned by my cloud costs
Don't put Cloudflare in front of a cloud egress bill, i.e. don't do this: Azure|Amazon > Cloudflare
Always use your own proxy where the egress is well within your free tier, i.e. do this: Azure|Amazon > Hetzner|Linode > Cloudflare
Why?
Because Cloudflare's cache is a massively multi-tenant LRU cache. Hot files will be cached well (and even better with Cloudflare Tiered Cache, though that is itself a cost), but anything else is still going to expose you to some degree of egress cost.
When I exposed AWS to the web I paid $3k per month to AWS. With Cloudflare in front of AWS I paid $300 per month to AWS. With Linode in front of AWS and behind Cloudflare I paid $20 per month to Linode and about $12 per month to AWS.
A Linode or Hetzner instance... or any other dumb, cheap web server that comes with a healthy free tier of bandwidth is all you need to set up a simple nginx reverse proxy and have it cache things to disk https://docs.nginx.com/nginx/admin-guide/content-cache/conte...
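For reference, a minimal sketch of what such a disk-backed nginx cache might look like; the hostname, cache sizes, retention times, and the blob-storage origin below are placeholder assumptions, not anything from the article:

```nginx
# /etc/nginx/conf.d/egress-cache.conf -- minimal sketch, all names and sizes are placeholders
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=origin_cache:50m
                 max_size=50g inactive=7d use_temp_path=off;

server {
    listen 80;                        # a real deployment would terminate TLS here too
    server_name cache.example.com;    # hypothetical hostname

    location / {
        proxy_pass https://myaccount.blob.core.windows.net;  # hypothetical cloud-storage origin
        proxy_cache origin_cache;
        proxy_cache_valid 200 7d;             # keep successful responses for a week
        proxy_cache_use_stale error timeout;  # serve stale copies if the origin is unreachable
        proxy_cache_lock on;                  # collapse concurrent misses into one origin fetch
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

With something like this in front of the cloud origin, repeat downloads are served from the proxy's disk and only cache misses generate cloud egress.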
Or simply use a proper CDN that doesn't pretend to eat all the cost for a flat fee and then sometimes not eat it. BunnyCDN has an amazing volume tier at half a cent per GB.
Oh exactly that.
Or if caching is your biggest priority then Fastly or Akamai will shine too.
But if you're balancing all considerations and want the cheap "good enough" caching with the DDoS protection, free TLS certs, and unmetered (assuming you aren't imgur or something)... then Cloudflare does a great job at being good enough. And for those sharp edges... drop in a proxy of your own, or layer your CDNs.
I don't understand, what is the advantage of Cloudflare over Fastly or Akamai if caching is not your biggest priority? Does Cloudflare have better DDoS protection, or something else?
Yes among other things. Also edge compute, etc.
Fastly comes close on a lot of fronts (and does better at a few things) but unless you are godlike with Varnish scripting it's a lot harder to make it do what you want than Cloudflare.
Thanks this is really helpful, I'm planning a delicate migration to a CDN and it's a tough choice. Cloudflare just seems like "an everything machine" from their marketing website, I'm struggling to understand how I would actually use it for a monolithic website + API.
It is pretty much an "everything machine". I think they are positioning themselves as a platform alternative to having a separate origin + CDN. i.e you develop your entire application with Workers + Durable Objects + R2.
As a CDN it's pretty great though. I have managed very, very large properties behind Cloudflare and they have always gone above and beyond for us when big DDoSes have come our way.
OP's use case is a couple of giant zip files. Edge compute is real cool, but not something a lot of people need when they think of a CDN.
I interpreted the question as "if caching -isn't- your biggest priority". i.e what does it do better assuming giant zip files isn't the main thing you are interested in.
So I'm not really responding to OP but rather the commenter I replied to. :)
In this scenario are you saying
AWS/Azure > BunnyCDN > Cloudflare?
Or just straight AWS/Azure > Cloudflare?
Will BunnyCDN reliably keep an 18 GB file in cache without hitting origin? I use and like Bunny, but relying on that to not get a massive bill in the mail scares the shit out of me.
They also have a storage feature, so they could.
Azure has its own CDN. If one wants to do Cloudflare -> CDN -> Azure Storage, then at least let it be Azure CDN in the middle, not another cloud provider in the mix. ¯\_(ツ)_/¯
Or simply run everything on your own server. All those middlemen are going to kill any latency improvements you get from anycast edge servers.
I've switched to Backblaze B2, which has a bandwidth alliance with Cloudflare. Even without it, B2 egress is something like 1/5th of S3, so it may be worth thinking about.
If you use Argo caching on Cloudflare, it should reduce origin server load even more. Essentially, instead of going directly to your origin, the Cloudflare edge node will first reach out to its root node to see if the content is cached there, and only that node is allowed to communicate with your origin. I see ~95% cache hits with that turned on.
Argo does not affect caching, only performance. You're maybe mistaking it for tiered caching or a custom caching topology.
Yes, they call it Argo Tiered Cache under Caching tab.
> Azure|Amazon > Hetzner|Linode > Cloudflare
Why not directly Hetzner|Linode > Cloudflare?
Because Hetzner and Linode VPSs have fixed disk sizes, while Azure and AWS have basically infinite storage. You use your cheap commodity VPS as a cache, not a source-of-truth.
Many of them have managed object storage services as well. OVH [1], Linode [2], and Scaleway [3] have them; those should scale for most use cases and offer S3-compatible APIs.
Also, Azure, Linode, Scaleway, Backblaze and others are part of the Cloudflare Bandwidth Alliance [4], so there shouldn't be egress fees between the two.
It is really only AWS that is a problem; you don't need this setup with any other provider.
[1] https://www.ovhcloud.com/en/public-cloud/object-storage/
[2] https://www.linode.com/products/object-storage/
You can use block storage for scalable disk size: https://www.linode.com/products/block-storage/
But then you're right back to the cloud billing problem, right?
That's right, auto-scaling comes with this problem, but at least you removed one extra service/point of failure.
That's assuming you trust Linode block storage as much as you trust S3.
Trust in what sense? Uptime, security, privacy? I am not sure whether I can say I trust one or another, but personally I had a good experience with VPS/dedicated server providers, more than with cloud (AWS/GCP).
Out of curiosity I tried to look up their pricing and the first thing I am greeted with when launching their price calculator is "you must allow functional cookies".
I disabled all shields for their site and still the same thing. Waste of time.
I personally never used Linode and can not recommend nor talk against it, I was just pointing out that if you want scalable solutions AWS is not the only answer.
If your cache is much smaller than the data, it will be ineffective, unless you think everyone keeps downloading the same tiny subset of files. That last assumption works for web content (e.g. newest articles see more hits) but probably not for data.
So that you incur as much downtime risk as possible, obviously.
I hate these 'cloud economics' optimizations that people tend to try.
The risk that your service becomes faang popular and you suddenly need unlimited everything and need it immediately?
It is possible but highly unlikely. The more likely scenario is you just continue to overpay like a lot of others waiting for the moment. And if that moment happens, you realize that with the sudden popularity your store inventory is sold out, so you couldn't profit off of the extra traffic anyhow.
No, downtime risk as in now you have 3 separate systems and organizations that can have unexpected downtime and consequently so will your app.
There's a clear trade-off between downtime risk and cost explosion risk. For a hobby/non-profit project, risking the downtime to possibly save 7k€ plus surely saving the surcharge of "scalability" is definitely worth it.
The best setup will forever remain Heroku free instance tier with a free Pingdom account providing traffic to keep it from getting shutdown
Free Heroku has a maximum number of hours a day. The ping hack isn't working anymore.
Sorry, that comment wasn't really serious, mostly a silly example of bizarre cloud-pricing hacking.
Another option, if Linode's included bandwidth + overages is too much, is a dedicated box from Reliable Site. I'm not a customer nor am I affiliated with them at all, I just occasionally check in on their low-end prices and noticed that they've started including an unmetered 1Gbps port with every host.
(search HN and reddit for that URL, you'll see they've been around and recommended for a really long time).
If you're going to have an intermediary proxy that you run, for AWS perhaps use Lightsail. It is price competitive, and includes more bandwidth than Linode/DigitalOcean/Vultr for the price.
You are not allowed to use Lightsail once you use more professional services on AWS, at least per the ToS.
Do you have a more detailed citation for that? At $DAYJOB we seem to be using Lightsail (for non-cache purposes) along with some "real AWS" resources without a problem.
AWS Service Terms[0]
51. Amazon Lightsail
51.3. You may not use Amazon Lightsail in a manner intended to avoid incurring data fees from other Services (e.g., proxying network traffic from Services to the public internet or other destinations or excessive data processing through load balancing or content delivery network (CDN) Services as described in the Documentation), and if you do, we may throttle or suspend your data services or suspend your account.
I think you’re expanding this clause past what it says.
The clause prohibits you from using Lightsail to cheat other services. So, per their example, you couldn’t set up a Lightsail instance as a reverse proxy between the internet and an ELB, to take advantage of Lightsail’s higher transfer quotas and “in-region” traffic from Lightsail to ELBs.
Hosting a site on Lightsail and hosting other things on other AWS services is fine.
I am not the OP. The terms indicate you cannot use it to proxy traffic to bypass bandwidth costs. The grandparent comment suggested using Lightsail to do this, and it is a violation of the TOS. The parent comment, however, stipulated that you are not allowed to use Lightsail at all, which is indeed wrong. I was just posting the relevant portion of the TOS which applies to the grandparent comment, and clarifies on the parent comment (which is inaccurate)
Thank you for the correction! Second time I made this claim but always forget it’s just about the traffic…
That's insane. But not a surprise; Lightsail only exists so AWS can say they offer similar pricing to Linode/DigitalOcean/Vultr/etc... as long as you don't ever plan to grow.
Interesting. In this example where the parent comment discusses using a proxy from AWS to Linode/Hetzner to Cloudflare, then I'd go with someone in the Bandwidth Alliance, which would include Linode and Vultr.
Have either of those actually implemented the Bandwidth Alliance? Last I looked (a few months ago), the only outfit that had actually done anything on that was Backblaze. Vultr and Linode were nothing more than announcements, with no actual cost savings implemented for customers.
Why not use the CDN of the cloud provider you are on? Azure Storage > Azure CDN
Reducing CloudFlare to a CDN is a disservice. They have some amazing services like Bot Management and Workers that make them very appealing. The CDN is just a nice bonus.
Because it's an order of magnitude more expensive, like anything on the cloud really.
Azure CDN offers almost no discount on egress over Azure storage directly. The same is the case with Amazon's equivalent services.
Or Troy Hunt can ping his Cloudflare contacts and see if he can get access to Cloudflare R2 Storage.
see https://blog.cloudflare.com/introducing-r2-object-storage/
From the Cloudflare blog, it seems R2 would've handled this exact situation - auto-migration of cloud S3-like-storage objects - download from cloud-storage just once and cache in R2 for Cloudflare to serve.
Has anyone gotten access to R2 yet? I signed up but haven't heard back myself.
Would love to find out if you can write to any/every region and have things replicate, or if you have to write to a single region. BunnyCDN's edge storage solution looked interesting until I found out it only supported writes to a single region.
Hoping R2 might be my savior here, otherwise will probably have to roll my own active-active minio cluster, which I'm not looking forward to maintaining. Other suggestions welcome!
How about Amazon Lightsail? Its price structure is basically the same as Hetzner or Linode, and you get it in-house if you use AWS.
It is not compute cost, it is bandwidth cost, and that is pretty much the same beyond the free tier within AWS.
CloudFlare tiered cache is now free BTW
Ah the old cloud provider switcheroo. Yip this is the way they make money. They make it easy to setup some gigantic hugely scalable website then hit you with a gigantic scaled up bill. AWS would do this as well.
The team I'm in at the moment is in the early stages of cloud adoption, but the company as a whole has fallen hook, line and sinker for AWS. Whenever I mention the cost there is always an excuse.
The main one being that you don't have to hire sysadmins anymore as that's all taken care of by AWS. Ah yes, but they have actually been replaced with a "DevOps" team, plus just our department now spends > £1 million per year on AWS hosting costs. A 20% reduction in those fees could pay for a few sysadmin(s).
The next one is that no other vendor would be able to supply the kit. You know, Stack Overflow is able to run on a single web server (https://nickcraver.com/blog/2016/02/17/stack-overflow-the-ar...). Plus many of the other providers have loads of instances available.
I mean I'm not against cloud it's just not the cheapest option if you choose one of the big 3 providers. I use a company called scaleway (https://www.scaleway.com/en/) they have all the essential cloud services you need and everything else you can run yourself in docker or k8s.
There's an argument to be made for quality of life for your employees. As someone who has transitioned from on-prem server management to mainly cloud work, my job happiness has skyrocketed. I haven't set foot in a data center in three years and I do not miss it one bit.
Dealing with hardware failures, hardware vendors, confusing licensing, having to know SKUs, racking new cabinets, swapping hard drives, patching servers - it's all awful work. When you go cloud only, you can be more productive instead of dealing with some of that nonsense work.
I always was a software developer first, but in the old days I spent enough time in the server rooms doing all sorts of sysadmin work, and those days I dabble in devops.
And, honestly, I miss the old days. Today, $cloud has some weird spasms where you suddenly get an influx of connection timeouts or tasks waiting for aeons to get scheduled and you just can't log in to a switch or a machine and figure out what the exact hell is going on. You just watch the evergreen $cloud status page, maybe file some tickets and pray someone bothers to investigate, or maybe live with those random hiccups "sorry $boss, everything is 100% good on our side, it's $cloud misbehaving today", adding more resilience -> complexity -> unreliability in the name of reliability to the system. Either way, with the clouds I feel handicapped, lacking the ability to diagnose things when they go wrong.
I don't miss those three days we spent fighting a kernel panic. It was about a decade ago - we outgrew the hardware and had to get a new box with a badass-at-the-time 10Gb SFP+ NIC that worked nicely for the first few weeks, but then its driver suddenly decided to throw tantrums on almost an hourly basis. I don't even remember the details - a lot of time has flown since then, but thankfully we found some patch somewhere in the depths of the LKML and the server ran like clockwork ever since. That wasn't fun, but that was a once-in-many-years incident.
Either way, I do feel that in the ancient ages hardware and software used to be so much simpler and more reliable. Like, today people start with those multi-node, high-availability, all-the-buzzwords Kubernetes-in-the-cloud monstrosities that still fail now and then (because there are so many moving parts shit's just bound to fail at an incredible rate), and in the good old days people somehow managed to have a couple of servers in the rack - some proper, some just desktop towers sitting by - and with some duct tape and elbow grease those ran without incidents for years and years.
Have I turned old and sour? Or maybe it's just the nostalgia about the youth, and I've forgotten or diminished most the issues while warmly remembering all the good moments?
Cloud popped up mostly due to ease of use. It's a lot easier to hire a cloud-ops engineer with somehow enough knowledge to deploy something on the cloud than someone who will manage a datacenter and keep it running.
The latter people still do what they did, they just work for cloud providers, probably making quite a bit more than they did previously.
IMHO it's a win-win situation for everybody. Less skilled engineers can be "productive" and former sysadmins have huge salaries.
In between your two extremes are colocation (no managing buildings, power, cooling, racks, security, optionally network), dedicated servers (no managing/installing servers, disks, warranties) and basic VMs.
We do colocation and we have to deal with HDD and RAM failures from time to time. Replacement of the hardware part is managed by the provider, but discovery and the software side require our involvement.
I just wonder what happens if a RAM or HDD failure hits a cloud provider node. Is the architecture on average really able to recover from such failures without help and intervention?
This reads like a software engineer being happy that work caters lunch so he/she doesn't have to cook for the whole team anymore. Didn't anyone discuss maybe hiring a cook?
Yes but soon then you're running a kitchen and then a cafe and catering business, as well as a software startup. Which, given how many startups had in-office lunch/food pre-covid is maybe not a bad way to think of that.
Ah yes the Maserati problem https://www.quora.com/Whats-a-Maserati-Problem
By the time you're running a [catering business | massive sysadmin team], you're already a huge success. Congratulations!
I think this depends. For ops people no longer having to physically go into a DC, I agree, but you've now pushed a bunch of work onto developers. They used to just write code while someone else sorted the infrastructure; now the devs themselves are kept up all night with AWS stuff going up and down.
If cloud improved QOL for ALL employees I'd agree but I think it just shifts work around and costs more.
> Dealing with hardware failures ... it's all awful work
I've met plenty of datacenter technicians that loved their work and the opportunities for growth it provided.
Some companies really know how to manage a datacenter with minimum pain. Some don't.
It's not like all those jobs have been taken over by automation - someone still has to take care of these cloud servers?
> Dealing with hardware failures, hardware vendors, confusing licensing, having to know SKUs, racking new cabinets, swapping hard drives, patching servers - it's all awful work.
Each to their own, but I think you'll find there's a fairly significant portion of sysadmins who love that work!
I can see both sides. If you're a startup that needs to be able to scale quickly if product market fit is achieved, the cloud really saves your bacon. Or is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?
It's basically a form of permanent debt. Faster product market fit, higher long term infrastructure costs until you have enough breathing room to start pulling it into your own datacenter. At that point you have some negotiating leverage with the cloud provider.
On the other hand, if you're not looking for explosive growth, man oh man is DigitalOcean (or any one of a number of good providers of good old VPSes / Cloud-lite) a great option.
I keep hearing this argument against using your own infrastructure again and again, and I'm not sure how true it is.
I've worked with teams on both sides, and everyone is gonna have to deal with figuring out how to run at scale, it's just different ways of achieving that.
I've worked with teams that manage their own infrastructure with dedicated servers, and not having to think about scaling for a long time as the one beefy server could just take whatever load you threw at it.
I've also worked with teams who don't manage their own infrastructure and thought they were ready to scale without issues, but once the scale actually happened, it turned out there were more things to consider than just the number of servers you run; race conditions were everywhere, but no one had thought about that.
Definitely a case of "right tool for the right job", but I don't think it's as easy as "Self-managed: harder to scale, PaaS/Cloud: easy peazy to scale".
Yeah, agreed. I haven't worked with Google-scale companies, but I've always found scaling issues to be development related, not infrastructure related. Examples would be a bad DB query that takes the system down, an overly chatty web server that issues too many queries to the backend, pulling large datasets into the web app and exhausting memory, etc. AWS / Azure can't fix these issues; they have to be fixed in your code.
There is definitely a place for AWS/Azure and their offering of services is fantastic, but they are not a silver bullet for scaling your website to millions of active users.
On another point though the vast majority of websites you'll ever build won't have that level of active users. It's a good problem to have though as it means your site is doing really well.
> I've always found scaling issues to be development related, not infrastructure related. Examples would be a bad DB query that takes the system down, an overly chatty web server that issues too many queries to the backend
This is actually one of the strengths of the cloud, startups that can't afford talent throw compute resources at the problem. Running your own servers isn't hard per se, but it requires a certain breadth of less centrally documented knowledge than the cloud and a willingness to fuss. Developers like that can often command higher prices than most startups pay these days :)
Having someone with good cloud chops is still a difficult ask.
Putting it all on the devs is exactly how you end up in the haveibeenpwned database and on the cover of magazines (for the wrong reasons).
We’ve traded sysadmins for more expensive DevOps. I would love to see a study on if we actually hire less people than if we just did it the old school ways.
I don't disagree; but I think the cloud (AWS/Azure/GCP) has sort of shielded people from how cheap/powerful the underlying hardware has become.
For ~100eur/month on Hetzner you can get a 16-core Zen 3, 128GB RAM with 8TB of NVMe SSD.
Unless your stack is horrendously badly optimised you can serve SO MUCH traffic off that - definitely billions of postgres records without breaking a sweat.
So the scale argument somewhat disappears - if anything, people end up adding much more complexity to the product to get round the high hardware costs of the cloud (complex caching systems for example, instead of just throwing loads of hardware at the problem).
> I don't disagree; but I think the cloud (AWS/Azure/GCP) has sort of shielded people from how cheap/powerful the underlying hardware has become.
I guess I shouldn't be surprised, but I do find myself often surprised to realize that a younger generation of developers has never experienced hosting on bare metal. So they have not been exposed to the costs and benefits vs. the cloud approach, and feel that no local machine could ever be as fast as AWS, even though in reality even a pedestrian server is immensely faster and cheaper than any AWS offering.
Now, sure, there are tradeoffs in ease of scaling up and other considerations, but it's good to keep an eye on the actual tradeoffs you're making and how much they're costing.
As a software developer, I think the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it. In-house servers might be cheap, but in my experience it could be incredibly hard to get that money spent when it's needed, and I've seen companies throw expensive software engineering time at optimizing software when it would have been much cheaper to solve the problem with hardware.
Not only can you end up spending $10k of engineering time to optimize and test a random, non-core-competency bit of code instead of an extra $1k/year on hardware, you also have to maintain the optimized code instead of the simpler code.
Maybe I just worked at companies that did a poor job of managing servers, or had a dysfunctional relationship between software engineering and operations, but at least that's no longer something I have to worry about in a cloud environment. If spending a little extra on hardware is the best solution to the problem, process/planning/politics won't get in the way.
> in my experience it could be incredibly hard to get that money spent when it's needed, and I've seen companies throw expensive software engineering time at optimizing software when it would have been much cheaper to solve the problem with hardware.
That's true with owning your hardware, but what about renting from Hetzner/OVH/etc? You get servers set up in minutes unless you have a very specific request (the only time I've had lead time with these providers is when I had a very custom request, a machine with 300+ TB of storage - yes that is not a typo). Everything else has been delivered pretty much instantly.
But even if let's say you have a very specific use-case such as needing a 300TB server that would typically require lead-time, well in that case the prices are so cheap that you can just keep it around all the time sitting mostly unused and still come out ahead compared to cloud pricing.
> As a software developer, I think the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it.
Yes, that's the beauty of it and sometimes you need it.
OTOH, how often do you need to grow capacity without any lead time like that? If you are in a hyper-growth stage in a startup you absolutely need it and it is a lifesaver.
But, most companies never see a hyper-growth stage. Even those which do, it's a relatively short timeframe (you can't grow exponentially very long).
All the rest of the time it's a fairly large premium to pay just in case another hyper-growth period happens. Sometimes it's totally worth it. But it's good to review the likelihood and cost tradeoffs every now and then.
To give you an example - we run quite a lot of workloads on Azure app service, which isn't the same as bare metal, but does allow serious scaling if required.
We run most workloads on a 3.5GB/2 "vCPU" box. This costs around $70/month per instance. We actually haven't scaled this out past 8 instances, at a cost of $560/month (and that has been extremely rare).
On bare metal we could have run it on a $100/month 16-core/128GB box and always had that capacity in reserve. While App Service gives a lot of benefits, the scalability argument is somewhat moot, as you can basically provision all the capacity you would ever scale to, 24/7, and still pay the same as or less than cloud.
Maybe it's just the projects I've worked on, but I haven't really ever seen people require 100x or 1000x the capacity in a very short period of time (which obviously bare metal could not do). I've seen traffic grow that much - but generally over weeks, months or years.
> the best thing about the cloud is knowing that if you need the capacity, and it makes sense cost-wise, you'll get it
It stopped being the case during the pandemic at most cloud operators, due to general hardware and capacity shortages.
AWS seems to have some pretty decent Xeons (hard to tell because Intel makes special SKUs for Amazon, I think). I guess it depends on what you consider 'a pedestrian server' -- 128 threads/512GB of memory isn't cheap, although in the enterprise universe maybe it is; I'm more of an academic. So, it is nicer than the 10-year-old cluster I tinker around on, not as nice as the system I send real runs to...
> For ~100eur/month on hertzner you can get a 16core Zen3, 128GB RAM with 8TB of NVMe SSD.
What option is that? The closest I see is the CCX41, but that is 40% more expensive, 140 Eur/month, half the RAM (64 GB) and ~4% of the disk space (360 GB)
All I can see is maybe the AX101? It matches all the specs they put down, although the SSD is RAID 1 @ 4TB total.
Yes, 8TB total but in RAID. Also keep in mind Hetzner quotes prices VAT inclusive, whereas most clouds add VAT on top. For US customers you can take ~20% off those prices.
> Or is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?
Actually AWS won't help you here. I have literally been on a two-day training course on Aurora with AWS, and the explanation of how to scale was actually just the same as any traditional non-cloud explanation: correct usage of indexes, partitioning data, optimising queries (especially any non-trivial query output by an ORM) and read replicas.
In terms of explosive growth if you're talking about something like google or tiktok again slapping it all in AWS will not automatically just work. There is a lot of engineering that you'll need to get to their level.
I also think you haven't really looked at the SO link I sent through; with thoughtful engineering they serve a huge user base with a tiny footprint.
> DigitalOcean or anyone of a number good providers of good old VPSes / Cloud-lite
Not sure why you are dunking on DO here; they are a fully fledged cloud provider with much the same stuff you would need. You can also run up a huge bill on DO as well.
There are two parts to this. You are correct that RDS doesn't help you with picking the index strategy or optimizing queries. I don't see that as running the DB, though; that is how you interact with it once it's running. What it does do is help you reliably run the DB server itself.
Without any effort you can stand up a redundant, high availability deployment. With all of the data encrypted at rest. And configure nightly backups, which are stored on redundant storage in multiple physical locations and also encrypted. You can then restore those backups into a working system with the click of a button. Oh, and minor version patches happen automatically with no downtime. And you can click a button to do major version updates.
The last time I did analysis on it, which was a while ago, all of those features cost us less than 8 hours of my time each year. It would probably take more than 8 hours of my time each year just to handle security patches on the systems. Let alone the amount of engineering that it would take to get a system as redundant and reliable as a DB in RDS. I will happily pay them to take all of that off my plate so I can focus on other things, like optimizing the queries.
> Without any effort you can stand up a redundant, high availability deployment.
Yes, it is seductive. Sometimes worth it.
But realize you'll be paying monthly in perpetuity for the convenience of that one-time setup, which could've been done in a few days, give or take.
> all of those features cost us less than 8 hours of my time each year
I'm surprised! Our RDS costs are about 10 engineering hours per month (120 eng/hrs per year). This is with hardly any customer traffic or data yet (early startup phase).
It's worth it for now, but it'll become unreasonably expensive later.
I should clarify that the 8 hours was above and beyond the costs of running it yourself on AWS. So that is not counting the 2x EC2 instances, plus the minor S3 and ELB costs. Didn't really run the numbers for equivalent hardware elsewhere, since that wasn't an option for us. Eyeballing it real quick right now, it's still maybe an hour/month vs other places for the hardware. It is a relatively small instance though; the savings are probably much better as it gets to larger sizes. Pre-paying for reserved instances helps here as well.
> I can see both sides. If you're a startup that needs to be able to scale quickly if product market fit is achieved, the cloud really saves your bacon.
Depends on the team size of the said startup [0]. In my opinion, tech-shops are better off using new-age cloud providers like fly.io / glitch.com / render.com / railway.app / replit.com / deno.com / workers.dev etc [1].
[0] https://tailscale.com/blog/modules-monoliths-and-microservic...
> is your ten person team really going to figure out how to get Postgres to reliably run with billions of records, with encrypted backups, etc?
Most of the problems here will be DBA problems, like understanding query plans and such. Even with AWS RDS, I've had to upload various setting files to tweak tunables to get things working.
That stackoverflow infra blog post is out of date. They use more than a single webserver now. For example: https://stackexchange.com/performance
Now they have 9.
They still serve a lot more traffic than I do and I have hundreds of instances; thousands of containers.
You have thousands of containers? Physician, heal thyself.
I mean, at my last job I had thousands of physical machines too.
Scale can depend on many things.
Here's a couple of reasons why it can easily be thousands:
1) Cronjobs, CI jobs, ETL, FaaS are all systems that exist. What used to be a process is now a container. (one need only check the PID count on their local machine to know that this can be many quite easily).
2) Microservices; I'm a larger fan of fat "services" but doing actual micro services tends to leave you with a lot of containers running
3) Actual compute need. If my original hosting strategy was thousands of machines, well, I'm going to have thousands of containers, if not more.
Sure, but the implied message of your comment was that you could replace all of your instances and containers with just 9 machines, since StackOverflow "serves a lot more traffic than you do" (i.e. "has more actual compute need"). I think most reasonable engineers would say that "thousands" of containers would be a massive mistake for that size of task, even if few of them would go to the extent that Stack Overflow did of using only 9 machines.
Thousands is not a lot. If you do microservices and have 100 of them, 3 replicas of each for dev, qa and prod, you already are at 900.
Most importantly, SO is extremely read-heavy, write-lite, and cache-friendly.
A similar “scale” e-commerce site would be significantly more load, have more dynamic data, and just be overall harder to run.
Looks like they have actually reduced their footprint. It's not that they do run on a single web server, it's that they can run on one.
> Looks like they have actually reduced their footprint.
i don't remember who said it but a quote i really like is "it's not finished when there's nothing left to add, it's finished when there's nothing left to take away"
It's commonly attributed to Antoine de Saint-Exupéry and is a lot older than I thought, from 1935 and originally in French.
See also Let's Encrypt: https://letsencrypt.org/2021/01/21/next-gen-database-servers...
> our department now spends > £1 million per year on AWS hosting costs. A 20% reduction in those fees could pay for a few sysadmin(s).
You can hire a "few" sysadmins for 200k/year?
In the UK/Europe yes:
https://uk.indeed.com/jobs?q=System%20Administrator&vjk=5149...
Probably not at FAANG level salaries but I doubt there are many sysadmins working for FAANG companies anymore.
DevOps, btw, are more expensive, and in fact in the UK DevOps can be higher paid than a developer. I suspect most of the DevOps working for this company are on £65k+. According to:
https://ifs.org.uk/tools_and_resources/where_do_you_fit_in
That puts those earners in the top 3% or from that website:
" In the below graph, the alternatively shaded sections represent the different decile groups. As you can see, you are in the 10th decile group.
In conclusion, Your income is so high that you lie beyond the far right hand side of the chart. "
£200k / year, in the UK? That's about 2-5 depending on experience.
A 20% reduction would result in ~£800k/yr.
They're saying that if the AWS costs decreased by 20%, they could use the now freed-up money, i.e. £200k, to pay sysadmins.
> > £1 million per year
I'm curious about your workload. I tend to only use cloud for workloads where it's either (1) by far the only feasible option (e.g. need GPUs for short periods of time), or else (2) basically free.
> I mean I'm not against cloud it's just not the cheapest option
This is certainly true for most workloads. It's also true that buying is better than renting, but here I am living in a rented apartment.
The logic from on high might be something like "if demand is uncertain and capex is risky, why buy when you can rent?"
Ouch. If Troy Hunt of all people can make this mistake, it can happen to anybody. HIBP is an awesome service funded totally by donations, so it's too bad this happened. Of course Microsoft is happy to hide behind their confusing pricing model and let customers overpay for Azure without alerting them.
> If Troy Hunt of all people can make this mistake, it can happen to anybody.
Exactly this. As a low-level / embedded / non-cloud stuff dev, I've been getting up to speed through all the cloud-ification of the industry, but I'm still scared (not literally ofc) of running most things on my own on any big cloud provider (smaller ones seem more manageable).
I'm reading this and it seems like being a customer of cloud services is like walking a dangerous path filled with gotchas and caveats, just jumping from cover to cover while hiding from danger, and hoping you're safe and didn't mess it up so far, "fingers crossed".
Like this tiny detail that he didn't realize was critical, so I would fall for it too, plus another 500 small papercuts: "oh, I set the cache up, so I hope all is well". "Yeah, no you didn't, I guess you didn't think of this detail about maximum cached file size! Gotcha, game over!"
Yeah, cloud providers should have clearer communications and so on... but the fact of today is that they don't. So I'd never sleep well feeling 100% confident that I had covered and taken into account every minuscule detail and possible scenario that could end up being a disaster.
> I reached out to a friend at Cloudflare and shortly thereafter, the penny dropped
Another advantage is his big network that he can ask for help. There's also a chance that his blog post will reach the right person in Azure and he'll get a reduced bill.
As someone who doesn't have the same network or the "fame", I am concerned about what would have happened to me in that situation.
Remember when no-sql came out and everyone was rushing to it because "rdbms don't scale"? I'm beginning to feel the same way towards "cloud" in the Azure or AWS sense. You can go really really far with standard issue VMs from linode or digital ocean and so on. I wonder how many are overpaying for Cloud services so far above and beyond what their actual needs are.
How are you teaching yourself?
Due to my work, I'm involved in some WebRTC stuff and have been getting in touch with deployments on AWS, CloudFormation, that kind of stuff.
Every cloud makes this mistake easy! You have to manually activate billing alerts for everyone, because they want you to spend more and more each month.
I am still waiting for a cloud without these dark patterns. But that will never happen, because it's leaving a big amount of money on the table by not being hostile.
Also, the billing alerts are just that: alerts. They should have something in place to put a hard cap on monthly spend. That way his free website would go offline when he's spent > $X.
As you say they make it hard deliberately.
Edit: Turns out Azure has this:
https://docs.microsoft.com/en-us/azure/cost-management-billi...
I see there is a spending limit for the "intro" or "preview" plans designed for students, Visual Studio users, and resellers (the "hook" part in "hook, line, and sinker").
Not for actual cloud usage, like an actual pay-as-you-go plan where this would be useful.
https://azure.microsoft.com/en-us/support/legal/offer-detail...
Yeah, there are trial accounts and things but as far as I know, none of the big cloud providers have a way to say "Under no circumstances are you to charge me for than $X per month even if it means shutting down services."
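To illustrate the kind of kill switch people are asking for, here is a minimal sketch of a self-imposed cap run on a schedule. The two helper functions are hypothetical placeholders, not a real provider API, and as noted elsewhere in this thread the billing data such a script would depend on can lag by a day or more:

```python
# Minimal sketch of a self-imposed monthly spend cap; the two helpers are
# hypothetical placeholders standing in for real provider billing/management
# API calls, whose data can lag by a day or more.

MONTHLY_CAP_USD = 100.0  # hard limit for a hobby project


def get_month_to_date_spend() -> float:
    """Placeholder: would query the provider's billing/cost API."""
    return 37.42  # example value for illustration


def shut_down_everything() -> None:
    """Placeholder: would stop or de-provision all billable resources."""
    print("Cap reached: taking the site offline to bound the bill.")


def enforce_cap() -> None:
    spend = get_month_to_date_spend()
    if spend >= MONTHLY_CAP_USD:
        # Going offline is the whole point: bounded cost at the price of downtime.
        shut_down_everything()
    else:
        print(f"Spend so far: ${spend:.2f} of ${MONTHLY_CAP_USD:.2f} cap.")


if __name__ == "__main__":
    enforce_cap()  # run on a schedule, e.g. hourly cron
```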
A lot of Chinese cloud providers like Aliyun will only allow pay and go, because there’s no way to recursively bill
My own instances go down all the time when I forget to ‘top up’ the account
Yeah, I see that too; I'll add that to my comment. TBH I think what I said still stands, it's been made hard, or maybe, said a different way:
It's very easy to overspend on these big cloud providers
Oh I can't edit the comment now oh well sorry if I've confused anyone.
And the alerts are not instant, at least with Azure - they run reports every 24H or so, and execute alerts every 24H or so. So even if you're careful, you can still be on the hook for a couple of days' worth of spend - which could be very expensive.
Eh, after skimming it I feel like there are still gotchas with it. Not every account can turn it on, and it looks like there aren't custom limits.
CMIIW, it'll be my first cloud provider if I can set one.
I don't understand why all online metered providers aren't forced by law to do this.
Pretty sure fixing deeply technical business to business transparent-but-potentially-terrible pricing models are pretty far down the priority list on things that will get them re-elected right now (not even counting campaign donations).
Contract dispute cases might clarify it, but probably not in the direction any of us is hoping.
Dark patterns - this sounds like a colour scheme you don't care for.
"Predatory death-trap pricing" captures the spirit of the thing with rather more clarity. It is wholly intentional after all.
> Dark patterns - this sounds like a colour scheme you don't care for.
I can see your point- if I'd never seen the term before I might have a similar reaction. But it's quite a common term now I think.
No it is not. It is common industry jargon for people who work with websites. Nobody else knows what it is. If you had to pick a term that sounds as inoffensive as possible for fraudulent, deceptive conduct while still sort of capturing the thing without being abjectly false, you could do no better than something that sounds like a colour scheme.
"Mr Politician what do you think about $big_co using dark patterns?"
"Mr Politician what do you think about $big_co engaging in predatory death-trap pricing schemes to defraud consumers?"
What is being asked in one of those questions is clear to everybody. The other is jargon and excludes the majority of the population from understanding the intended meaning. And note, you don't have to agree that it is "predatory death-trap pricing" at all. That is simply in the above sentence an accusation. Words have meanings. That the accusation be understood clearly by as many people as possible is important.
I appreciate your point that as many people as possible need to understand the meaning of an accusation, and that meaning is important, and I agree.
However, regarding your other claims: I don't work on websites, and I know the term "dark pattern", so your assertion that "nobody else knows what it is" is false. You might then argue that I'm still in tech, but that's just another goalpost that I can reach anyway: if you google the term you'll find Vox articles, explainer sites, even New York Times articles using it.
So yes, it's a commonplace term and your assertion to the contrary does not hold any water.
> Dark patterns - this sounds like a colour scheme you don't care for.
Or craft clothing for goths?
But the "dark" comes from its association with evil: "Defense against the Dark Arts", "The Dark Lord", "Turn to the Dark Side of the Force". It's a clear implication that the people are "selling their souls to the devil": knowingly doing something "a little bit evil" to achieve their aims.
We had a similar situation to Troy's where several thousand pounds was charged in a matter of days as a result of our misconfiguration of caching in our azure app services (before that month we typically had around £800 a month costs). We emailed Azure / Microsoft and they were happy to refund us. I don't think this is their intended business model.
Get a VPS from linode for $5 a month and it costs $5 a month.
They have transfer limits and an associated overage fee iirc. I can still see this sort of thing happening if that is the case.
My understanding is you hit your bandwidth cap and that's it, no more bandwidth.
(edit) Looks like that's not the case; I'm sure I used to have to buy a second instance a few years ago if I wanted to use more bandwidth than was allocated.
I barely even realized that, since my hobby stuff doesn't come anywhere near the limit.
For those curious, the overage rate is $10/TB ($0.01/GB) after the transfer included in the plan.
The smallest amount of included transfer is 1TB for the $5/mo VPS.
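For a sense of scale, a toy calculation using the figures quoted above ($5/month plan, 1 TB included transfer, $0.01/GB overage); the traffic numbers are made up for illustration:

```python
# Rough Linode-style bill estimate using the figures quoted above;
# purely illustrative, not an official pricing calculator.

PLAN_PRICE = 5.00           # USD per month
INCLUDED_TRANSFER_GB = 1024  # 1 TB included with the $5/mo plan
OVERAGE_PER_GB = 0.01        # USD per GB beyond the included pool

def monthly_bill(transfer_gb: float) -> float:
    overage_gb = max(0.0, transfer_gb - INCLUDED_TRANSFER_GB)
    return PLAN_PRICE + overage_gb * OVERAGE_PER_GB

print(monthly_bill(500))       # within the included pool -> $5.00
print(monthly_bill(5 * 1024))  # 5 TB used -> 4 TB overage -> $45.96
```

The worst case still grows with traffic, but at $10/TB it grows far more slowly than typical hyperscaler egress.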
Or just go with Hetzner and have a limit 20X as big with cheaper prices
Personally I like Mythic Beasts and use their raspberry pi servers and VPSes. Much less terrifying pricing and the support is good too.
Yes, their support is amazing. I can email the support address and have a real human who knows what he/she is doing reply within minutes or hours at most.
Exactly! It is quite refreshing to not have to battle past first line to find someone with the understanding to help. No shibboleth required. Cheaper hosting, too.
Not really, as there is a network traffic quota:
If you use up your monthly network transfer pool, you can continue to use your Linodes normally. That being said, you will be charged $0.01 for each additional GB at the end of your billing cycle.
Sure, that's what I am doing: a beefy dedicated machine with no bandwidth pricing. But that also means I need to do everything on my own. I don't get any of the worry-free services of AWS.
Like the $10k/day bills? Odd definition of 'worry free'
I worry far less about my Netcup servers + BunnyCDN than I ever did about my AWS bills.
How is surprise billing not a worry?
Or hetzner, or Vultr.
If I leave my water running then go on vacation I'll have a huge water bill too. I don't conclude my water company is intentionally trying to overcharge me. The more reasonable conclusion is: building an alert system that addresses every customer need is hard. Most enterprises (where all the customer focus is) want minimal downtime above other considerations, including cost.
This is like you leaving your tap running slightly as you go on winter vacation so the pipes don’t freeze over.
But the water company does not actually allow you to install proper taps to regulate the water so you use duct tape to do so, and due to an earthquake something falls on the tap causing your duct tape solution to fail leading to a massive surge of water, leading to your massive water bill.
Did the water company cause this? No. Your duct tape solution wasn’t resilient enough because it didn’t factor in an earthquake. But I would be justifiably mad that my water company does not allow me to install actual taps, and allows unforeseen and unpredictable situations to make me run up huge bills that could otherwise have been avoided with a proper tap.
The problem with the cloud is that the entire Internet is in control of your tap and can choose to open it and waste infinite amounts of water with no way to implement a cap.
>Every cloud makes this mistake easy!
Funnily enough... Oracle (OCI) does this better: you can buy Oracle "coins" 1:1 with $ and load your account with just what you think you need.
If Oracle cloud is still shenanigan free in 2 decades, I’ll consider it. Until then, Oracle gets $0 of any budget I’m in charge of.
See you when you're an even more inflexible (and old) guy who bets on one horse. Look, I don't give a *hit about Oracle (or anyone else); what I care about is migration without problems from one provider to another.
Hard requirement: my image can run on it (FreeBSD and Linux), no proprietary BS, no special stuff. Give me VM "hardware", make it fast, make it cheap, make it reliable; that's it... that's it.
And ATM I like Oracle, Hetzner and Vultr the most. If one of those changes to my disgust, I change; no big deal... just some DNS rewrite.
The Oracle cloud UI is extremely slow and terrible to work with. I'd prefer shenanigans every so often than having to deal with that crap on a daily basis.
Kind of funny, for me it's faster than Amazon.
Hetzner's cloud offer is limited but they limit your possible spending by default and it's very easy to set up billing alerts. I guess they mostly do it to ensure they get the money at the end of the month, but it's equally useful for their users.
Additionally, Hetzner's egress pricing is a lot cheaper. On Hetzner you pay €1.19/TB ($1.13/TB) vs. $90/TB on AWS. That's about 80 times more on AWS!
> I am still waiting for a cloud without these dark patterns.
This is how mobile and landline phone companies made enormous fortunes before flat rate billing. It’s called post-paid vs pre-paid billing.
Do you have any substance to your allegation of Microsoft hiding behind their pricing model?
This is very straightforward from their view. Before: almost no traffic = almost no costs; now: huge traffic = $$$.
On the other hand, it seems that Troy didn't try to talk to them about this and wants to eat the costs himself, as it was his mistake. I think that's commendable. I also think, given the amount of free advertising Troy has done for them, that they'd be open to this, and I can imagine we might see a followup post like "MS was so nice they waived my costs".
He's an Azure MVP. He already has 13 k in credits/yr, which could absorb the costs ( just guessing here)
He's also independently very wealthy. Dude drives a GT-R and AMG C-Class (as at 2017, probably upgraded by now). He got a generous payout when he was laid off from Pfizer.
Putting anything internet-facing on the cloud is as irresponsible as posting your credit card number publicly. Anyone can essentially charge you an infinite bill and you can't do anything about it until it's too late.
Maybe it's not a problem when you're dealing with millions of VC money, but there's no way in hell I would host anything in a bandwidth-metered cloud service when my or my own company's money is involved.
Correct me if I'm wrong, but Troy Hunt is a person focusing on security, not infrastructure, deployments or development even. If anyone is near making that mistake, it's people like Troy Hunt. Operators would of course see the problem easily (paying for bandwidth like that would be the first warning sign), while they are sometimes blind to other issues, like security.
> Correct me if I'm wrong, but Troy Hunt is a person focusing on security, not infrastructure, deployments or development even.
Eh, I don't know - either way he is a Microsoft Regional Director and MVP for Cloud (as well as security), runs courses on cloud deployment on Pluralsight, and has done speeches on Azure and reducing cloud bills, so if he can get stung it doesn't say a whole lot of good about my chances.
Thanks for the correction :) I had no idea he did all of those things in addition to the security work. Certainly doesn't paint Azure or him in a good light then.
So I guess one method would be to set spending-limits when you setup your account. But that'd lead to constant moments of having to bump your budget (or worse, get approval to do so from Accounting) when you're trying to work.
There are both spending limits and alerting that you could use, but the right values would be impossible to predetermine from Azure's perspective, so they rightly ask you to set them yourself.
There's an entire surprisingly-large industry built around providing better UI to the major cloud providers, so you can actually tell WTF is going on with billing, access control, networking, et c. They're so hostile that it has to be intentional.
The underlying issue is that the cloud console is owned by a single product team, and THEY decide what gets exposed - not the underlying product teams for the individual services. At least that's the case for AWS.
The result is that you get a lowest common denominator type of dashboard. And hence a whole industry of providing just a prettier dashboard on top of AWS / GCP / Azure metrics.
Datadog started with a prettier dashboard for Cloudwatch data.
Cloudability started with a prettier dashboard for the Cost and Usage Report.
And also works the other way around. The individual product teams buy development environments to circumvent the console restrictions.
For example, a few years ago, the Redshift team purchased "DataRow".
Who is Hunt Troy?
The guy behind 'Have I been pwned', a website where you can check if your login credentials to some website have been leaked.
It is worth mentioning that the alert itself costs money. So if you're evaluating the alert every 5 minutes on the past 24h of data it can burn a small but surprising amount of money.
From TFA it looks like that would be 10 cents per "time series". Or, the way I translate it, 10 cents every 5 minutes (*I think, but I haven't used Azure in some time*): $1.20/hour, $28.80/day, almost $900/month. Not too hard to drop that by making the alert less frequent. (edit: I think I saw AU$ there, so maybe it is AU$900.)
A time-series represents a "thing you're monitoring" – in this instance, it's aggregate egress, so $0.10 per month, regardless of the evaluation period.
Monitoring CPU? Another $0.10 per month. Memory? Another $0.10.
Thankfully, not $900.
I meant to emphasize frequency, not eval period. Apologies. That said, I took a look at the pricing docs and didn't see frequency mentioned, so hopefully I am in the wrong about the price.
As an aside, their (Azure's) pricing docs are written in the same fishy way their technical docs are written (my opinion only)...
The service being used here is Azure Monitor, which charges $0.10/month for each metric you monitor with alert rules: https://azure.microsoft.com/en-us/pricing/details/monitor/
They do charge by the MB for ingesting custom metric data, but storage outbound bandwidth is one of the free "standard metrics" built into the platform. So the alert's full cost is $0.10 per month. Since they charge by the "time series," I think that means the OP could configure different alerting thresholds on storage outbound bandwidth, and they'd still only be on the hook for $0.10/month.
Shameless plug: https://CloudAlarm.in (in beta) sends you real alerts, usually faster than Azure, with multiple reminders. It does this daily unless you tell it to shut up for the month for the given exceedance. I call them real alerts because it doesn't wait for a consumption threshold to be reached the way Azure cost alerts do; as soon as it detects that your current cost * remaining days > the budget amount, it'll send you an alert [1].
The alert emails are way more meaningful (with projected amount in subject for example) unlike generic ones from Azure Alerts, so you see a real alert and prompted to take immediate action.
1: https://cloudalarm.in/Home/Docs/#how-is-budget-alarm-differe...
But surely CloudAlarm relies on the same data as Azure's alerts do? Azure support told me that data is only updated daily.
Also, Azure has an option to alert you beforehand if it looks like you'll go over; struggling to see how your service is any better.
AFAIK, Azure only alerts when a threshold you specify is reached. For instance, you set a budget of $10k and you specify a 50% threshold. So when you have consumed $5k, Azure will send you an alert. However, suppose your daily run rate due to some bad selection is $400; in this case, Azure will only tell you on the 14th day that you've consumed 50%. On the other hand, CloudAlarm doesn't need a threshold – it just takes your budget and sees whether your daily burn rate is estimated to exceed your budget. In the above example, CloudAlarm can thus notify you on the 2nd day itself, because $400 * 30 is $12k.
Yes, CloudAlarm, as of now, depends on Azure's pricing data API to do this calculation. It's easy for Azure to do what CA is doing but their threshold based design is a problem, which only prompted me to create this service.
CA also has 'new resource' alarms as well, which are almost instant (a few mins after creation of a new resource), which helps you monitor and fix resource created with unexpected, expensive tiers. This can often happen with automated creation of databases, for example.
I just did it because Azure wasn't doing it, despite people complaining, including me, had faced multiple such issues of unexpected expensive resource got created without intention/knowledge.
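A minimal sketch of the projection idea described above (this is not CloudAlarm's actual code; the helper names and example numbers are purely illustrative):

```python
from datetime import date
import calendar


def projected_month_spend(month_to_date_spend: float, today: date) -> float:
    """Project the full-month spend from the current daily burn rate."""
    days_elapsed = today.day
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = month_to_date_spend / days_elapsed
    return daily_rate * days_in_month


def should_alert(month_to_date_spend: float, budget: float, today: date) -> bool:
    """Alert as soon as the projection exceeds the budget, not at a fixed threshold."""
    return projected_month_spend(month_to_date_spend, today) > budget


# Example mirroring the comment above: a $400/day burn against a $10k budget
# triggers on day 2 ($800 so far -> $400/day * 30 days = $12k projected).
assert should_alert(month_to_date_spend=800.0, budget=10_000.0, today=date(2021, 9, 2))
```

The contrast with a fixed-percentage threshold is that the projection fires as soon as the trend looks wrong, rather than waiting until a large share of the budget is already spent.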
This is something to be mindful of when using datadog synthetics monitors as well - if you have a short interval, or many locations being tested from they can become expensive quickly
These stories almost always boil down to this fundamental conflict of what you want for a personal project vs a business. (though in this case yes, Troy Hunt's HIBP is larger than a lot of startup businesses)
In a business setting, you want your service to stay up, at the cost of spike in costs if accidents or mistakes happen.
In a personal project, you want there to be hard limit on cost, and your service to go down if spikes call for it. (I'm relatively sure that no one wants their personal projects to incur a bill of thousands of dollars by accident.)
> In a business setting, you want your service to stay up, at the cost of spike in costs if accidents or mistakes happen.
No you don’t. This is absolutely not a given. Being a “business” doesn’t mean you suddenly have unlimited budget.
The vast majority of businesses are not “web scale” and are better off taking an availability outage than suddenly handling 1,000,000x the normal volume of traffic.
I'd say that it definitely depends on the business.
If you are selling your product via your web site and you're suddenly on TV with millions watching and accessing your site, you definitely don't want the server to go down, and autoscaling plus a bit higher cost would be great.
Only if you can actually fulfil those orders. If your production can't be increased, and the business is small, the cost could far outweigh the potential profit.
Not if you’re selling a real product. The only world where people can scale from 1000 orders a day to a million orders a day overnight is one where you’re not selling anything physical.
I don't agree with this - even for businesses there is always a limit over which there is serious trouble for bottom line. I think cloud providers should allow one to set a hard cost limit over which everything shuts down. For personal projects the limit might be $100 and for small businesses $100k, but even rich companies have it (not the same reason, but Knight Capital comes to mind).
Certainly the cloud providers probably make money by not having hard limits.
But it's also the case that if they did implement hard limits of some sort, you'd be reading blog posts about how AWS destroyed my project just when it was going big because someone stuck a circuit breaker foot gun in some corner and everything stopped working properly when usage spiked.
I do think there should probably be a hard circuit breaker. It should be simple and therefore inflexible. And it should come with a big warning sign. Still people will get burned because someone will set it, a project grows, and one day it goes off.
But... they do implement limits.
As far as I know, they implement billing alerts but, aside from some student and some other limited account types, they're alerts. You'll get an email that you've hit your limit but your bill will continue to go up until you shut things down.
And do note that these alerts are not instant. With Azure, if their backend reports and alerts are timed wrong, you're still on the hook for 2-3 days' worth of costs.
They have limits on things like how many EC2 instances you can have but not on things like bandwidth.
While you can raise those limits by request I'm also not sure whether you can actually reduce them again later.
AWS doesn't; there are only alerts. Don't know about others.
Azure (and I'm sure other cloud providers do) allow you to set email notifications for when your bill goes over a set amount so you can stop it before it happens.
If you're using a cloud provider I'd highly recommend setting one of those up.
In Azure it's under your Subscription and then Budgets
AWS can send you alerts when it looks like you will go over your budget for the period, before you're way over your projections, which is a nice feature.
I truly believe they want you to use a lot of their resources on a consistent, long-term basis; they don't get long-term value from people having short, one-off anomalies, so budgets and monitoring are aligned with their customers - just not total cost of ownership calculations :)
Is it really that black and white? I think there is a continuum in hosting service. Not just A) very low end VPS, and B) unlimited cloud.
The fact is that there are low end VPS, middle end VPS, high end VPS, and dedicated servers. If you started from a low end VPS, it is very easy to gradually upgrade your VPS.
A $5/month VPS can be used to play with for tons of things. I just don't get people who use free tier cloud, unless you just want to learn about cloud hosting per se.
The problem is that the cloud alone will not allow you to scale infinitely. Slapping a standard DB-based application on the cloud will not magically make it scalable. RDS will still be a bottleneck and can't be scaled online for example (Aurora might change that, but that's a very recent development).
Making your application scalable is a significant effort that may involve different trade-offs. Your typical Prestashop or Magento e-commerce site will still max out the DB and go down, cloud or not, but with the cloud you'll end up with a huge bill in addition to your downtime.
Engineering your application to be scalable is an option that's often not made for cost/time to market reasons which is fine, but in this case the cloud will give you much less scalability than a lot of people believe.
Isn’t that why services such as AWS lightsail and digital ocean exist?
Have you contacted Azure? On one hand you owe the money “fair and square”, but on the other if I were them I’d waive an unexpected $10k bill to a good faith actor that was incurred without any proactive notification by Azure.
Yep, we had an incident with Mongo cloud where a bug in their synchronization protocols for Mongo Realm resulted in an insane amount of traffic. This was a development cluster with almost no application load somehow pumping around many TB over the course of a few days. The bill was many thousands of dollars. Their support did the right thing. And we actually ended up with some credits because we were having a rough time with bugs in their software. Ultimately, we gave up on Mongo Realm because it was just not working as advertised for us (high CPU usage on the device, lots of bandwidth, we experienced data loss in the managed cloud storage, etc.). But their support team was great.
Their interest is in keeping you as a long-term customer. So they will help you if they can. Unexpectedly high bills like that can end the relationship in no time. And 10K is not a lot on a yearly basis; that's a few months of normal usage for lots of companies. So protecting that revenue is worth something to them. That's also worth realizing when you deal with cloud providers: you are spending non-trivial amounts of money on their services, and support is part of that deal.
A developer on a team I worked with many years ago accidentally committed our AWS keys in a repo. We got a $30k bill due to an enormous number of EC2 instances being spawned. We contacted AWS and they were very understanding and reduced the bill to $50.
I had this happen to me once on Digital Ocean, and I contacted them - they were rather understanding that the bill I had was clearly "atypical for my account and not intended" and refunded it.
I'll second that.
I've seen several cases on both Azure and AWS where bills got waived after someone opened a support ticket starting with "oops, I just did..."
I got an $800 AWS expense (one line item) waived after I contacted them and they asked me to explain why it happened and how I'll prevent it from happening in the future. I think it's a once-per-account thing they'd probably do, and Troy should definitely do it.
> Secondly, there's cost alerts. I really should have had this in place much earlier as it helps guard against any resource in Azure suddenly driving up the cost.
He did not enable alerts.
Every online course that requires you to use a public cloud to deploy something should first have you set up a billing alert that notifies you when costs start to creep past something reasonable, like $20 or $50 (depending on the course and work involved).
OP do this! It works, they are usually very generous (same for gcloud!)
OP is Troy Hunt, an MS MVP. You can bet there are people at MS doing it for him as soon as they get wind of it.
100% do this. Azure has a surprisingly responsive billing support team and will likely eat this as goodwill (honestly with this on the front page of HN they'll probably do it proactively). Just open a ticket in the portal.
Also there is 0% chance serving this traffic cost Microsoft anything near $10000.
As opposed to all those other customers who are not good faith actors?
You’re trying to be snarky to GP, why exactly? Yes there are bad faith actors that might try to get some free cash out of cloud refunds. And other customers can also be good faith actors and included in the assertion.
The post applies to everyone and I’d second it. Ask nicely for a refund in these situations, the worst that can happen is they say no.
Where did they say that “only Troy Hunt shall receive a refund, for only Troy Hunt is a good faith actor, so say we all”?
No snark intended. It's just that I would assume that all customers of such services are in principle good faith actors, not just Troy.
The one thing that is special about Troy is that he is providing a service for the public good but that has nothing to do with being 'good faith' or not.
The response was snarky but its meaning is true. Troy Hunt will get the refund without any problems because he is a public figure. But if a John Doe makes the same mistake, he can only hope someone at AWS/Azure/GCP will waive his fees too - which is not guaranteed!
I think there's a dozen people in this thread who have gotten refunds and not a person saying they were refused?
The point is that if they give refunds to all the good actors they won't make any money.
If you have a long time customer (especially one who brings in as much good publicity as Troy Hunt) and you look at their billing history this spike would be a clear anomaly. Writing off an $8k bill to keep a customer around and happy for years to come is worth more than that bill.
Only if their regular services run at a loss and their business model relies on people making mistakes.
But Azure’s regular prices are definitely high enough that they’re not a loss leader.
> The point is that if they give refunds to all the good actors they won't make any money.
That assumes the only way they make money is on people's understandable mistakes or lack of care. That doesn't seem to be the case for these services (unlike, say, many gym subscriptions).
It seems far more likely that if they refunded all the well documented issues like this one, their bottom line wouldn't be impacted.
I wonder whether, before cloud computing, there was ever a successful product or service where it was accepted with just a shrug that the volatility of monthly costs could bankrupt you with next month's bill, because the complexity and opaqueness of the cost structure make it virtually impossible to predict and protect against extreme peaks in all parts of the setup.
Even if you run a relatively opaque cost structure business like a restaurant, you can still calculate the maximum cost of ingredients for one month, the salaries, energy, etc. if you simply use the "best case scenario" of having every seat at every table booked for all opening hours, with people ordering your most-sold dishes. Cloud computing is still leagues beyond that in terms of cost unpredictability.
I once worked for small, non-startup software company who pondered moving servers to Azure. The Azure partner shop analysed the needs and came up with a monthly cost "between 30k and 120k per month". They were really surprised the company stuck with their non-cloud setup because "everybody is moving into the cloud!!"
> Even if you run a relatively opaque cost structure business like a restaurant, you can still calculate the maximum cost of ingredients for one month, the salaries, energy, etc.
If the restaurant suddenly ordered ten thousand times more ingredients than usual, their supplier would probably call back and say "is that really what you want?" rather than just shrugging and shipping them tonnes of tomatoes with a bill for one billion dollars.
Very true. And in terms of cloud computing, it means that alerts, notifications, and limits are worth absolutely nothing if it's on the customer to set them up correctly for every scenario imaginable, which is nearly impossible. The tomato supplier's human alerting system is a catch-all system that would be easy to implement as well.
Yeah - if you look at Troy's graphs they're already calculating an average bandwidth and the alert he's configured has a threshold ~1/50th his current level.
Trying to set a hard numeric limit ahead of time is hard (you have to estimate how much you'll use, you don't want to set a number too low and get cut off, and cloud cost structures can be really hard to get your head around), but that basic level of anomaly detection should be there by default.
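Something like the following toy check is all that "basic anomaly detection" would take; the 14-day window and 3x multiplier are arbitrary values picked for illustration:

    from statistics import mean

    def is_anomalous(daily_costs: list[float], multiplier: float = 3.0,
                     window: int = 14) -> bool:
        """Flag the latest day if it exceeds `multiplier` times the trailing average."""
        if len(daily_costs) < window + 1:
            return False  # not enough history to judge yet
        baseline = mean(daily_costs[-(window + 1):-1])
        return daily_costs[-1] > multiplier * baseline

    # Steady ~$12/day, then egress suddenly blows up to $350 in one day.
    history = [11.5, 12.0, 11.8, 12.2, 11.9, 12.1, 12.0,
               11.7, 12.3, 12.0, 11.9, 12.2, 12.1, 12.0, 350.0]
    print(is_anomalous(history))  # True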
> estimating how much you'll use, don't want to set a number too low and get cut off plus cloud cost structures can be really hard to get your head around
Easy way of avoiding this: Don't use shitty hosts that make you pay per GB served and shut you down once you hit your cost limit. Instead get limited by the available bandwidth you have, and clients will just access your server slower rather than being fully denied access.
Who does that, though? I'm including things like 95th percentile in "pay per GB served", but you're painting with a pretty broad brush if you class a host as shitty because they won't give you a switch port and not care whether you're sending 2 packets per fortnight or maxing it out.
I'll bet Sysco would deliver $10k worth of canned tomatoes to your restaurant without checking.
Since the tomatoes would be worth $8k (or whatever), they might do a bit more diligence on ensuring the customer can pay.
MS's bandwidth cost a fraction of what they're charging, so it's easy to risk people not paying up.
At my previous housing co-op, the new kitchen manager accidentally ordered nine cases of limes (~$1000) instead of 9 limes.
They assumed it was a mistake and only delivered a single case, which was still 180 limes, but at least it didn't use up our entire food budget.
(Normally I'd expect a phone call or email to confirm, but this was a smaller, local supplier, so they probably didn't have real systems to deal with outliers.)
In this scenario though you’ve used tonnes of tomatoes and they’re now asking you to pay
Tomatoes that were ordered on terms where they’re paid for well after they’re delivered, with a long-running relationship with the vendor. If you went from ordering a few tomatoes to ordering entire lorries full of them, you bet the vendor’s going to check you’re good to pay for them.
Troy Hunt didn’t sneak into an Azure DC and install some hardware any more than this hypothetical restaurateur filled a truck at the local fruit market.
A gas or electric bill works a bit like this... if you have some appliance that fails in a way that suddenly starts consuming much more than usual you can end up with a fairly large bill at the end of the month. Same for old school landlines or cell phones, before flat rate billing became ubiquitous.
Though in those cases the billing isn't really complex or opaque, and you _can_ monitor it if you care to check your meter regularly throughout the month. But, for the electrical case anyway, you can't drill into what exactly is consuming watts without either fancy monitoring equipment or potentially tedious investigation.
The big difference in case of gas or electric (or even telephone) is that you and only you are in control. Someone would have to physically break into your house or steal your phone to rack up a huge bill.
In contrast, with the cloud the bill is directly proportional to the amount of inbound requests from the Internet, with no out of the box way to implement a limit (I guess you could install Apache/Nginx and enforce a limit there, but doesn't that kinda defeat the whole point of the cloud?).
Some cell plans have a charge for incoming texts, roughly like the cloud bill you’re talking about.
And (as sibling comment points out) if you’ve got critical household infrastructure that needs gas/electric, you’re not always in control. An unexpected hard freeze could provide you with a big bill next month and you can’t fully control it.
I agree with you that there’s definitely a difference of scale and asymmetry but the concept of unpredictable postpaid bills is not a new thing.
> A gas or electric bill works a bit like this
Just ask Texans.
I worked on both cloud computing and on-premise projects. Before cloud computing the risks were different:
- Much harder to scale. It was much more common to over-provision and have machines and bandwidth sitting unused for years.
- When we were hit with very high traffic due to a bug or something else, most of the time it would lead to customer outages. Based on the contract, sometimes it required paying back because SLAs were not reached. An outage could also lead to customers cancelling their subscriptions.
We swapped one type of problems with another.
If you have a bug that renders your product unusable and refunds are in order, the flexibility of handling traffic peaks which a cloud provider offers won't solve that problem for you. It could even aggravate it. If a show-stopping bug is introduced, it would probably be preferable to fail quickly.
If there was an outage in the early 2000s, we just went outside to play, or watched TV.
Now if Facebook is down for 15 seconds everyone has heart failure like their life is over.
Amazon's uptime is 99.95%, which works out to about 43 seconds of downtime every day.
> It was much more common to over provision and have machines and bandwidth being unused for years.
But the overprovisioned server might still be a lot cheaper than the cloud bill. It can be totally reasonable to have a server running at 1-5% load 98% of the time if you really need the capacity for the remaining 2%.
Also, neither "scaling up" as in "re-deploying the same setup on a beefier instance" nor "scaling out" as in "let's expand to the US and have a server there" is too difficult if the setup is automated (Ansible).
Banking.
Credit card chargebacks, especially.
That's the typical story. Something goes wrong and it costs you (typically a small company) a lot of money. At that moment, nobody happens to be looking at the metrics. Even alarms aren't a guarantee, because they can also be missed.
The only thing that would really help would be a hard spending limit that stops all services except storage. If your site is important, there will be such an amount of user feedback that it is impossible to miss for long.
Alerts can also fail to be timely due to mail/SMS/other delivery issues, or the right people being in the middle of something else. This delay means it is still possible to rack up an unexpected cost.
Or they can fail completely.
And the alerts themselves cost money if you want something reliable, so you have to weigh that against the danger. Pay-as-you-go cloud can be a maze of costing concerns.
> The only thing that would really help were a hard spending limit that stops all services except storage.
Yep. Though that is small comfort if you need to guarantee more than a couple of 9s of uptime; hopefully those with that requirement can soak up the unexpected billing blips.
> The only thing that would really help were a hard spending limit that stops all services except storage.
Sadly, I haven't found a way to do that with AWS
It's funny that even Hetzner can do that and AWS can't. Shows that there's no interest from AWS to prevent these things from happening.
IMO this is something which ought to be written into law. It'd be easy to implement a kill switch, and would actually encourage innovation, as people would feel more empowered to experiment with the technology.
Absolutely, and I would make it a bit broader: Anything that automatically charges a client a variable amount should have a maximum-spend limit that the client can set, and it should default to a reasonable number based on the client's expected usage.
In fact, you could even just change that to any auto-billing service or product and the default for constant-charge services would simply be the amount of the constant charge.
Yes. And the law would simply state that the customer may not be billed more than that maximum for the service period. Up to the business to restrict their use of the service, if they want to.
If it's really so 'complicated' that that part can't be done, well, I'm sure it would suddenly become a business priority to cut off use when the limit is hit, if the business can't charge for it.
To be fair, Hetzner doesn't nickel & dime you on everything.
Some things are legitimately very difficult to meter in a real-time way with adequate performance. Imagine if the S3 server had to do a DB query to look up your account balance & limit on every HTTP request. That would completely kill performance and availability (what if the billing DB is down? Should that DB now be replicated across all regions? etc).
The reason a lot of cloud alerts and metrics lag is because billing is done asynchronously by parsing logs way after the actual usage occurred. Of course, the real solution here is to just not bill for things you can't easily measure & limit especially when those things are super expensive, but cloud providers' C-suites have to eat & pay their bills somehow.
*won't
I just looked at my AWS account and there seems to be a way to set a budget, attach alerts to it, and attach actions to the alerts. For example, there is an action to stop EC2 instances. Not sure if other AWS services have something similar, but at least you can kill your instances if something weird happens.
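For reference, here's roughly what wiring up such an action looks like with boto3, assuming a cost budget named "monthly-cap" and an IAM role for the Budgets service already exist. The names, region, and instance ID are placeholders, and parameter names should be double-checked against the current Budgets API docs:

    import boto3

    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # When actual spend reaches 100% of the "monthly-cap" budget, run the
    # managed SSM action that stops the listed EC2 instances automatically.
    budgets.create_budget_action(
        AccountId=account_id,
        BudgetName="monthly-cap",
        NotificationType="ACTUAL",
        ActionType="RUN_SSM_DOCUMENTS",
        ActionThreshold={"ActionThresholdValue": 100.0,
                         "ActionThresholdType": "PERCENTAGE"},
        Definition={"SsmActionDefinition": {
            "ActionSubType": "STOP_EC2_INSTANCES",
            "Region": "us-east-1",
            "InstanceIds": ["i-0123456789abcdef0"],  # placeholder instance
        }},
        ExecutionRoleArn=f"arn:aws:iam::{account_id}:role/BudgetsActionRole",  # placeholder role
        ApprovalModel="AUTOMATIC",
        Subscribers=[{"SubscriptionType": "EMAIL", "Address": "ops@example.com"}],
    )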
Actions weren't there last time I checked (a few years ago).
Thank you, I'll check it out
Kill switches via Lambda are possible, I believe, running when the alert is triggered.
Nice, I'll have a look. Thanks!
Most worrying is that even an expert like Troy Hunt was UNABLE to figure out the cause of the issue by himself. He "reached out to a friend at Cloudflare" who investigated and found the cause.
Cloud providers should always have a max spend and it should be a standard feature. The cap shouldn't even be some optional feature or notification service. It should be a hard cap that you can move - at your own risk.
SMBs and indie developers are not the first/primary customers that Azure/AWS design their platforms for.
Enterprises don't want hard limits on spend; they would be a lot more pissed if service were pulled because a spending cap set by someone sometime in the past is now exceeded. That's likely why such a feature is optional, not mandatory.
Excess/unexpected billing would be negotiated in typical sales cycle discussions. Making a hard cap the default, however, would result in a lot of senior people getting midnight calls for emergency budget approvals, and management would get annoyed by that.
I 100% work for a large enterprise and we would absolutely like spending policies in place. After all, we have fixed OPEX budgets planned in advance of the quarter.
Having a policy is not the same as a hard block; you would want spending alerts and escalations and reports, but do you really want a hard block on crossing a limit?
Admittedly I don't have experience working with very many devops/SRE teams, however I have never seen any enterprise vendor relationship where spend is hard-capped for B2B sales with centrally managed procurement.
You know, there are alternatives between a sudden total cut off of service and nothing at all. Some mechanism to prevent sudden, large unexpected bills, yes, that’s desired. Notifications are not enough.
A lot of people have provided tons of examples of follow ups by vendors in this thread where novel large orders are double checked. AWS could provide any manner of preventative policies and cost controls but they choose not to.
Given the choice between blowing 2x past the planned opex and an outage, yeah, there are plenty of applications where the latter is preferable.
One thing I hate about the cloud providers is that there isn't an option to set a maximum cost. I would prefer to plug the cable of my side project rather than just receive an email telling me that the next bill is going to be over my budget. I understand not everyone would want to do that, but I would like to have the option.
Oracle has fantastic budget tools. Not just "you've passed your budget", but "you're forecast to pass your budget in 22 days before the month is up". And you can couple it with quotas to create hard budgets.
AWS has decent tools in this regard, but they pale compared to Oracle's. Azure is a product I've never used at any scale (just small projects), but the fact that it actually costs money to set up alerts is gross (and morally reprehensible). Even if it's a trivial amount, that alone sours the product in my eyes. I mean, Azure is already pretty uncompetitive unless you're running on free credits, as Troy apparently is (purportedly some $13K per year, so I'm unsure what the pitch for donations to cover a bill is about).
This piqued my interest, but a few quick searches (using a search engine--the Oracle Cloud site search only turned up press releases...) indicate that quotas just prevent you from spinning up new instances. That's helpful, but I was hoping for some sort of way to cap my bill (for hobby projects), even if that requires deleting resources.
Oracle Cloud has an enticing free tier, but I'm too afraid to use it because it requires a credit card and I don't see any way to put a monthly cap on my budget. (I'm sure hobby projects with ~$5 - 10/month budgets isn't their target market, but I can dream :)
Edit to add the page I was reading: https://docs.oracle.com/en/cloud/get-started/subscriptions-c...
They'd rather refund small guys for mistakes than give big guys an easy limit to set.
I guess big guys don't want their service to suddenly stop, so they probably would not use this... but it's just a guess.
Absolutely that. Storage costs money, so in order to absolutely cap your spending they would have to delete all your stored data, too. Deleting S3 buckets and EBS volumes on a spending blip is absolutely the last thing any company with any budget at all wants to happen, ever. It would be preferable for that not to even be possible in any situation. This is the sort of thing that only extremely small casual users want, and it isn't worth it to AWS to cater to those users. For everyone else, more complexity than a "kill everything at $X" switch is needed, and that's exactly what we do have. We don't get to absolutely cap our spending to the penny but we also don't risk having our data vanish because of a billing issue.
> Storage costs money
The dumb solution for that is to exclude persistent storage from the limit.
The nice solution for that is supporting both "runrate" and "consumption" limits.
Using a runrate limit, spinning up an instance, creating a file, etc. allocates budgets for running it continuously, which is released when shutting it down/deleting it. Hitting the limit prevents new resources from being allocated, but keeps existing ones alive. This should be used for persistent storage and instances used to handle base load.
Using a consumption limit, the resource is shut down when the limit is hit. If the shut-off is delayed, the cloud service eats the overage, since they control the delay. This should be used for bandwidth, paid api-calls, and auto-scaling instances.
The user should be able to create multiple limits of each kind, and assign different services to such limits. Alerts when going near the limit can help the user raise it, if that's their intention.
For consumption, it might also make sense to have rate limiters, which throttle after a burst budget is exceeded, similar to how compute works on T instances on AWS. But those probably only make sense for individual services, not globally (e.g. throttle an instance to 100 Mbit/s after it exhausted its 5 TB/day bandwidth allocation, or throttle an API to x calls/s).
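A toy model of those two limit kinds (pure illustration of the proposal, not any real cloud API):

    from dataclasses import dataclass

    @dataclass
    class RunRateLimit:
        """Caps the committed monthly cost of always-on resources (storage, base instances)."""
        max_monthly_rate: float
        allocated: float = 0.0

        def try_allocate(self, monthly_cost: float) -> bool:
            # Refuse new resources past the cap; existing ones keep running.
            if self.allocated + monthly_cost > self.max_monthly_rate:
                return False
            self.allocated += monthly_cost
            return True

    @dataclass
    class ConsumptionLimit:
        """Caps metered usage (bandwidth, API calls); the resource is cut off when exhausted."""
        budget: float
        spent: float = 0.0

        def record_usage(self, cost: float) -> bool:
            # Returns False once the budget is blown; the provider eats any overage
            # caused by its own shut-off delay.
            self.spent += cost
            return self.spent <= self.budget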
I assume the sensible implementation would be cut off access and give you some period to settle your bill before the data is deleted.
For background batch jobs and analytics etc. they might want caps. Say something like a video transcoding workload. And lots of things could benefit not from a cap, but some kind of gradual degradation in bandwidth/instance allocation + a warning so you can raise the limits, it doesn't have to just shut everything down immediately using a hard cap.
I assume you meant “pull” or “unplug” the cable :)
Yes ;)
But there is an option. In Azure you can "set a budget". He even goes over it in the post. Did you read the linked article?
It would be really classy if MS forgave that debt, especially considering the service is a public benefit.
I would go as far as saying that the hosting for such a service should be entirely sponsored by Microsoft.
He's a "Microsoft Regional Director and MVP" so Microsoft pays the bill one way or another. I expect that he has reduced Azure rates as well.
Regional Directors and MVPs aren't employed by Microsoft: https://rd.microsoft.com/en-us/about/
I have a monthly azure credit of $150 and some reduced pricing simply by having a ms developer subscription. I'm guessing Microsoft MVP's (in general, and Azure MVPs perhaps in particular) have extremely generous azure credits so hopefully he isn't on the line for the full amount here.
Would be even classier if the major cloud providers responded to customers calling out for budget limits for the past decade. Not many people want to risk potentially infinite costs.
I wonder how much of the cloud provider revenue comes from situations like this. I suspect quite a lot.
I think that the cloud provider business model that allows for uncapped maximum costs is a bit of a commercial dark pattern. What makes it somewhat more nefarious is that it is relatively easy to blame the customer.
I’m not surprised that the cloud providers are quick to refund users as it’s likely that they only do it in a fraction of cases and it buys a lot of goodwill.
It would be interesting to try and design a cloud that supports OutOfMoneyException’s with gradual degradation and capped liability for costs built in.
> I suspect quite a lot.
I don't actually believe so. Cloud providers are known to refund bills incurred by mistake. They make so much margin on legitimate usage by big companies & startups that it's just not worth burning developer goodwill & potentially wasting effort trying to collect a bill the customer legitimately can't pay (and who will then never use nor advocate for your service again).
Question, is the 0.014AUD per GB quoted here correct? Looking at the linked page[1] I would think the cost would be 0.1102AUD per GB as is quoted in the Internet egress section.
https://azure.microsoft.com/en-au/pricing/details/bandwidth/
Also, (3200 GB per day * 30 days) * 0.014 AUD per GB is 1,344 AUD, while (3200 GB per day * 30 days) * 0.1102 AUD per GB is 10,579.20 AUD, much closer to the final bill.
My conclusion Troy still doesn't know how much he is paying.
It clearly isn't. It looks like he's confusing transfers between availability zones in one region with egress to the internet. A factor-of-10 mistake like that should be obvious, but he didn't fix it, even after I pointed it out in the comments on his blog (he responded that the price for me might be different due to region/currency settings).
Interestingly, Troy says that egress is expensive on Azure at $0.014 AUD/GB (~$0.010 USD/GB), but that is the same price as additional egress for Linode and DO, and Linode egress has never struck me as expensive. In fact, I'm kind of shocked (as an AWS user) that Azure egress is the same price as Linode.
Actually, wow it seems AWS is also the same price as Linode and DO for egress. While Linodes and DO do come with decent free bandwidth, this is a surprise to me.
You’ve interpreted the numbers wrong. Yes, Linode, DigitalOcean, and most of this class of providers charge $0.01/GB. Almost literally an order of magnitude less than Azure or AWS. The megaclouds massively overcharge for bandwidth. It’s not even close.
AWS charges $0.09/GB, and Azure charges $0.0875/GB.
Maybe Troy Hunt gets a discount for being a Microsoft Regional Director and MVP. (Neither of which make him an employee of Microsoft, confusingly enough.)
https://docs.digitalocean.com/products/billing/bandwidth/
https://www.linode.com/docs/guides/network-transfer/
https://aws.amazon.com/ec2/pricing/on-demand/
https://azure.microsoft.com/en-us/pricing/details/bandwidth/
Ah weird. I was on some AWS page that said the cost was $0.01/GB, which threw me off as seeming ridiculous compared to what I remembered. Not sure what page it was, but clearly that is not the actual pricing.
I think the article is incorrect.
https://azure.microsoft.com/en-au/pricing/details/bandwidth/...
The AUD $0.014/GB is only for data transfer between Availability Zones.
How can $10 per TB not strike you as expensive? You can easily download that much a day on consumer broadband that will cost you far less than $10/day.
If you download the data twice at that price point, you could buy an HDD to store it for the same price (the bigger HDDs seem to be at ~ 18 EUR per TB here).
$10/TB is for transfer between availability zones in the same region. Egress to the internet costs $50-$90 per TB, so it's much more expensive than the already expensive $10.
Reminds me of a time when we had a new site that was going to run on GCP; we had been using a couple of co-located servers for years.
When everything was moved to production, URL went live, nobody ever did any kind of bandwidth checking, caching, no CDN, no cost tracking. $10,000 in our first week. That's about 1/4 what our total spend on the co-located servers was for the whole year. Boss flipped his lid and wanted to kill the new guy who was on the project.
After about 2 years we got rid of all the co-located stuff and were spending about 1.5x, but we had more apps, they served heavier pages, etc.
1.5x is pretty good.
We overspent quite heavily on our on-prem stuff for a game I helped launch, for political reasons the next game ended up running on the cloud.
The price was roughly 10x before discounts. With our heavy discounts and a wide amount of slimming down/cost optimisation (easily 3 months of work) we got it to 2.3x
There will always be a need for sysadmins/cloudops/devops for that environment, so we didn't save any headcount either.
I can't imagine getting anywhere close to parity in costs, Functions-as-a-service ended up costing more than compute instances too so we went back to compute instances in places where we thought we'd get away from it.
That said, it was a lot nicer to use!
Awful toxic boss.
It is very good that these things are getting publicized. More and more people are recognizing these payment schemes for what they are: a scam. Every cloud provider that refuses to put in a hard spending limit participates in this.
It is important to remember that not all cloud providers participate in it. For example, in Hetzner Cloud, they explicitly provide the maximum amount you are going to pay for a given instance or service in a given month. You are guaranteed not to pay more. Everybody knows why Amazon etc. refuses to do it this way.
On Hetzner, with their €1.00 per TB after the included 20 TB, you can pay at most about €324 per VPS, since you're limited to 1 Gbps even if you fully saturate the link all month.
I doubt you'll manage to push the full 1 Gbps per VPS out all month; on dedicated servers that's more likely. But luckily they have a very easy setting for billing alerts and a maximum on the settings page.
Hetzner Cloud(!) only has 20 TB/month included in the monthly cost and states that you have to pay for any additional traffic. I've never reached that on one of their cloud boxes, so I don't know what it looks like, but it definitely isn't all up front. The dedicated machines, though, come with no additional traffic charges whatsoever.
Additional traffic costs 1 EUR/TB (plus VAT, depending on where you live). So it's about 50 times cheaper than the big clouds.
My (naive) solution. Every new account by default has an SMS alert that trips at $100. It says
"Your account has exceed $100 spend. Reply 'SHUTDOWN' to shutdown all services, 'STOP ALERTS' to never see this alert again, or 'DOUBLE TRIGGER' to double the alert trigger value to $200."
$100 is arbitrary, it could be any nominal sum. The idea being that the user can double the alert each time they get it just from SMS. I bet 95% of users would double their alert limit to a comfortable point. The other ~5% will be power users who customize their alerts.
The idea that these companies couldn't know what limits customers want is kinda silly. We can use the same techniques for alerts that we use in algorithms for expanding vector storage, for example. We can "amortize" alerts, so to speak.
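A sketch of that amortized-alert reply handling (naive illustration of the idea, nothing more):

    def next_trigger(current_trigger: float, reply: str) -> float | None:
        """Return the new alert threshold, or None if the user chose to shut everything down."""
        reply = reply.strip().upper()
        if reply == "DOUBLE TRIGGER":
            return current_trigger * 2       # $100 -> $200 -> $400 -> ...
        if reply == "STOP ALERTS":
            return float("inf")              # never alert again
        if reply == "SHUTDOWN":
            return None                      # caller stops all services
        return current_trigger               # unrecognised reply: keep nagging at the same level

    print(next_trigger(100, "DOUBLE TRIGGER"))  # 200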
The problem is that metering these services at such granularity is difficult: https://news.ycombinator.com/item?id=30066538
It doesn't need to be very accurate. As long as the values are the same order of magnitude it is probably okay.
Very nice writeup, thanks to the author for writing it so clearly for someone who is not familiar with the nitty-gritty to be able to follow it.
Shameless plug - the core of my work is about ensuring these unexpected costs never happen.
We have some recent case studies where we've successfully reduced cloud costs by 95%
https://www.cloudexpat.com/case-studies/
hi(at)cloudexpat.com - happy to help!
Out of curiosity, do you merely optimize existing cloud usage or do you help your clients move to hybrid/bare-metal?
As soon as I saw "17GB file" I thought "that's what torrents are for". Otherwise, one mistake and... well, this happens.
Or someone maliciously bypasses CF cache e.g. by parameters.
Cloud just is not suitable for any kind of volume egress. It's a death trap. Like going on vacation with data roaming enabled.
Yeah, HIBP is using torrents:
> I removed the direct download links from the HIBP website and just left the torrents which had plenty of seeds so it was still easy to get the data. Since then, Cloudflare upped that 15GB limit and I've restored the links for folks that aren't in a position to pull down a torrent. Crisis over.
I know, I read the article.
But I feel like Dr Strangelove here. Of course, the whole point of a torrent on a cloud service is lost if you also provide a raw download link.
Also providing a download link is tempting, but can easily cost (for a 17GB file and growing) up to US $3 per click.
Even off of their premium global network it's over $2 per click. The cheapest tier in Microsoft's entire egress table would be $0.68 per click (but that only kicks in after you've spent way more than $9400 in cheaper tiers in a given month).
Egress kills you, in cloud. "Oh, cloudflare probably caches most of this" is not something I'd recommend.
And then Cloudflare will not cache it at some locations for random reasons and the cloud bill is back. Anyone with technical knowledge should have no problem routing static files via machines at OVH/Hetzner and the like, no reason to enter such risks for maybe an hour of setup time saved.
Or Hetzner server auction to get a cheap 20/30€ machine with unlimited traffic at 1Gbps. Setup time is max 1h even if you do it manually, with cloudflare Tunnel it's also really easy to lock down everything with a firewall and have minimal exposure to threats.
> Setup time is max 1h even if you do it manually
- Patching: remediation, monitoring, day-0 response
- Security Information and Event Management: exports, alerts, OS configuration
- OS/application hardening: encryption, password/key rotation, CIS/other baselines, drift management
- Backup: encryption (don't forget your passwords/keys are changing), retention, data protection compliance, monitoring, alerting, test days
- High availability: replication, synchronisation, monitoring, alerts, test days
This is just the tip of the iceberg. If you operate in an environment where insurance, reputation, regulatory compliance, etc. are important, then it's easy to see why PaaS solutions are desirable.
I have 10 Gbit internet at home. Sometimes I wonder how many services/people I could bankrupt by using it harder. Not that I want to, but more like: why is it even possible?
> I have been, and still remain, a massive proponent of "the cloud".
The mice cried and pricked themselves, but kept on eating the cactus.
This particular problem basically boils down to "CDN providers don't like caching large files", which is a very common problem. Everything else was configured and setup exactly right to not have a large bill.
Most CDN providers have a lot of machines out on the edges of their networks, and it's understandable that they don't stuff these machines with large disks, likely preferring smaller faster SSDs. But this is a very common pitfall of CDNs that needs more attention, along with messaging on the dashboards and settings pages.
I've had problems with no warning on Cloudfront, Cloudflare, Bunny.net all from not realising that my files were beyond the CDN's cache size limit, but none of them seem to do a good job at surfacing this other than "talk to customer support".
Cloudfront does list the max size clearly in the limits and quotas page, though, and if you front your S3 bucket with Cloudfront, you could turn caching off and still get the discounted bandwidth out rates (S3 -> Cloudfront is always free, even if the file is fetched every time).
Cloudfront isn't much discounted bandwidth out compared to S3 though, is it?
I see S3 is initial $0.09/GB, going down to $0.07 after 50TB or $0.05 after 150TB.
Cloudfront North America is $0.085 for first 10TB; but $0.110 and up for other regions. going down to $0.060 north america after 100TB, and okay $0.025 after 1PB. (but $0.050 and up in other regions even after 1PB).
So okay, Cloudfront gets cheaper egress at large scale, I guess. By about 50% though, not an order of magnitude, and could be much less depending on region.
The reserved capacity pricing is lower, in a business setting your account manager will usually suggest this pretty quickly if you have a steady and/or increasing Cloudfront bill.
Oh I didn't even know about that, thanks! Something else for me to look into.
This is why I use fixed price offerings for personal projects.
A large bill is probably chump change for someone like Troy, for others it's a year or two of savings. The risk is not worth it.
Would you mind sharing the services you’ve found that have fixed prices? I haven’t had much luck finding services like that (although I’m looking in the < $20/month range).
For fixed price and fixed performance you can use bare metal providers with unmetered bandwidth; generally tier-2 vendors offer that.
At $20, bare metal is not easily possible; the lowest prices I have seen are usually $40-50 and above. However, you can get a VPS with unmetered bandwidth and no other costs in your price range [1]. The price is still fixed; some performance variance may be there, but at $20 minor variances are unavoidable.
Thanks! I had not heard of OVHcloud before. That looks like what I want.
Of the tier-2 vendors, Hetzner, OVH, Linode, and Scaleway are all pretty reliable.
>What we're talking about here is egress bandwidth for data being sent out of Microsoft's Azure infrastructure (priced at AU$0.014 per GB).
AUD $0.014 is roughly USD $0.01, which I thought was reasonable. But on [1] only "Data transfer between Availability Zones (Egress and Ingress)" costs $0.01. Does transferring from Azure to CF count as that? Other internet egress (routed via Routing Preference: transit ISP network) starts at $0.08.
I hope someone from Azure CS could give him a custom discount.
It is also worth considering that the cost HIBP saved on cloud/serverless over the years could have been wiped out (if not more) by this single incident.
[1] https://azure.microsoft.com/en-au/pricing/details/bandwidth/...
Cloudflare and Azure have a "Bandwidth Alliance" peering which - if you correctly set up your Azure resources to use "Internet Routing" - will result in a modest discount. It is a bit of a scam though as it is marketed as though you'll get 100% discount but in reality it is more like 15% off. I think GCP is 100% though.
Definitely not 100%, more like 66% off: https://cloud.google.com/network-connectivity/docs/cdn-inter...
We've been thinking about this for a while, and if there is any way we can catch these types of cost spikes before they happen. We've managed to do it for Terraform resources using an estimation approach, and using a usage file, you can model expected usage-based resources (https://github.com/infracost/infracost/blob/master/infracost...), but this one has got us thinking more about policies.
To be clear - we would not have been able to catch this one right now :'(
Would love to hear thoughts / brainstorm ideas - is there any way we can proactively catch these types of cost spikes?
I think this is fundamental to on-demand services. Anything outside terraform or another configuration file system is hard to reason about. If cloudflare is in your config system, then you could put up a warning that files bigger than whatever won’t get cached, but that still assumes a level of knowledge about the system that you don’t generally have.
Setting up limits and alerts as part of the system creation is usually the best strategy.
I like that; maybe we have to build up a knowledge base of wisdom (probably learnt the hard way) and warn if the conditions are met, or at least provide a list of things to note. Then the cloud cost alert becomes a fallback safety net.
One wonders how Cloudflare can essentially absorb all bandwidth costs, while AWS and Azure use bandwidth as a profit center.
On the cloud providers, you are paying for your usage (yes, marked up, but they have costs too).
Cloudflare has the same model, but they distribute the costs. The vast majority of people never use anywhere close to their share, so they subsidize the outliers and the free tier.
Lots of peering. They pay $0 for roughly half of their egress.
https://blog.cloudflare.com/the-relative-cost-of-bandwidth-a...
Well, the cloud is just a convenient way of accessing someone else's server.
Convenience always costs money, there is no (big) cloud provider doing it out of their own pocket or rather not optimizing for huge profits.
It's the same as with any other service, really. So I don't understand, why some people assume it would be different here.
(Note: I am not saying that Troy Hunt assumed this, but I know people who go to the cloud because "It's cheaper". It was never cheaper, on no project I worked on. It was more convenient, but in the end it was more expensive mostly)
I would be surprised if Azure doesn't waive or reduce this bill dramatically. Something similar happened to me with AWS. I had a simple file upload service where files would expire if they hadn't been accessed in 24 hours. Someone started using it to upload music and videos. I ended up with a high bandwidth bill on Amazon S3. I reached out and explained what happened, they waived the costs entirely (to the tune of $5000).
Valuable investigation steps to find the erring cloud resource, but as Troy concludes, 'Budget Alerts' would have saved him from this issue.
No matter what the traffic is, The first thing to do with any cloud service provider is to set the budget alerts according to our wallet, be it one with credits or otherwise. At this point, I don't even try any new cloud service provider who doesn't offer credible budget alerts.
Another key takeaway is,
> Huh, no "CacheControl" value. But there wasn't one on any of the previous zip files either and the Cloudflare page rule above should be overriding anything here by virtue of the edge cache TTL setting anyway.
Even this could blow up. Cloud service providers don't set "CacheControl" by default, and if we want to cache something that isn't cached by CF by default (e.g. *.html) using Page Rules, then we need to set CacheControl (e.g. max-age) at the cloud service provider's end too.
P.S. I've written about these recently on my blog titled 'Saving Cloud Costs'[1] from a frugal solopreneur PoV.
This is why I personally won't run projects on infrastructure with what roughly equates to unlimited risk billing.
It's my opinion that it's better to work with known limitations and optimize for them.
In the case of bandwidth, work with a fixed pipe size, or do the math and set up a QoS that implements a throttle to avoid exceeding your bandwidth allotment.
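For the throttle part, a token bucket is the usual building block; in practice you'd more likely use nginx's limit_rate or Linux tc than application code, but the idea fits in a few lines (the rates here are arbitrary placeholders):

    import time

    class TokenBucket:
        """Caps average throughput to `rate` bytes/second with a limited burst."""
        def __init__(self, rate: float, burst: float):
            self.rate = rate
            self.capacity = burst
            self.tokens = burst
            self.last = time.monotonic()

        def consume(self, nbytes: int) -> None:
            """Block until nbytes can be sent without exceeding the average rate.
            Assumes nbytes <= burst capacity."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)

    # e.g. cap egress at ~10 MB/s with a 1 MB burst
    bucket = TokenBucket(rate=10_000_000, burst=1_000_000)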
The first thing one should set on any cloud account is billing alerts. Set more than one: set the first to ~80% of what you think your normal cost will be, then add extra alerts all the way up to 100%. That way you'll usually get an early warning with some time to act before it becomes really expensive.
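As a hedged example of that layering on AWS (other clouds have equivalents), a single Budgets budget can carry several notification thresholds; the amount, thresholds, and email below are placeholders:

    import boto3

    budgets = boto3.client("budgets")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    def email_alert_at(threshold_pct: float) -> dict:
        return {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold_pct,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],
        }

    budgets.create_budget(
        AccountId=account_id,
        Budget={
            "BudgetName": "monthly-cost-watch",
            "BudgetLimit": {"Amount": "200", "Unit": "USD"},  # expected normal spend
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        # Alerts at 80%, 90%, and 100% of the expected monthly cost.
        NotificationsWithSubscribers=[email_alert_at(p) for p in (80, 90, 100)],
    )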
Everything can be going fine for a long time, and then cloud costs kill your business.
This happened to Murfie a couple of years ago, and that's why I had to step in to try to fix things. I'm still trying, and there are still challenges, but I won't allow landlords and cloud costs to disrupt things again.
Think about how many big companies struggle with this. Most don't have one person who can think through the cost of the cloud as well as the activities to manage the costs. Many even say "let engineers be engineers, and business people own the costs." And all of a sudden you get a ton of surprises…
If Microsoft doesn't show the decency to forgive that bill, i'd be happy to chip in!
Cloud providers should really start protecting customers from these spikes. Alerts are not enough, there should also be hard caps (stop serving) and soft caps (serve at reduced speed/capacity) based on configured max budgets.
If you are not a VC backed corporation you must be insane to run anything on a "cloud". Why not rent a dedicated server from OVH or others where you can actually control costs and pay 10-100 times less?
Because experience getting shit done using boring tools doesn't translate well to a future career at a VC-backed company wrangling Terraform & YAML files.
Seems at least a little unethical that cloud companies do pay as you go up to infinity, instead of some model where you transfer money in and if you use it all up your service gets cut.
There'd be value in a model which allowed you to pay up to some limit then switch into a user-pays model if the user wanted the service right now.
Having spent a few hours myself getting CF to cache B2 files, I'm curious about the part where Cloudflare support has to be contacted due to caching issues.
It's time for cf to work a bit on its UX
>This was about AU$350 a day for a month. It really hurt, and it shouldn't have happened. I should have picked up on it earlier and had safeguards in place to ensure it didn't happen. It's on me.
Uh no - it's on cloudflare and azure. Why don't they have a global setting that says Max Charges Per Month: $X and it just shuts down when it hits that number? This is why I don't really like using big cloud services like this.
This prompted me to go and check my custom static site generator (which renders my blog onto an Azure storage account exposed via HTTP and Cloudflare).
Turns out I wasn't setting x-ms-cache-control when writing all the blobs, so that's a win right there.
(interestingly, it appears that rclone, which I was in the process of moving to, doesn't do that, so I might have to keep my custom Azure storage library around)
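For anyone else fixing this, a minimal sketch with the azure-storage-blob Python SDK looks something like the following (connection string, container, blob name, and max-age are placeholders); the ContentSettings value ends up as the blob's Cache-Control property:

    from azure.storage.blob import BlobClient, ContentSettings

    blob = BlobClient.from_connection_string(
        conn_str="<storage-connection-string>",
        container_name="downloads",
        blob_name="big-file.7z",
    )

    # Upload (or re-upload) the blob with an explicit Cache-Control so the CDN
    # and browsers are told how long they may keep it.
    with open("big-file.7z", "rb") as data:
        blob.upload_blob(
            data,
            overwrite=True,
            content_settings=ContentSettings(cache_control="public, max-age=31536000"),
        )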
Shouldn't lookups be where cdb shines? Hold my beer:
$ # assuming ${sha1} holds the SHA-1 hash to look up, and the pwned-passwords
$ # list has been split into one cdb file per leading hex character
$ shard="$(echo "${sha1}" | cut -c 1)"
$ cdb -q pwned-passwords-v8-sha1-${shard}.cdb "${sha1}"
But as a cloud evangelist at Microsoft, you may sing the corporate IT gospel anyway.
¹ https://mro.name/agakdfa
Well ... it's not like it was the first time this happened to a software developer.
He should have known better: there is always the risk that you don't know some detail that ends up costing you a lot of money.
Cloud bandwidth is soooooooooo expensive. If there is a risk that you'll have to pay like this, please use a provider like Hetzner with fixed costs. If you like your serverless things, just host the big files at Hetzner.
> I always knew bandwidth on Azure was expensive and I should have been monitoring it better
It's suspicious that cloud providers STILL don't have any sort of "circuit breaker" infrastructure for this sort of thing - yes, you can set up alerts, but you can't say, "shut the whole thing down before the costs go above a certain threshold".
I guess all Microsoft PR and Marketing departments are now on the phone trying to get this guy a refund and take down this post :)
This guy is a Microsoft Regional Director; he is part of the Microsoft PR engine.
> I, uh, have a bill I need to pay
Kind of sad that a service we are accustomed to using, and which various software integrates (whether using the HIBP API or the downloaded pwned passwords archive), rests on the shoulders of a single guy who now has to pay for his mistake.
Great that Cloudflare helps him with the service; otherwise who knows whether we'd have access to HIBP at this scale?
Hope it is okay and not too much off-topic. I just donated. He deserves it for this service!
Fact is that stuff like this can happen. Considering how many variables are in play to determine the final cost of a cloud service, it is very much a double-edged sword. Sometimes you cut yourself unintentionally.
So now we all learn from this, I suggest we help him out.
Looking forward for the followup post in early 2033 when he forgets to extend the cost alert expiration.
This is a big trap to fall into. I don't understand why network traffic is so expensive in AWS as well. I once had a $2k monthly bill purely from networking because I accidentally routed a lot of requests through a NAT. That hurt, haha. Now I stay away from those things :D
> But these would always cache at the Cloudflare edge node, that's why I could provide the service for free, and I'd done a bunch of work with the folks there to make sure the bandwidth from the origin service was negligible.
If you're not Troy Hunt or another celebrity with special access to Cloudflare -- I don't think you really have access to Cloudflare to do a lot of work with you to ensure that your data gets cached and your egress is minimal, for large files on a very cheap cloudflare plan. (Based on the costs reported by Hunt as catastrophic, I don't think he's paying cloudflare for a large enterprise plan)
(Also, it's unclear if caching large data like this is even within the ToS of Cloudflare?)
I don't think Cloudflare promises to cache any particular URLs for any particular amounts of time (except no greater than cache headers etc; but they don't promise never to evict from cache sooner; they evict LRU according to their own policies). Cloudflare's marketed purposes include globally distributed performance, and security. I don't think they include "saving egress charges by long-term caching your data".
I have a much smaller project, but egress charges for data are an increasingly large part of my budget. I've been trying to figure out what, if anything, can be done about it. I wish I had a guaranteed, promised-to-be-within-ToS way to get ultra-long caching of very large data files from Cloudflare at an affordable fixed-rate price. (Maybe I do? But I just haven't reassured myself of it yet?)
> In desperation, I reached out to a friend at Cloudflare… I recalled a discussion years earlier where Cloudflare had upped the cacheable size… Since then, Cloudflare upped that 15GB limit…
Since I'm looking for solutions for this same problem (delivering lots of data at very cheap prices), I am finding myself a bit annoyed that Hunt is talking about how he solved it, using tools/price-levels not available to most of us who don't have his level of access due to position.
Interestingly, Microsoft/Azure is part of the "Bandwidth Alliance" with Cloudflare, which one initially thinks means there are no egress charges when delivering to Cloudflare. (That is what it means for some other alliance members like Backblaze.) But that's clearly not the case, or this story wouldn't have happened, right? Turns out Azure gives you a fairly small egress discount when delivering to Cloudflare, and only if you set things up in a non-standard way.
First thing I do for any new project is set an alert for when costs go over $10. Highly recommend it.
Do you also make sure you never go on vacation, never go anywhere that doesn't have a phone signal, never turn off your phone, that your alerts have multiple levels of redundancy, and that you always have access to a computer to modify settings?
Clouds are good for a quick start and fast growth. But after that phase, you should think about "classic" hosting solutions (multi-server, load balancer, etc.); they can be much cheaper.
As long as your human admin costs are lower than the cloud services.
It's unconscionable that MS doesn't have warning notifications in place BY DEFAULT, so when you start incurring charges e.g. 10x of normal, you get notified immediately. One shouldn't have to set these up manually ever.
It seems like everyone is blaming azure when this was an issue with CloudFlare…
I get that everyone has an obsession with dirt cheap providers instead of cloud solutions like aws/azure. But that doesn’t mean it’s better. Everything has pros and cons.
I'm sure some cloud providers have it, but they all should have a global, "If my account hits $XXX shut it all down immediately and email me" flag. And yes, that's kind of what he did here, I get that.
I wonder if people will start to make shell companies that just go bankrupt when this happens and start afresh with another company. The cloud vendor doesn't look too closely at what you are running, right? So this could work.
Most of the clouds have features to manage this. In AWS, for example, you can create an alarm with AWS Budgets to monitor costs by tool/service/etc. Using a complex cloud without using this is not good practice.
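For example, a minimal sketch of such a budget with boto3 (the account id, amount and email address below are placeholders, not anything from the article):

    import boto3

    # Email me when actual spend crosses 80% of a $50 monthly cost budget.
    budgets = boto3.client("budgets")
    budgets.create_budget(
        AccountId="123456789012",  # placeholder account id
        Budget={
            "BudgetName": "monthly-cap",
            "BudgetLimit": {"Amount": "50", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[
            {
                "Notification": {
                    "NotificationType": "ACTUAL",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": 80.0,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [
                    {"SubscriptionType": "EMAIL", "Address": "you@example.com"},
                ],
            }
        ],
    )

The same thing can be clicked together in the Billing console in a couple of minutes.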
It is good to know that this could happen to anyone. I guess setting limits and alerts should be one of the first things one does.
What would happen if a credit card limit was exceeded? Would the site just stop working?
Yeah the problem with Cloud vendors is that if they make a mistake, it will usually disadvantage the customer...not them. I'm a little biased as I don't completely buy into the whole Cloud paradigm.
Cloud seems like a pet tiger - really cool and fun, until it turns on you.
Enjoyed the article.
But still, I couldn't help getting the following lasting impression after reading it: these days, being able to click around the UIs of the cloud providers should be a billable skill in itself.
This guy needs to clean up his bio. There seems to be a lot of confusion about whether or not he works for Microsoft, when it appears that he is a, uhh... reverse-pay mid-level manager intern?
These things really should have an AI-like alert that is basically “cost is departing dramatically from the historical pattern”, without the need to set thresholds and the like.
Are there cloud services that let you easily set a maximum budget, to make sure you have no surprise costs like that?
In my experience you can only set up billing alerts, which are fair, if you ask me.
I took a good course on Pluralsight about AWS and the first lesson was to set up a billing alert.
What would hard limits do to your infra? You can't take down / suspend DBs, EC2s, etc... Just because you set a 1k USD limit and that's it.
Alerts are the first thing you should set up, IMHO.
> You can't take down / suspend DBs, EC2s, etc... Just because you set a 1k USD limit and that's it.
You (the cloud provider) can shut down VMs, block access to all services, and just retain the content in storage until the bill is resolved or the account is permanently closed. The cost would be trivial as storage is dirt cheap.
Sure, but will they do that? It's easier to just charge people. :P
AFAIK Heroku shuts down your stuff if your Dynos are overspending :P
Google App Engine allows you to set up hard spending caps, after which your application will start returning 503s
It would be good if he contacts Microsoft about this. Sometimes they will give credits for situations such as this.
He is Troy hunt and an ms MVP, as soon as ms gets wind, they'd be the one to contact him
Happily donated to Troy. He's done more than most to help everyday folks weather these data breaches.
My issue with this is that the donation is basically to Microsoft for their dark patterns. There's no way this traffic cost much to Microsoft, so it all is added profit for their shareholders. Other providers would've provided the same service and bandwidth for a much lower price.
I really appreciate the work that Troy is doing, but seeing much-needed money end up at Microsoft or Amazon leaves a bitter taste. I hope at some point it will become cool again to just rent a VM or dedicated server for small projects and stop throwing so much money at the already richest people in the world.
Unfortunately, data in Aus really costs this much (more actually), from my experience colocating in a few data centres (I was typically paying $0.3/GB). It’s certainly possible it cost them less, but very doubtful on it being close to free.
EDIT: Apparently it was hosted out of US West, so I agree that the actual data cost would probably be a lot less.
I don't understand it. Does a cloudflare edge server sit inside Azure?
No. Cloudflare is configured as a reverse proxy in front of the site. So traffic reaches the Cloudflare edge first, then it is proxied to the origin on Azure unless the file is served directly from the Cloudflare cache.
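If it helps to picture the arrangement, the self-hosted version of the same pattern is just a caching reverse proxy. A minimal nginx sketch, with made-up hostnames, sizes and TTLs, and TLS left out for brevity:

    # Disk-backed cache in front of a metered origin; every name and size here
    # is a placeholder, not the article's actual setup.
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=origin_cache:10m
                     max_size=50g inactive=7d use_temp_path=off;

    server {
        listen 80;
        server_name downloads.example.com;

        location / {
            proxy_cache origin_cache;
            proxy_cache_valid 200 7d;                # keep good responses for a week
            proxy_cache_use_stale error timeout updating;
            proxy_pass https://origin.example.com;   # the expensive origin
        }
    }

Requests that hit the cache never touch the origin, so the origin's egress bill only grows on cache misses.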
close account, cancel card and move on with life before they charge you.
If anything, this shows the insane scalability of the cloud
Outbound transfer cost is one of the most expensive things in cloud computing; it's much better when you can pay for allocated bandwidth.
Just avoid cloud and choose dedicated infrastructure
Didn't Troy sell HIBP to Verizon?
Donated! Hope it helps
TL;DR: I got a big bill from my cloud provider, so I used more cloud provider features, to make sure I know before I get the bill; isn't my cloud provider great?
Can somebody explain to me why I wouldn't just rent a 40 EUR dedicated server from Hetzner with unlimited traffic and gigabit uplink? His 600GB/day is way less than what you get over a gigabit link within a day. Sure, sudden bursts would perhaps "throttle" at a gigabit, but according to his article that was only the cloudflare proxy anyhow, so no pain in having that take a few seconds longer.
As far as I am concerned, I just don't understand why people use cloud services.
He is a Microsoft MVP, a title that is given for being a "community evangelist" of Microsoft. You wouldn't get that by throwing it on a Hetzner machine.
Edit: Consider this article, and Geoff's statement about Azure credits.
https://www.theregister.com/2021/04/21/microsoft_revokes_mvp...
Sounds like a pretty expensive privilege.
How is using cloudflare okay in this then? Cloudflare is also not Azure
The simple answer here is that Troy was using Cloudflare to offset costs he knew he would incur with Azure. He states verbatim:
"Firstly, I always knew bandwidth on Azure was expensive and I should have been monitoring it better, particularly on the storage account serving the most data."
...and he didn't have simple monitors in place to alert him of uncommon billing spikes.
I get your point, if he's not OK with using Hetzner how is Cloudflare any better? It's not. But the reality is Cloud operations are a fine dance of weaving services together to realize all of the heavily advertised savings. I'd argue that a lot of Troy's projects that use all of the cloud native functions could have also been implemented on much more standard stacks and, likely, been just as cost and performance effective. But that's not going to get him the advertising for Microsoft.
There are no savings with cloud, weaving or not.
You want to waste money? Hire a car, with a driver, when you need it.
Want to save money. Learn to drive.
You always pay more for outsourcing stuff, a lot more, than doing it yourself.
You can buy 1000x the processing power by buying bare metal. You can get 100,000x more bandwidth for the cost when not using the cloud.
People think baremetal is hard. It isn't. It does take knowledge.
>You want to waste money? Hire a car, with a driver, when you need it.
> Want to save money. Learn to drive.
Oh please. As if learning to drive is the end of expenses. If you finance a car, you have monthly payments. If you don't, then you have periodic recurring maintenance bills. You always have fuel charges. You always have insurance charges. You periodically have parking charges.
I know how to drive, but do not own a car. From time to time I hire a car, but it gets nowhere close to costing me what owning a car would.
This is probably true in the States, where it is insanely cheap to rent cars, but not necessarily everywhere. And even there...
... I run a junker. That is to say, a car that will go to the dump as soon as it requires any significant expenditure, and the combined savings of not having to finance it, minimal or no repairs most years, and only needing third-party insurance make it significantly cheaper than renting.
Depends on how frequently you need a car. I drive maybe 3 times a year; that's ~$2k in gas/rental fees, which is less than insurance, assuming each rental is for a week. Never mind the cost of actually buying the car.
In practice I spend maybe $1k every year on cars, primarily for vacation, which owning a car wouldn't absolve me from spending.
Very well put. I also rent cars because I rarely need them.
But I think where this analogy breaks down is that if I run a service, no matter how many users, peaky or not, at least 1 server always needs to be on, not "from time to time".
> People think baremetal is hard. It isn't. It does take knowledge.
This.
I always wonder how much of the cloud's success (economic, that is) would have materialized if the marketing term never got traction, and everyone just called it what it really is: "renting someone else's hardware without physical access, and less, if any, control over how the stack works from the metal up".
In the good ol' days, when people wanted to put a service online, they rented racks at a colo and either stuffed their own hardware in or, worst case, used rented hardware.
Did that require some basic familiarity with hardware? Yes it did. Did people need to know how to set up, configure and administer a LAMP stack? Sure. Was it guarded against sudden load spikes by god-knows-how-many layers of abstraction? Nope.
But it worked, and surprise, in 99% of cases, it was perfectly fine if a website ran at sub-optimal speed for a few hours, or went down every now and then.
And the dirty little secret is: It still does, and it still is.
No, no. The costs cloud saves are in staffing and opportunity costs. Everyone knows that it is more expensive than a comparable server, but...it is easy, standard, and available. If you want to a) not have real estate capex, b) not worry about the core ops part of your applications, and c) you used to outsource infrastructure to a managed service provider anyway, then cloud is a viable value prop. Plus, the more of the services you use, the more your app stack becomes "standard" parts with glue code. This is maybe an improvement at the large enterprise scale, where home-built apps don't have a reputation for being future-proof.
And better credibility when you say “Our vulnerability was on AWS and configuration is hard, but at least we had the default VPC config” rather than “We maintained our own stack and being sysadmin is hard, and the port was exposed on the web.”
Modern cloud services such as S3 or, let's say, MongoDB seem to have a lot more security footguns than old-school bare metal. An S3 bucket misconfiguration exposes your data to everyone, even if there was never a reason for that data to be exposed to the outside world. On bare metal, chuck it in a directory outside your web root and someone will have to actually breach the server before they can steal the data.
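For that particular footgun there is at least a blunt mitigation: S3's "block public access" settings, which can be applied per bucket or account-wide. A boto3 sketch with a placeholder bucket name:

    import boto3

    # Refuse public ACLs and public bucket policies on one bucket.
    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket="my-data-bucket",  # placeholder bucket name
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

It doesn't fix a leaky application, but it does turn the classic "oops, the bucket was world-readable" mistake into an explicit decision.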
> but...it is easy, standard, and available.
So is a LAMP stack on a dedicated machine.
> Plus, the more of the services you use,
The thing is, most webapps don't use a lot of services. Backend logic in whatever language, a database, and a webserver. Maybe hooked up to some CRM system. That's it for 99/100 web services.
Yes, the services cloud providers offer are amazing, they are complex, and it is natural for developers to be fascinated by complex things (I know it is for me). But it's important to realize when simple is simply enough.
You're correct on staffing savings, but not on Ops savings.
I delved into this pretty thoroughly last month - https://medium.com/@rykrk/everything-is-just-build-vs-buy-d7...
Except in what minute of my day am I supposed to take off the hat I'm currently wearing to put on my IT Server Room hat? I don't have time to wrangle this stuff any more. I have multiple clients, I have side hustles, I have what's left of a social life after the pandemic, I have family obligations. There are only so many hours in a day. If my time becomes more efficient by throwing a bit of money at the problem, then it is worth it to pay "experts" to relieve me of the burden.
There is a difference between freelancers/one-person companies and big corporations though.
Honestly, the question you need to ask in regards to cloud is a relatively simple one: Can I hire a sysadmin for cheaper than using the cloud?
The answer to that, once you start using enough resources, is more often than not yes.
Sure, it takes a while to get to that point, but eventually you will reach break even and it would be cheaper to do it yourself/have your employee do it.
Yes, but these threads of "in-house is cheaper than cloud" never qualify at what company size, at what revenue, etc. their version of the answer becomes true.
I have been on both sides. Large media production companies with very large amounts of fast and redundant storage located on-prem. These range from locally attached RAIDs to large shared SAN pools. Their clients also tend to be the types that sue the crap out of you if any of their content is seen by people outside their control. Switching to cloud solutions was (still is) a huge uphill battle. However, the cloud storage needs are nowhere near the same (not editing content from S3), but storing approved masters for distribution totally makes sense for cloud. Now that the content is in the cloud, why not perform actions on that content in the cloud? Faster deployment, better equipment, blah blah. Next thing you know, your entire workflow past editorial is in the cloud. You start to analyze your expenses and compare them to on-prem amortized expenses and you see that it could be cheaper on-prem. Also, take into consideration how long it takes to bring up that new data center. You also have to look at bandwidth expenses. Bandwidth to a new site not directly on the backbone tends to be expensive for non-residential connections. The additional power expense of that new equipment plus the cooling is also a new expense. Power redundancy, you ask? $$$ Now you need that sysadmin and possibly a small team. At that point, you go back to your cloud rep and renegotiate fees. You have now created an entirely new department at your company for managing the on-prem.
Even worse, I bet a lot are in the situation of "I have cloud AND I need to hire a system administrator".
This is something easily left out of discussions. It doesn't matter if the equipment is in the cloud or on-prem; someone still needs to be able to manage it all. Whether they physically install new hardware or push a button on a UI to bring up a new machine, it still needs to be done and managed.
Re: Cloud. Not all cloud scenarios are the same. If the cost is amortized over a long time (theoretically infinite, well, that's the plan) then the immediate convenience can outweigh the cost/opportunity cost. For example, if you used Backblaze to back up one personal computer at the cost of $6/mo and you have a lot of data, that becomes a huge source of savings compared to managing the backups yourself. At that price the ROI versus other methods like building a TrueNAS may not come within a decade, and I'd argue the storage enthusiasts have probably refreshed all their drives within that time, so the ROI would never come even if Backblaze doubled their prices. What you do get is that self-hosting becomes a hobby, and that's what I feel it is for most people.
Hiring a car with a driver is more expensive because you are hiring a personal employee.
That said, I still argue that for personal autonomy alone, learning to do the thing is better in general, but I don't think it's cheaper in all scenarios. And to your point, some or maybe even most cloud services are more expensive relative to their self-hosted versions.
There are savings, but they require work to realize.
Let's use your driving example (because car examples are always great!)...
>You want to waste money? Hire a car, with a driver, when you need it.
>Want to save money. Learn to drive.
This is true. If you need to drive often, you can save more money by owning a car. But there are two scenarios where it still makes sense to rent.
1) What if you need a car in a different city? You just flew from JFK to SFO. You already have a car in NYC, but need one in SF. You're not going to buy a car in SF that you'll need to sell in a week. Sure, if you're going to be there longer, you might consider it, but then you're still carrying the costs of two cars.
2) Sometimes you need a truck. Maybe you have an IKEA run to make to get a bunch of desks, or stop at the hardware store for a few dozen bags of mulch, or ... But sometimes you just need a truck to get the job done. You could just buy a truck and be done with it. But trucks can be more expensive than a compact car, and they definitely have higher fuel costs. In this case, you'd probably be better off with a fuel efficient (or electric) compact car and rent a truck only when you need it.
This is how you save money with the cloud. But you definitely don't save money when you effectively rent a truck to drive to work everyday (even if you are in construction). There is a cost to renting -- it is more expensive on a per-use basis than it is if you buy. Cloud servers are more expensive than bare metal -- if you're constantly using them. It is only cheaper when you stop paying for the parts you don't need. And that also takes expertise.
Once, at a new job, I inherited a cloud server. It was costing us a ton of money per month and running 24/7 because the person who set it up never turned it off. After 3 months of those costs, they could have bought a new server outright, with no more renting. They paid for a cloud server for three reasons: 1) they had no experience with hardware, 2) it was a pain to set up local hosting, and 3) it was faster to get running without waiting for a vendor to build a server, deliver it to the datacenter, etc... These were real impediments to the first person, and the cloud server helped get them moving. They just didn't have a longer-term view of what their decision was going to cost.
The first thing I did was order a new server and make friends with our datacenter ops people. And now the only thing we really use the cloud for is archival (write-once, read-never) storage. If we ever really need these data, it will be super expensive. But, if that ends up happening, we'd be happy to pay the cloud tax.
> This is how you save money with the cloud. But you definitely don't save money when you effectively rent a truck to drive to work everyday
Isn't that exactly how companies use the cloud? Sure, there are contrived examples where the cloud is cheaper than self hosting. But the common case is that companies "use the cloud" by putting 100% of their infrastructure and hosted products in the cloud. That's what is meant when you say "X uses the cloud".
I think you're taking his analogy further than he intended and then arguing against your version of his analogy.
Cloud was made for people who don't have the time, talent or desire to build and manage it in-house. You pay a premium for that convenience and that premium scales with your business growth via IT resource needs. I think that's what he was getting at in his analogy.
Sure - but then also think about the use case - he is using a storage account, which means that included in the cost is:
(a) replication (within region / AZ at least), (b) zero software to maintain (no need to frantically patch Apache / SSL / whatever), (c) super quick setup / management / logs / etc.
So, yes, bare metal is cheaper on a CPU-cycle-for-CPU-cycle / GB of RAM/HDD/bandwidth level, but TCO can be waaaaaayyy higher.
Yes, TCO can be higher, depending where you are on the curve of capex, amortization, and staffing costs. Don't forget you still need at least Developers, DevOps, and Security. If you're inefficient at cloud, spinning up ec2s left and right, using a lot of egress, storing a lot of hot/live data, your total cost is much higher, and will easily be more than the salary of that one sysadmin, or team of systems engineers, you would pay to maintain the colo space.
You have to do a lot of things right to get that Cloud Value, as the author of this blog post has shown. You have to do a lot of things right to get value out of on-prem bare metal as well, but those things are generally well-known, standardized, have less moving parts, and people with decades of experience and knowledge of best practices. The opposite of the current cloud landscape.
TCO is not a straight line.
It's the "weaving" part that has non-specific cost. If you have skill at weaving together pieces of the cloud in an optimal way, you can save money. Just like if you have skill in putting together your own infrastructure you can save money. I can see spending money on services, but I don't understand why people invest brain capacity on vendor-specific solutions.
How much extra time & development effort does it take to weave the services the right way to realize those savings, as opposed to doing it the simpler, old-school way on bare-metal?
In my bare-metal-hosted projects I can afford to do a lot of things that would be a major no-no in the cloud, because I have so much spare hardware capacity that I can save development time doing things inefficiently and still come out ahead in terms of costs.
This is such a bad and US-centric example: for anyone who lives in a place where it is easy to get around without a car, hiring a car only when you need it is a no-brainer financially, and owning a car is a total waste of money.
No, there are economies of scale. For $5/month I can get a dedicated IP address in the cloud. For me to get one myself I'd probably have to buy real estate somewhere, just for starters.
> The simple answer here is that Troy was using Cloudflare to offset costs he knew he would incur with Azure.
I haven't checked, but are the prices for Azure CDN relatively competitive with Cloudflare? I think you'd probably get similar savings going that route, and it would all be Azure.
> Sounds like a pretty expensive privilege.
I'd be surprised if his Microsoft Regional Director and MVP status isn't worth much more than 4 figures to him.
Those seeking to initiate engagements with Troy might care more about the fact that he pops up on HN and other high profile tech outlets frequently and the visibility of Have I Been Pwned, but the Regional Director status probably helps a lot with getting some of these engagements signed off.
He probably also receives significant subsidies from Microsoft as well.
Cloudflare is a partner of Azure (or vice versa).
That is, Azure has an integration to use Cloudflare for the CDN.
Cloudflare Bandwidth Alliance.
Azure is integrated with Cloudflare, if you choose to do so.
They also offer Azure CDN as a competing product, but I don't know if anybody takes it seriously or not.
Azure CDN is fronted by Akamai; lots of retailers use it, but it's not as big as CloudFront AFAIK.
There's also Verizon and Microsoft's own CDN. Personally, I use Verizon because Akamai doesn't support wildcard purge.
>Sounds like a pretty expensive privilege.
Well, it might also come with contacts in the billing department.
Apparently you also get $13K a year in credits (mentioned in the article):
> I'm going to miss the $13,000 USD (yes) a year in free azure credits. Just remember this amount of money when you are reading content about "how good azure is" and "what the latest and greatest is" from influencers and community leaders here on social media...
A company I know got $200K in credits as their sweet initial deal. They fully intended to stay inside that limit, or close to it.
Next thing I knew, they were slapped with a $700K bill, and managers were running like headless chickens all over the development floor yelling to turn off every VM, hard drive, database and whatever other resources.
There are plenty of people using both, it's a good combination.
The point is about Cloudflare not being from Microsoft / Azure which is the company he's evangelising for.
But you can't use CloudFlare instead of Azure, so it's within scope for him.
And moreover, from what it sounds like, you can't use Azure without Cloudflare either (unless you have a lot of cash you want to burn). Microsoft will get a lot more business from someone advertising their "Azure with Cloudflare" setup than they will from someone advertising their "Azure without Cloudflare" setup.
(Edit: fix spelling)
Presumably MSFT don't have a competing product. It's like asking why he's hosting it on Intel processors.
I don't think Azure offers a Cloudflare alternative, and I'm not sure they ever would - Cloudflare is too good and too cheap to compete against.
It's not that good. I'm constantly getting "Access Blocked" to various websites by Cloudflare trying to protect them....from me reading them.
From the operator's point of view, Cloudflare is cheap and is often used when you don't understand that "premium bandwidth", or whatever they call it these days, is just bandwidth you pay way too much for.
From the user's point of view, Cloudflare will frequently stop you from accessing things and introduces more single points of failure into the internet infrastructure. But on the good side, they have pretty good edge endpoints, so your browsing might be a bit faster, when they allow you to browse.
Same here. I live in Turkey and Cloudflare just blocks access to many websites from here. I can't access some sites just because I'm on an IP range from Turkey. I can jump onto my VPN, but still, not convenient.
Some websites configure their own rules in CF to block traffic from certain countries.
I don't get the "Access Blocked" often; what I do get is the "We are verifying your browser" page that often just keeps looping and completely blocks my access.
I never do. Maybe you or something on your computer or network is the problem?
Usually it's because they live in a different country than what people consider the "Western World" or something like that. When I'm in Europe, I don't see that page very often, unless in Eastern Europe. But if I'm in Asia, Africa or Central/South America, then I encounter that page all the time.
Fwiw, this is an option that many people configure for themselves in cloudflare. Some security people love recommending using IP geo-blocking as a good tactic for hardening systems.
Nope. Just being in some country (in my case Turkey) is perfectly enough to be geoblocked.
If you configured your client to hide enough information, Cloudflare tends to believe you are a threat. VPN users probably have that problem a lot more.
You don't have to hide any information. I don't hide any information and still get blocked because of the country I reside in.
Yeah, but even if you’re in an “acceptable” country you get blocked if you care about privacy. The point is that CF often blocks people.
And so Cloudflare becomes an enemy of privacy?
Exactly. I care about privacy, and I do see that page often. Sometimes it helps to send a non-empty user-agent string, or to enable JavaScript. Most of the time I just close the tab.
Grooming influential people to promote your corp and then bullying them when they didn't turn out to be just parroting your marketing slogans. Classic corporations.
Huh. As an MVP myself (of DRM lol) I have to agree that was a poor astroturfing idea of Microsoft's. Although one employee != Microsoft. In all my MVP years Microsoft has never asked me to do anything like that. They've sent me to cool parties and events, but never asked for me to do anything as a result.
Lol I KNEW IT! An independent consultant blogging about awesome things in Azure? #doubt
Seriously, yeah, if he's an MVP, he'll be fine.
Just did the calc and 600GB/day is about 55Mbit/s. That's really not a lot, and if there's not too much computation server-side you could serve this from a Raspberry Pi at home (provided you have a good uplink). But that's assuming you keep the Cloudflare cache of course, or, as the author mentioned himself, advertise only torrents for the multi-gig files.
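For anyone who wants to check the arithmetic (decimal units assumed):

    # 600 GB/day expressed as a sustained average bitrate.
    gb_per_day = 600
    bits_per_day = gb_per_day * 1e9 * 8          # decimal GB -> bits
    seconds_per_day = 24 * 60 * 60
    print(bits_per_day / seconds_per_day / 1e6)  # ~55.6 Mbit/s

Peaks will of course be far above that average.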
I really don't understand the cloud craze. Everything is more complex to debug, more expensive, and more shitty in all the possible ways you can imagine. I mean, I was not exactly a fan of the VPS craze 10-15 years ago, but at least it wouldn't automatically ruin your bank account whenever you got a little traffic.
Kudos to the author for having so much money (thousands in one month?!) to waste. I wish I did too :)
> Everything is more complex to debug, more expensive, and more shitty in all the possible ways you can imagine.
Coming from traditional infrastructure and development methods, you're mostly right. Part of the expectation of the cloud is that you do things _their way_. And even then each cloud provider does things a little differently. However, if you're willing to subscribe to the <insert provider> way of doing things it (and you'll have to trust me here) makes many things easier. Here's a short list:
* networking setup is free/cheap/doesn't require a Cisco cert. you can trust a developer to set things up.
* object storage is so much easier than any file hosting scheme you can come up with
* the path from container-on-a-host to container-in-a-cluster to container-in-{serverless,k8s} is extremely straightforward
* I turn all my dev/test servers off at night and they don't cost me a thing (a small sketch of this follows after this comment)
* consumption based compute will result in a much cheaper solution than a VPS or colo (admittedly there are many assumptions baked into this)
* some core services (like sqs, sns on Amazon) are extremely cheap and have provably reduced development time because you're not having to build these abstractions yourself.
This all being said I'm not advocating an all-in approach without thinking it through, but to do so where it's easy and makes sense.
EDIT: clarity
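On the "turn dev/test servers off at night" bullet, a minimal sketch of that kind of nightly stop job, assuming AWS and placeholder instance ids; you would pair it with a matching start job in the morning and trigger both from cron or a scheduler:

    import boto3

    # Stop a fixed list of dev/test instances; the ids are placeholders.
    ec2 = boto3.client("ec2")
    dev_instances = ["i-0123456789abcdef0", "i-0fedcba9876543210"]
    ec2.stop_instances(InstanceIds=dev_instances)
    # A morning job would call ec2.start_instances(InstanceIds=dev_instances)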
> networking setup is free/cheap/doesn't require a Cisco cert. you can trust a developer to set things up.
Bare metal hosts set up the network for you. You may need to know how to configure a local network interface. Even if you actually rack and stack, many colos will give you a drop with the network set up. You don't need to do what you describe unless you are building your own DC.
> object storage is so much easier than any file hosting scheme you can come up with
That matters if your data volume is truly massive. Only a small percentage have this problem. Also AWS inbound is free so you could upload big data to AWS and warehouse it there if you wanted. Not using big cloud for everything doesn't mean you can't use it for anything.
> the path from container-on-a-host to container-in-a-cluster to container-in-{serverless,k8s} is extremely straightforward
This is the one spot where admittedly you will have to spend more in administration. You'll need to either run your own k8s or Nomad or adopt a different configuration, and you may have to think about it a bit more.
> I turn all my dev/test servers off at night and they don't cost me a thing
You could still do this. Just host live somewhere else. You could also test on a local VM, which is what we do. Obviously that depends on how big your app is.
> consumption based compute will result in a much cheaper solution than a VPS or colo (admittedly there are many assumptions baked into this)
You only see the savings if they are passed on to you. What we've seen is that Moore's Law savings have not been passed on by cloud providers. Look at what you can get at a bare metal host compared to how much the same compute costs in the cloud. Years ago the difference would not have been so large.
Bandwidth costs in cloud are insane, and most use asymmetric pricing where inbound bandwidth is free. This is known as "roach motel pricing" for a reason. Data goes in, but it doesn't come out.
> some core services (like sqs, sns on Amazon) are extremely cheap and have provably reduced development time because you're not having to build these abstractions yourself.
Fair, but they make their money back elsewhere. Those are lures to get you locked in so you now have to pay their crazy compute and bandwidth egress charges.
Here's an example. There are more.
Haven't even heard of DataPacket, thanks for the link!
And yeah I agree about the "some services are super low cost so you get hooked" thing. Always been my impression of Amazon: they look for what they can apply scale savings on (usually object storage, it seems) and make it cheap and then over-charge for almost everything else.
That's not really it.
The funny business in Amazon's pricing is their egress bandwidth; otherwise everything is rational.
You're looking at the pricing from a 'cost plus' perspective, which is not generally how things are priced.
AWS core use case is IT departments being able to offload all of their infra.
It's a massive, massive advantage. It's so, so much easier and more flexible to use AWS that there is no comparison. It's a 'no brainer' from a cost perspective, which is why, cost usually isn't a barrier with AWS.
Cost only becomes a primary issue when the margin of AWS services is reflected in the cost of the product itself, i.e. when you are hosting a lot of content.
So if you are Pfizer, and your IT department uses AWS, the cost is irrelevant.
If you are Dropbox, selling storage for $X/Gigabyte, and your competitors are reducing their prices and you're giving all of your margin to AWS, then you have to do something, i.e. 'make your own infra'.
I mean OK but I've been in big corps and they end up hiring a ton of DevOps that basically specialize in AWS.
Is that still cheaper? When you have 30+ very well-paid dedicated DevOps specialists? Maybe it is, I am just skeptical while looking at it as an outsider and without solid data.
AWS is the new Oracle.
DataPacket shows Discord on their customer list. I didn't know Discord used VPS / bare metal, or did they just try it once and DataPacket stuck their name on their landing page?
Have you used DataPacket? If so, how's their uptime? Do they have any sort of automated failover so your service doesn't go down if something happens to a single box or rack?
ZeroTier has stuff at DataPacket. It never goes down. We've had 1 year plus uptimes and the network is rock solid.
> I turn all my dev/test servers off at night and they don't cost me a thing
I bet it still costs you more than my Hetzner one despite me not having to care about turning it on or off. I mean it's great that the cloud gives you this flexibility but you wouldn't need it to begin with if it wasn't so expensive.
When you are growing, it’s a no brainer. When you are at steady state it depends.
As a case in point, I worked in standing up a critical system in a large enterprise a few years ago. We spent about $12M on compute, storage, networking, etc. At operational state, it was about 40% cheaper than AWS. The problem is, it all sat there for 6-18 months filling up before we fully hit that state.
With a cloud provider, you pay a high unit cost but if you engineer intelligently your costs should move with utilization. Except for government, most entities generally want to see opex move with revenue and prefer to minimize capex where possible.
You're an order of magnitude larger than what I work on, but on our last big project we purchased and installed half in the first year, then the remaining half 18 months later.
Keep in mind that size tends to lower intelligence! ;)
The cloud is great for scaling. The lead time for new servers deployed in a data center is weeks compared to seconds in the cloud. Plus there's no sunk cost in the cloud - you can turn it off when done and it evaporates.
Also, the cloud offers managed software as a service. You don't have to manage your own HA DB cluster or PubSub. It's all just there and it works. That can save you a lot on technical labor costs.
But yes, I do agree with your point. If you don't know what you're doing, you can nuke your budget super quick.
The cloud is great for scaling indeed, but a cheap Intel V4 server with 44 cores from Ebay for $2000 can handle a shit ton of traffic too.
If I were building a new business, I would use both cloud and colo. But I do understand that not everyone has that luxury.
The technical ability to scale is a bit meaningless if you can't afford it.
Then that becomes part of your business + technology planning conversations:
"This is the cost of scaling, this is the cost of owning our own infra, how does that fit into our budgeting and requirements?"
"If you can't afford it" is doing a lot of assuming and a lot of heavy lifting in that statement. Whether or not you can afford it depends strongly on your scaling bounds (how much you need to scale) and how you've chosen to implement it.
There are plenty of tools and systems that can present a sufficiently linear cost relationship to load and usage that, should your COGS versus revenue make sense, the marginal cost of increased cloud resources is a no-brainer, especially versus always-paid-for hardware. If you don't have such a linear relationship, you're as much in the position of deciding whether the project is viable as you are anything else.
If you have a large environment to build in a certain region, the cloud lead time is months also. We have to give our cloud provider months notice before building in a region. But we have a pretty serious and profitable workload. Your statement is correct for the 90% of companies with relatively small infrastructure needs.
> The lead time for new servers deployed in a data center is weeks compared to seconds in the cloud.
The lead time for Hetzner or OVH is measured in minutes and is appropriate for the majority of use cases. Old-school providers like these used to run the entire internet before the cloud craze started.
Colocation & own datacenter is not a good choice for most startups. Seems like a lot of people here miss a step and only go from one extreme to the other. There's a middle-ground of renting bare-metal that gives you amazing price/performance with none of the drawbacks of colocation or running your own DC.
> If you don't know what you're doing, you can nuke your budget super quick.
And even if you do, which I think you’ll agree Troy Hunt does.
"I really don't understand the cloud craze"
The opposite, I don't understand why anyone would ever put up a server if they didn't have to.
It's not 'processing power' that's going to be the 'big cost' for most projects.
It's headcount and salary.
If you can materially improve the operating ability of your company, then a few $K in cloud fees is dirt cheap.
I used to work at a 'tech company' that made a physical product and our IT was abysmal. We had to wait weeks for our sysadmins to order blades, get things set up, there were outages etc..
If a project is definitely going to be 'a few linux servers and never more' - even then it would be cheaper and more reasonable to use virtual instances.
The time to 'roll your own' is when the infra. operating costs are a material part of your business.
For example, 'Dropbox' invariably had to roll their own infra, that was inevitable.
Similarly others.
That said - as this article indicates, it's easy to 'over do it' and end up in ridiculous amounts of complexity.
The Amazon IAM security model has always been bizarre and confusing, and the number of AWS services is mind-boggling.
But the core case of EC2+S3 +Networking, and then maybe a couple of other enhanced services for special case works fine.
I also object to what I think is a vast overuse of Cloudflare, I just don't believe that in most scenarios needing to have content at the edge really changes the experience that much.
> It's headcount and salary.
This only really applies to fully-managed services such as Heroku. Every other cloud still needs a DevOps person according to my experience in many companies.
Yes, but one cloud devops person can do 10x what sysadmins, hardware and network engineers can do.
Just security alone, in terms of managing access to all of those resources, various forms of backup, across regions ... it's just out of reach for most organizations.
> 600GB/day is about 55Mbit/s
In what universe? This frictionless perfect vacuum where traffic comes in a wholly predictable consistent continuum?
Good point, it's just an average. And to be fair, I checked the numbers in the article and it seemed closer to 3.2TB/day, which is closer to 300Mbit/s. But what I meant is that a home fiber connection can deal with that, although consumer ISPs don't have good bandwidth over all routes (it's good to YouTube/Amazon but ridiculously slow to some other consumer ISPs). If you don't want to serve from home, I'm sure many entities would be happy to donate disk space and bandwidth to help a project like this, setting up a mirror list like we have for distro repositories.
Also, we may be taking the problem the wrong way around: do these multi-gig files need to be accessed by everyone from a web browser? No, it's a dump file used by specific people in specific circumstances. Then why are we using HTTP for this in the first place? In this case, publishing only over BitTorrent/IPFS makes sense and many people will happily seed, pushing costs toward 0 for the publisher (and very close to 0 if you only push to a first circle of mirrors who can then push to more mirrors, some of which can be declared webseeds in your torrent).
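If it helps, producing a torrent with webseeds is a one-liner. A sketch assuming a reasonably recent mktorrent (flag names from memory, so check its help output), with placeholder URLs and filenames:

    mktorrent -a https://tracker.example.org/announce \
              -w https://mirror1.example.org/dumps/dump.7z \
              -o dump.7z.torrent dump.7z

Clients that understand webseeds fall back to the HTTP mirror when few peers are online, so the mirror mostly pays only for the cold-start traffic.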
You're not the target audience.
Startups growing fast are the secondary audience.
The primary audience is large enterprises where their internal IT costs <<more>> than the cloud costs. Plus internal IT provides those resources after 6 months...
Most people that use cloud computing aren’t stuck with the bills; the companies they work for are.
As to difficulty, they “solve” organizational problems by avoiding sticker shock: when someone wants $100k+ in equipment, that's often a huge number of hoops to jump through and possibly months of delays, whereas a giant cloud bill every month gets no more complaints than the electric bill, etc.
> Most people that use cloud computing aren’t stuck with the bills; the companies they work for are.
you can rest assured that even the largest company will come looking for the person responsible for increasing an expense by that large of a percentage. So maybe it doesn't come out of your personal checking account but you will certainly pay for it.
That’s assuming it’s a large increase in the bill for no reason.
It’s easy to justify having a larger bill with more traffic. “A retail store isn’t going to complain that they need to buy more stock after selling more stuff that’s just a cost of doing business.” Meanwhile it can be hard to justify a capital expenditure just because traffic increased.
> 600GB/day is about 55Mbit/s.
Not really; it was minimal traffic, then sudden bursts of gigabytes. Of course, throttling the big spikes would actually have been a good idea in hindsight, to give an early warning.
> but at least it wouldn't automatically ruin your bank account whenever you got a little traffic.
This only happens when consumers fail to set budget alerts. Troy could have saved himself $10k with 15min worth of work.
I think it is an irresponsible fad that people use cloud services for hobby projects (and despite its wide popularity I'm calling HIBP a hobby project since he's running it on the side for free) unless they have solid cloud ops experience from their day job.
Cloud providers love it when people do this, and they are famously easy to talk to when you get an unexpected invoice high enough that you'd need to remortgage your house to even begin addressing it. But unless you're working on a side hustle that inherently needs to run in the cloud regardless of scale, or are experimenting with cloud technologies in an explicitly time-boxed toy project, I think using cloud services is the financial equivalent of handing a hobbyist craftsperson one of those chainsaw angle grinder attachments that even professionals find hard to keep from bouncing into their body.
If you do want to use cloud services for anything you pay out of your own pocket, the first consideration should be cost management and monitoring. Your employer might have big enough pockets to shrug off a runaway compute instance you forgot about for a month, but that can quickly translate into money that can be anything from inconvenient to life altering if it comes out of your personal budget.
Or just stick with the free tier and make sure everything simply shuts down if you run out. Sure, a "bandwidth exceeded" error page might not get you as many upvotes on HN, Reddit or social media, but it also won't impair your finances.
I don't know what the alternative is. Run a home server and pay an ISP $$$ for unusually high upload bandwidth/throughput? 99/100 times running it in the cloud is going to be cheaper, easier, and more resilient.
Of course, the delayed sticker shock is a problem. I think Google Cloud actually lets you create a budget that turns services off if they go over, so there's a solution here if you run a hobby project that you suspect might take off and cost you more than it's worth.
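As far as I know the budget object itself only alerts; the actual "turn it off" behaviour is something you wire up yourself (for example a small function that disables billing on the project when the budget's Pub/Sub notification fires). Creating the alerting half looks roughly like this (flags from memory and all values placeholders, so double-check the gcloud help):

    gcloud billing budgets create \
      --billing-account=XXXXXX-XXXXXX-XXXXXX \
      --display-name="hobby-project-cap" \
      --budget-amount=25USD \
      --threshold-rule=percent=0.5 \
      --threshold-rule=percent=0.9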
I've variously paid $5-$30 a month for a VPS/dedicated server to host all my random side projects over almost two decades, including websites for other people, email, etc; there's probably two dozen or more sites running on my Hetzner dedicated server, with storage and CPU and RAM to more than spare. And not once had to worry about extra fees or weird billing issues. Bandwidth has grown from 100Mbps to 1Gbps and I've never had traffic issues.
And this is a cloud service is it not?
A dedicated server isn't usually considered a "cloud" service. It's a physical server allocated to you, with unmetered bandwidth and local disk.
Where do you get bare metal servers for that cheap? I assumed by VPS you were talking about a VM
The VPS was a VM, but I moved from that to dedicated a long time ago.
Hetzner has nice dedicated servers for €33/mo:
I'm on an older one that just got bumped up to €29/mo due to increasing electricity prices; it was €21/mo until now, and I can't blame them for that one. The specs are E3-1245 V2 / 16GB / 2x3T, there's over 45 vhosts on it across ~25 wwwroots plus other random services, and CPU usage is basically nothing. The cores are really there just to handle bursty stuff. Most random side projects and small websites don't need almost any resources on modern hardware.
Previously I was on a Scaleway Dedibox, which go as low as €15/mo right now. It was €10 at one point even.
> and pay an ISP $$$ for unusually high upload bandwidth/throughput?
But the ISPs I know do not bill $$$ if you use the max bandwidth (the max bandwidth they advertised to you, btw) for a sustained amount of time: they'll just start throttling you.
Anyway GP ain't asking about "cloud vs hosting at home" but about "cloud vs dedicated server(s)".
My cloud costs for my micro instance are about $12 a month. Multiple domains on there. I don't use RDS, ElastiCache, not even load balancers. If you want to keep the costs reasonable, you must roll that stuff on your own, which is totally possible (and free), and in fact kind of fun as a learning experience.
Because it's not cool, and won't make your CV sparkle.
I'm sure there becomes a point where cost of (hardware + maintenance + staffing) > (cloud + staffing), in which case sure crack on. But like you, I'll stick to a rented server for my stuff.
The direction is opposite IMO. As you grow bigger on prem starts making a lot more sense.
There is a size where on prem would be much cheaper on paper, but internal red tape for access to internal resources is such that teams are unnecessarily slowed down. For example I once worked at a place where it took several months to get an additional on-prem box to speed up our CI pipeline. Of course you can also add that amount of red tape to a cloud solution, but in my experience it's easier to get approval for an additional EC2 box.
We used to have an internal (and external) cloudfoundry instance. That was pretty nice as far as on-prem deployment options.
It’s just a shame they were permanently out of database servers with SSD storage, and for some reason couldn’t provision more for over a year.
Yes, this is the awkward phase of on-prem. Some companies stay there forever. Good companies will continue innovating and treat the time to resolution of your request as a KPI to reduce down to days, minutes, or even seconds.
On-prem and AWS/GCS/Azure aren't the only options.
There are smaller cloud providers, rented VMs, rented dedicated servers and rented colocation space.
For larger companies, frequently they are. You need to use already approved vendors.
I think the case is for big companies that have a hard time attracting IT talent. Like, not in an even remotely IT-related industry, and their headquarters is in a city with no significant tech community. Places that scream "working here will not improve your resume".
I'm a major cloud skeptic, but there's a certain class of giant enterprisey companies that are never going to be able to attract good IT talent, and if they "just throw money" at the hiring problem they'll be inundated with slick imposters.
I think cloudy stuff lets those companies outsource a large chunk of something they'll never be good at. The cavalcade of Microsoft/Cisco certifications were an earlier decade's attempt at solving the same problem.
I believe renting dedicated servers is often overlooked. You pay someone else to install hardware, ensure network connections and be on-site for hardware swaps etc but still have the maximum degree of flexibility.
Even larger companies can work well with that model, traffic also tends to be cheap enough that you can spread across different vendors to avoid lock-in. And in that case, your sysadmins can sit wherever they want, no need to be physically close to the servers.
Also, as much less knowledge is needed to be a dedicated server provider, competition is strong and prices are comparably low.
Many providers will now also bring up dedicated servers for you so fast, or offer you API based provisioning of single-tenant VMs or similar that it's really rare that the difference relative to cloud providers becomes much of an issue.
I used to spin up dedicated servers and then put an overlay network + a simple set of tools to spin up containers on them years before Kubernetes etc. was a thing, and we'd have a "global" (we had VMs in Asia, dedicated servers in Germany and colocated own servers in the UK) unified deployment mechanism that let me spin up containers wherever with a one-liner. Having a few extra dedicated servers with spare capacity standing by still made the whole system far cheaper than e.g. AWS, even if you attributed my entire salary towards it (I spent nowhere near all my time keeping that running).
It's easy enough to find consultants that can set up systems like this that abstracts away the dedicated hosting providers so you can mix and match and move with ease - especially today with options like Kubernetes.
If I were to go back to doing consulting I'd probably look at finding a way of packaging this kind of offering up behind lots of marketing speak and offer some sort of "abstract" hybrid private cloud layer on top of a choice of dedicated hosting providers, to make that kind of hosting palatable to execs who refuse to believe the cost-saving potential because they've never dived into the actual numbers (oh, the amount of time I've spent building out spreadsheets with precise cost models that'd get promptly ignored because someone had heard from a friend that company X swore vendor Y was cheap and believed it blindly).
I don't have first hand knowledge but my understanding is large companies have procurement departments and they get over half off sticker price for Azure. My guess is this is why the sticker price needs to be overinflated because people in procurement need to show that they are doing their jobs.
Also it is a major pain point getting anything done with IT operations.
Like the Oracle database server that half the department relies on stops responding on a Friday morning and it takes all day to determine the hard disk is full and fix it. I had never before worked at a company where this happened multiple times.
Or operations saying they were unable to restore a windows server hosting a database server and now everyone has to scramble to update their connection strings because operations somehow cannot use the same domain name for the new machine.
It's true there are huge rebates to be had if you're big enough, which is one thing smaller companies should bear in mind when they look at big company X using cloud provider Y as justification for thinking Y must be cheap.
If you're Netflix, cloud is probably not that much more expensive than owning your own servers. Maybe even cheaper. But you're not getting Netflix prices.
But even if you're small fry you should however start regularly talking to your provider and go through a regular cost-cutting exercise and talk to them about how you're looking at provider Z and have been asked to cost out managed servers and on prem options.... You won't need to get very big before that starts paying off.
If your competition is doing this and you're not, and hosting costs starts becoming a big part of your cost base, you won't be able to compete.
Long term I think we're going to see disruption here to the point of startups failing because of competitors copying their idea but being better at driving down hosting costs by not being afraid of going to dedicated hosting or hybrid solutions (hybrids are my favourite - if your stack can be deployed semi-transparently both on dedicated servers and cloud you can go much closer to the wire on your dedicated servers by being prepared to spin up cloud instances to take care of spikes; ironically having the ability to spin up cloud instances makes relying on cloud services even less cost-effective)
I'd also expect to see more "hybrid" cloud offerings with companies offering you operations-as-a-service by giving you a virtual cloud type interface where they don't actually own a cloud service themselves but helps you abstract away cheaper hosting providers. You can already find plenty of people who'll e.g. run Kubernetes setups for you, so taking the step to do more cost-optimization on the backend is natural (and I'm sure there are people who'd do this for you today - if I was still doing contracting I certainly would be offering that - and maybe someone is already wrapping it up as a service offering; I haven't kept up on that market)
> I think the case is for big companies that have a hard time attracting IT talent.
Companies that have made a name for themselves by outsourcing to the cheapest IT contractor that will promise them the moon and fill the seats with barely warm bodies? I was one of those bodies so I know exactly why they can't attract talent - they don't bother, and don't reward it. They treat IT as a cost center and are surprised when they get disrupted. The only good options in those companies are to work on the business side or worst case as a project/product/program manager interfacing with the warm fungible contractor bodies.
Many Enterprises are only alive because of inertia and goodwill from earlier decades.
No thanks. In our case we would then need to hire DBA etc. I prefer to have as many managed (in this case by AWS) services as possible.
Once you scale you'll need DBAs anyway to do the things AWS won't do for you, or developers with the same skills, like figuring out why your developers are writing queries that kill the production database because they didn't test with sufficiently similar data, etc. I used to manage about 100 Postgres instances alongside ~1000 VMs total, spread across colocated servers and managed hosting in several countries. The time I spent on the type of DBA tasks that e.g. AWS RDS automates away was measured in minutes per month after I'd spent a few days automating backups, log shipping and failover.
I kept being asked to price out a migration to AWS, and we kept coming up with 2x-3x the cost. Part of the reason was that we could pick and choose servers that fit our workload in a way we couldn't with AWS, and partly the absolutely insane bandwidth prices AWS offered.
I use AWS. I like AWS for the convenience. But it's a luxury that is ok when you're either small or really high margin, and you're paying massively over the odds for that luxury.
The reason these services get away with being so expensive is that people massively overestimate the complexity and don't bother actually getting quotes from people or companies to manage these services for them. When I was doing consulting my biggest challenge in offering up alternatives to AWS was that people were so convinced AWS was cheap that even when presenting them with hard data they often didn't believe it. For me it was a mixed bag - I tended to make more money off the clients who stayed on AWS as they usually needed more help to keep an AWS setup running than those I migrated to managed hosting setups, despite paying more for the hosting too.
Not really, on site makes sense for Facebook or Google. Or for extra privacy.
Mid-sized companies can get cracking deals (like 10% cost) on major cloud providers.
On prem rarely makes sense any more other than at that kind of huge scale or for privacy, sure. But that's because dedicated hosting operates on really thin margins and has become really cheap. You have to get to massive scale before cloud providers will give you big enough discount to start approaching the kind of costs you can get that way with a decently engineered system. Not least because cloud providers themselves provides a weapon: Set up your system so it can scale up using a cloud provider to handle traffic spikes and you can load those dedicated servers much more heavily than you could otherwise risk.
The biggest issue, though, is how few people are aware they can negotiate with their cloud provider. I've seen so many places just pay the sticker price without even trying to get discounts.
(Conversely, I once got a contract to do zero-downtime migrations first from AWS to Google Cloud and then to Hetzner so a startup could launch on AWS and spend the huge amount of free credits they'd been given there, then migrate to Google Cloud to do the same, and then finally move to Hetzner once they had to actually start paying; relative to what they'd have to start paying if they'd stayed on either AWS or Google after their credits ran out the cost of having me do the extra setup to handle that was covered with ~2-3 months of their savings)
I have a few dozens of personal projects on AWS using APIGW, Lambda, CloudFront, Dynamo DB and S3.
Their monthly cost is something between 0 and a few cents.
Stuff like Hetzner is fine, but if you know your way around AWS you can realize massive cost savings there too. Probably the same for Azure.
Finally, in many places 40 EUR for a pet project is actually a lot of money.
Probably would run just fine on a <= 4 euro/month virtual machine too. Of course it doesn't quite scale to zero like APIGW, Lambda, etc., but on the other hand you can be fairly confident you won't pay more if your pet project suddenly lands on the front page of HN.
Keep in mind that the "<= 4 euro/month virtual machine" has maybe 256MB of RAM available, and running anything beyond nginx plus a web server that needs to be cycled every few days due to memory fragmentation can become challenging. I've tried this many times, but it's just not worth the extra hassle. And in reality I want at least a VPN, monitoring and a database even on a toy-project server.
Maybe 5+ years ago. These days ~4€/month gets you 2GB of RAM, SSD storage and plenty of bandwidth.
Hetzner has such a VPS offering (2GB RAM/20GB nvme SSD/20TB bandwidth), netcup has one for ~3€, contabo has 8GB/50GB nvme/32TB for 6€/month and there are plenty of smaller companies around the world offering similar deals (usually somewhat less included bandwidth outside europe though).
It does look like I'm behind on pricing changes. Sounds like it's time to move away from vultr.
> Keep in mind that the "<= 4 euro/month virtual machine" has maybe 256MB of RAM available
For ~4€/month (depending on your country), Hetzner offers "Hetzner Cloud" servers with 2GB of RAM, see https://www.hetzner.com/cloud?country=us
A stardust instance on scaleway comes to less than 2 EUR per month and it has 1G of RAM and runs a toy project or even a small personal infra just fine :)
EDIT: Personally I pay 9-10 EUR per month to Scaleway for a 2G RAM and 2 CPU VPS, private docker repo and S3-compatible storage which holds data and some backups, which run both my personal services and some toy projects when needed. I am not affiliated with them in any way
Contabo has a €5 VPS with 4 cores, 8 GB RAM and 200GB SSD. The one I have runs multiple Valheim servers that are constantly hammering the CPU, some .NET web apps, etc., and it's fine.
Make up your mind: https://www.netcup.eu/vserver/vps.php
No it wouldn't, because I infrequently need burst (bigger lambdas).
As for costs, setting up billing and usage alarms on AWS is absolutely trivial.
Finally, using stuff like S3 or dynamo for storage gives me a peace of mind I will never have when managing my own servers.
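On the billing-alarm point, it really is a couple of commands. A minimal sketch with the AWS CLI - the threshold, alarm name and SNS topic ARN are placeholders, and billing metrics only live in us-east-1 with "Receive Billing Alerts" enabled on the account:

    aws cloudwatch put-metric-alarm \
      --region us-east-1 \
      --alarm-name bill-over-20-usd \
      --namespace AWS/Billing --metric-name EstimatedCharges \
      --dimensions Name=Currency,Value=USD \
      --statistic Maximum --period 21600 --evaluation-periods 1 \
      --threshold 20 --comparison-operator GreaterThanThreshold \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts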
Out of curiosity, what are the lambdas doing that requires more than 2-4 GB of RAM?
> Finally, in many places 40 EUR for a pet project is actually a lot of money.
Doesn't change the equation, unless you set up all your PAYG cloud infrastructure and never use it.
That dedicated server you have to manage (ensure security, install the software you need, keep it updated and secure etc). It’s not for everyone.
Also, as you can see in a screenshot on TFA: some services are simply dirt cheap. The storage account and its various "sub-services" are such a thing. It's hard to compete with that using dedicated hardware.
Depending on your dedicated hosting provider, the traffic cost trap exists, too. Hetzner is a bit of a special case.
> ensure security, install the software you need, keep it updated and secure etc
These things are now trivial enough that it doesn't make sense to pay 10x the cost of bare metal for a cloud provider to solve them for you unless you have a crazy amount of runway or absolutely no idea what you're doing.
Or unless your traffic is so low that the marginal cost differences are something you can swallow.
I've been running something on AppEngine for 10 years and it costs me less than $1 a month. Not sure I could find a cheaper VPS.
On the other hand, I also manage a Mediawiki install, and a cheap Hetzner VPS works great for this.
> That dedicated server you have to manage (ensure security, install the software you need, keep it updated and secure etc). It’s not for everyone.
apt install unattended-upgrades. And Hetzner's firewall.
And cloudflare tunnel which allows you to block even ports 80 and 443. The only attack vector is then through ssh but with passwords disabled I wouldn't worry too much about that.
By the way, unattended-upgrades is enabled by default nowadays.
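For completeness, the whole hardening step being discussed here is roughly this (Debian/Ubuntu assumed; adjust to taste):

    apt install unattended-upgrades
    dpkg-reconfigure -plow unattended-upgrades   # confirm periodic upgrades are enabled
    # key-only SSH: disable password logins, then reload the daemon
    sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
    systemctl reload ssh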
Most cloud users will have a VM somewhere which you also have to manage.
Not at all. GCP, AWS, and Digital Ocean all have PaaS/serverless systems that eliminate the concept of VM (from your perspective). I haven't managed a production VM in many years.
Even Elastic Beanstalk, which is just EC2 instances, has a checkbox for automatic zero-downtime updates.
Debian and derivatives have unattended-upgrades too.
> That dedicated server you have to manage (ensure security, install the software you need, keep it updated and secure etc). It’s not for everyone.
Hetzner also offers managed servers where all this is taken care of, for relatively fair prices.
Basically the $40 server becomes $80 when managed by them.
You can use their cloud offering: $4/month, 20TB of traffic.
>"That dedicated server you have to manage (ensure security, install the software you need, keep it updated and secure etc). It’s not for everyone."
Typical FUD. On modern servers, with this type of software, it takes very little time. You'd spend more managing your cloud architecture.
Arguably Hetzner is a cloud operator too. I guess it's a spectrum...
I wonder if the disk on a $40 Hetzner server would be fast/big enough for him. All the searching and storing of massive password hash collections.
He has a writeup here on how he gets costs down in a big way: https://www.troyhunt.com/serverless-to-the-max-doing-big-thi...
I tried to scan through the linked article (and OP) but couldn't quite figure out Troy's storage requirements. Are they really massive?
The sum of the GB figures shown in the OP doesn't even amount to 200GB AFAICT. But even if it's something like 10TB, that's still not super expensive on many hosting providers.
The post wasn't relating to data but more this quote:
> It's costing me 2.6c per day to support 141M monthly queries of 517M records.
Also, you might be able to store 1TB of data on a spinning disk with no problem but can you run the amount of queries he needs? Will you be able to run them as fast as you need? How much RAM would you need? etc.
The math says it was 25 TB per day for a month.
($350 per day at .014 per GB)
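Spelling out the arithmetic: AU$350 per day ÷ AU$0.014 per GB ≈ 25,000 GB, i.e. roughly 25 TB per day, or on the order of 750 TB over the month.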
Ah, you mean bandwidth. I meant how much actual storage at rest (HDD size).
It depends somewhat on the organizational skillset you have, in my opinion.
My current workplace is considering a fully self-hosted stack as a unique selling point for the customers and segments we're in. That means we have storage and Linux admins available, as well as the tooling and know-how to run this securely and efficiently. Thus, placing large and often-downloaded files on our file stores at Hetzner is very much a no-brainer, because it adds very little workload to the teams maintaining these stores and it's cheap.
However, this can be a daunting thing if you don't have this skillset in the org. It can be learned, but that's time spent not working on the product (and it's not trivial to learn good administrative practices from the hell that Google results can be). At such a point, a cloud service just costs you fewer man-hours. And again - it wouldn't be much time for me, but it would be a lot of time if you had to figure all of that out on the fly. That's essentially why the saying goes that cloud services save you time but cost money.
Where is a good place to learn good administrative practices?
I found the RedHat Security and Hardening Guides useful for this.
> I just don't understand why people use cloud services.
1. When they need to adjust rapidly between different resource usage profiles, e.g. because they are growing rapidly and can't predict what the usage will be X days in advance
2. They have huge resource requirements and don't care to invest in their own infrastructure, but can negotiate lower rates with a cloud provider
3. When their resource usage is modest but profitability is high enough that cloud expenditure is a rounding error
> 1. When they need to adjust rapidly between different resource usage profiles, e.g. because they are growing rapidly and can't predict what the usage will be X days in advance
One can add new servers in minutes; removing has a bit more latency to it, but I'd figure with the huge price difference between rented and cloud you'll come out on top with the former in most cases. Also, just use a clustering or orchestration layer in between; they range from very simple to set up and use (e.g., Proxmox VE) to quite complex but also very capable (OpenShift, Kubernetes, ...).
> 2. They have huge resource requirements and don't care to invest in their own infrastructure, but can negotiate lower rates with a cloud provider
Using Hetzner or other providers is not investing in your own infra; that's using (= renting) the provider's infra and capabilities (peering, fast uplinks, datacenter perks like utility redundancy and staff on site). The second sentence may be true, but probably not for most use cases that aren't huge yet, like the post here.
> 3. When their resource usage is modest but profitability is high enough that cloud expenditure is a rounding error
If and only if, yes. Often infra costs are relatively low compared to salary costs, so that's definitely an optimization problem one should go through when deciding such things. Chances are that for most projects profitability is good but not magic money printing, infra costs are a non-negligible part that eats into revenue, and then it's definitely worthwhile to think about avoiding the high premium most of those cloud offerings ask for.
> One can add new servers in minutes
With Vercel I don't ever think about adding servers at all, huge win.
> infra costs are relatively low compared to salary costs
Enterprise SaaS here, this is it. Any second my team spends not caring about infra is well worth it.
4. When their resource usage used to be modest, so they got on cloud services for increased developer convenience, and now can't afford the switching costs even though their bills are expensive.
Maybe one wants to maintain the application and not the server? A long time ago I booked a VPS, installed some BSD on it and thought I was good.
A month later an NTP security vulnerability was discovered, soon the server was taken offline, and some not-so-nice 'patch your things ASAP' emails came in. Since then my take is that one should spend some time, probably daily, on one's own server if one wants to maintain it.
Right, because a barebone docker hypervisor needs so much admining.
It runs NTP, does it not?
Aren't Azure Compute Nodes also "bare metal"?
Based on a quick Google search for "Azure Compute Node":
> A node is an Azure virtual machine (VM) or cloud service VM
> The terms node and VM are used interchangeably occasionally
> Azure Batch creates and manages a pool of compute nodes (virtual machines)
> In an Azure Batch workflow, a compute node (or node) is a virtual machine that processes a portion of your application's workload
So no, seems Azure Compute Nodes are VMs, not bare metal.
I don't know.
Well, there's a gap between the amount of convenience you get on the major clouds and what you get on one like Hetzner.
I’m a huge Hetzner fan, and their cloud offering is definitely growing but still isn’t as convenient and featureful as it could be (and they don’t share their roadmap currently so hard to tell what they’re working on next).
I’m trying to do something about it though, working on Nimbus Web Services[0]. In my mind all we need is something to bridge the managed services gap and make it very easy to set up the basic 3 tier app with some amount of scale/performance elasticity!
[0]: https://nimbusws.com
But he could've put static files on a Hetzner server and still have his backend in Azure. That would've solved these issues and probably saved even more money.
Being able to run a relatively simple global cache with a cheap provider like Hetzner as the origin is also harder than it should have to be.
How's that? Setting up a reverse nginx proxy with a cache probably takes less than an hour even if you've never done it (speaking from experience). And otherwise, if the files don't change that much, just ssh in, copy them onto the server and serve them via nginx and a Cloudflare tunnel?
I'm in no way a sysadmin and have set up these configurations manually in less than an hour for side projects. Cloudflare tunnel also allows you to lock down the server for everything but ssh with pubkey auth so the attack surface is really small.
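For anyone who hasn't done it, the nginx side really is small. A minimal sketch of such a proxy cache - the origin hostname, cache size and retention are placeholders - to drop in and then check with nginx -t && systemctl reload nginx:

    # /etc/nginx/conf.d/cache.conf
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=bigfiles:10m
                     max_size=50g inactive=30d use_temp_path=off;
    server {
        listen 80;
        location / {
            proxy_cache       bigfiles;
            proxy_cache_valid 200 30d;
            proxy_set_header  Host origin.example.com;
            proxy_pass        https://origin.example.com;
        }
    }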
Ah sorry, I should have been clearer on this -- "global cache" === CDN. Hetzner does incur a latency penalty (unless you use the brand new US DC, of course, and your customer happens to be in the US). IIRC right now you can't mix US cloud servers and German ones in the same load balancer (also a relatively new Hetzner Cloud feature), but of course you can do some DNS tricks and get the loads to be fast.
Actually hosting files is super easy (Caddy is awesome, nginx is awesome), but it's even better when you don't have to set up the server at all - for example, just turning on "HTTP access" on an object storage bucket. So this is another place where Hetzner kind of falls short, though they do have hosting options[0]. Basically the ideal solution here would be to deploy a simple Hetzner app (Caddy/nginx or the hosted options Hetzner has), set up a cheap CDN (Bunny, Cloudflare, etc.) in front of it, and save money that way. If the bill is still too high, just take the penalty or bias towards one geo (Germany/US).
I was less talking about the difficulty of getting a server up and more about the CDN bit of the issue to make loads blazing fast!
[0]: https://www.hetzner.com/webhosting

What you want is latency reduction. Usually what sites like Vercel and others give you is way faster loading time by putting stuff closer to your users.
Not related to your comment 100%, but after reading your comment I went researching curiously. Ended up questioning "hey didn't ISPs used to cache content?" Only to discover that they don't anymore, because of HTTPS/SSL, the gift that keeps on giving and effectively warping the web.
So that leads me to my question for HN. Have we completely abandoned non-HTTPS, particularly perhaps for the use-case of server-side caching of HTTP content? Also, isn't this a valid use-case to not use HTTPS and to re-enable that sort of functionality at the network/ISP level?
If I understand this particular case correctly, the large files are just big data downloads of several GB each.
Latency isn't particularly relevant for this, and it probably isn't relevant for most hobby projects.
Why not use CloudFlare in front? That's what was being used anyway, as per the article.
> so basically the ideal solution here would be to deploy a simple Hetzner app (caddy/nginx or the hosted options hetzner has), set up a cheap CDN (Bunny, Cloudflare, etc) in front of it
I agree! Cloudflare probably won't be this cheap forever but like I said I think that's the optimal solution, with the option to cut over and take the latency penalty if costs are out of control.
The usual answer you'll get is that it's not "infrastructure as code", is not highly available, etc... and while that's theoretically true, in practice modern hardware is reliable enough that I'll take the gamble (and the complexity of clouds and their control plane means that you may have more outages than what would be caused by hardware failures).
You can always set it up as such though. We're using k8s/terraform on hetzner cloud perfectly fine on like 30% of the AWS costs we had before that. Maintenance is minimal as well.
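For what it's worth, scripted provisioning on Hetzner Cloud is only a few commands; a sketch with the hcloud CLI (Terraform's hcloud provider covers the same ground) - the project, server type, image and key names are illustrative, not the setup described above:

    # one-off: create a context, which prompts for the Hetzner Cloud API token
    hcloud context create my-project
    # provision a node
    hcloud server create --name worker-1 --type cx32 --image ubuntu-24.04 \
        --location nbg1 --ssh-key my-key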
apt install nginx

Apologies, wasn't clear -- what I meant was that setting up NGINX AND setting up a CDN to serve your content as fast as possible from multiple places is harder than it should be. They're both relatively simple tasks in this day and age, but they're not connected/brain-dead-easy for a vendor like Hetzner.
Honestly, they're not even connected/brain-dead-easy for a vendor like AWS particularly -- you still have to click around a lot or write a bit of terraform/aws-cdk/etc when all you really want to do is throw a folder or zip file over the wall and point it at a domain.
There are tools like Ness[0] out there which look like a breath of fresh air but there needs to be more tools like that where the supported backends include a cloud like Hetzner/Leaseweb/OVH.
I have a pretty complicated architecture that would cost me about $20-35/month if it were hosted just on DigitalOcean or Hetzner. Instead it's on AWS... soon to be multi-cloud, and costs me about $140/mo (which does vary). But it does allow me to experiment, write long articles and design some fun stuff, about which I blog on my own website. The blog has gotten me both clients for freelance projects and enough "cred" to start on new projects I don't have any resume experience in. That's the only reason that I personally use cloud services (of course, the reasons for SaaS/enterprise clients are usually more valid than mine).
What stops you from having a blog on Hetzner? That doesn't seem like it has anything to do with AWS whatsoever... or do they offer a blogging platform?
I'm blogging about the experiments I am running with AWS-hosted infrastructure. It could be hosted pretty much anywhere; an RPi would be enough. But I can't run those experiments on Hetzner, they simply don't offer as many options as AWS to run experiments.
Because the blog is about his experiments in AWS?
AWS is cool and all and has a bunch of interesting stuff, it's just expensive.
- Patching - Remediation, Monitoring, day0 response
- Security Information and Event Management - exports, alerts, OS configuration
- OS/Application Hardening - Encryption, Password/keys rotation, CIS/other baselines, Drift Management
- Backup - Encryption, (don't forget your passwords/keys are changing), retention, data protection compliance, monitoring, alerting, test days
- High Availability - replication, synchronisation, monitoring, alerts, test days
This is just the tip of the iceberg. If you operate in an environment where insurance, reputation, regulatory compliance, certification, etc. are important, then it's easy to see why PaaS solutions are desirable.
Eh, if my bank goes down or gets compromised then I will hold it against them regardless of if they are self hosting or using the cloud.
Exactly - and with a SQL PaaS solution, for example, it will always be patched without the bank bearing the expense of doing it themselves.
Because they provide managed services that VPS hosters don't have or which would require the overhead of maintaining and patching servers, and many people just want to get on with their lives instead of worrying about OS exploits...
That's why you take some kind of "managed hosting" where all of this is taken care of.
like AWS?
More like something starting at 35€/month for 20 TB of traffic. Hetzner has something like that; shared managed webhosting is even cheaper.
But they do offer managed servers.
If you only need a server, as in CPU, RAM, disk and bandwidth, with a more or less constant demand, then sure, a dedicated server is way cheaper than any cloud. You want to use cloud for the ecosystem of other services besides VM/instances, and especially to use them in an automated way. The other use case is elastic demand.
IIRC, Hetzner's "unlimited" traffic isn't quite unlimited. You get a few TB per month depending on what you contracted; if you go over, there are massive speed reductions until you pay a fee.
I do rent from Hetzner and OVH. Before signing a contract I emailed them and asked if there are ANY limits / throttling beyond their unlimited 1Gbps. They assured me in writing (email) that there are none. Some of my rented servers host giant 4K video files, and transferring those, which happens all the time, keeps that bandwidth pretty occupied. So far I haven't seen them impose any throttling. Not on my business anyway.
In this case, that arguably would have been preferable.
A lot of cloud cost objections would be solved if they defaulted to that instead of defaulting to just charging you the fees. That has its own tradeoffs, of course, but I suspect the reason the clouds work this way isn't so much a cold and sober consideration of those tradeoffs as "this way makes more money when we charge people lots of money they weren't expecting" and "this way makes lots of money when the people deploying the service are organizationally and fiscally disconnected from the people paying for it, so they care and notice less".
It's truly unlimited now. I know someone who's pushing 1Gbps constantly (selling Plex access) and Hetzner have no issues with it.
>Can somebody explain to me why I wouldn't just rent a 40 EUR dedicated server from Hetzner [...] , I just don't understand why people use cloud services.
This recurring question of "why AWS/Azure instead of Hetzner/OVH ?" keeps happening because people are incorrectly comparing higher-level PaaS to lower-level IaaS without realizing it.
PaaS and IaaS are not equivalent. IaaS is not a direct drop-in replacement for PaaS to save money if the workload is using PaaS features that IaaS does not include.
The author Troy Hunt is using the higher-level Azure services like Table Storage (like AWS DynamoDB/SimpleDB) and Azure Functions (like AWS Lambda), and others. E.g. One of the article's hyperlinks talks about using Azure Functions.[1]
If he used Hetzner, he'd have to reinvent the Azure services stack with open-source projects (some of which are buggy and immature) and expend extra sysadmin/programming work for something that's not as integrated. The Azure/AWS stack includes many desirable housekeeping tools such as provisioning, monitoring, routing, etc which he'd also have to re-invent.
TLDR: People choose Azure/AWS because it has more features out of the box. You just have to figure out on a case-by-case basis if the PaaS value-add makes financial sense for your particular workload.
EDIT to downvoters: if Hetzner actually has built-in equivalents to AWS Lambda and DynamoDB, please reply with a correction because I don't want to spread misinformation.
[1] https://www.troyhunt.com/serverless-to-the-max-doing-big-thi...
Yeah, it feels like someone saying "why don't you build your house yourself? Would be much cheaper". This is certainly true, but
- My house is probably going to be built much faster if it's built by a professional house builder (even more true for services, since they're available immediately)
- I have better things to do than building houses
> people are incorrectly comparing higher-level PaaS to lower-level IaaS without realizing it.
Hum, no. People are asking what kind of value that platform adds that can justify all that risk.
And nobody is giving any clear answer, so I'll stand with my previous answer of "none".
You are not wrong. Hetzner would be a good choice instead.
> As far as I am concerned, I just don't understand why people use cloud services.
I use the credit card of my employer. For my own projects I use my own server for everything. Granted, it doesn't get much traffic.
Some offers from cloud providers are pretty good. If you want to scale to more (virtual) machines, it can be more easily done with the usual providers. I also expect Amazon to know more about firewall and reverse proxy configuration, it renews my certificates automatically and has rudimentary services for monitoring of server state. There is a certain convenience to it.
Would I recommend cloud-based hosting? Absolutely not. You become dependent on the provider and prices are often steep. Even if you do not know much about server security, your unsecured S3 bucket will be far more exposed than a standard DB installation on your own server. Better to build expertise for systems you have full control over than to invest the time in the details of AWS, which are more subject to change.
> As far as I am concerned, I just don't understand why people use cloud services.
For companies the benefits are the ability to get new servers at the click of a button and to get rid of them just as easily. For example, asking the ops team to set up a snapshot of a database for a few hours while I do something is super useful.
There is also the ability to use autoscaling and other stuff to automagically scale your system to handle traffic peaks. With dedicated servers you always need to have those resources available. It's attractive to managers that they're only paying for resources when they're using them.
There are also managed services like DynamoDb, Lambda, S3, etc that can make things easier and reduce your sysadmin work. And allow you to get up and running very quickly.
Obviously, a major downside is that the pricing is extremely vulnerable to spikes like this. I think we see an article like this every 3 months or so. This one is rather tame compared to some others that were 10x as much for a 24-hour period.
Hetzner dedicated server * 3 + k3s + vnet + longhorn + metallb = basically the cloud.
I can snapshot a database disk with a click of a button and restore the snapshot with yet another few clicks.
I have 1.5 TB of highly available disk space, 40 cores of full CPU power, 160 GB of RAM, & dynamically provisioned IPs for metallb. For only $130USD a month. For the same price in Azure, I had 6 CPU cores & 8 GB RAM.
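In case that recipe sounds like a lot, the bootstrap is shorter than it looks. A rough sketch - node addresses, the token and the MetalLB release tag are placeholders to fill in:

    # first node: k3s server, with the built-in service LB disabled so MetalLB can own it
    curl -sfL https://get.k3s.io | sh -s - server --disable servicelb
    # additional nodes join using the token from /var/lib/rancher/k3s/server/node-token
    curl -sfL https://get.k3s.io | K3S_URL=https://<first-node-ip>:6443 K3S_TOKEN=<token> sh -
    # MetalLB for LoadBalancer IPs (substitute a current release tag)
    kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/<version>/config/manifests/metallb-native.yaml
    # Longhorn (the replicated storage) installs the same way, via its deploy manifest or Helm chart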
You could do that.
But let's say you need 4 instances with 72 vCPUs and 144 GiB of memory each, for 4 hours. Or you need that 12 hours a day, but for the rest of the time you only need 2 CPUs and 4 GiB of memory.
You need to handle traffic spikes such as TV traffic.
Yes you could self-host a cloud env but you can't scale your resources the way you can with cloud.
I’ve never worked anywhere where that was really necessary. Even when I worked at Microsoft, the services my team built needed to be big scale, high perf, etc… But they would have easily run on a fixed number of beefy machines even at our peak load for a fraction of the cost of Azure.
It really was necessary, due to some silly people not listening to me 2 years ago and ignoring technical debt until it got to the point where the service they'd sold - with a 1.5 million penalty fee for failing to deliver - needed to be delivered and load tested.
And that is literally the largest AWS Elasticsearch cluster option. So clearly that gets deployed by multiple organisations. Otherwise they wouldn't have created a default node size that big.
> But let's say you need 4 instances with 72 vCPUs and 144 GiB of memory each, for 4 hours.
I would probably send you back to wherever you came from and tell you to re-engineer that. Cloud or no cloud.
And you would be told that it needs to go live because of a customer contract and failure to deliver would be a 1.5 million euro penalty fee.
Sometimes you need to spend lots of money on tech debt. I think it's nuts that was required but it was.
The German companies are really nuts. The cost to value ratio is through the roof. I’m a happy Netcup customer, and I honestly don’t know how they do it and make any profit at all.
I wish they’d bring those same prices to some US data centers.
I'm sure there are some big political flamewars to be had in this area regarding per capita productivity.
>As far as I am concerned, I just don't understand why people use cloud services.
Well that's the first issue. Many people have automated large parts of their infrastructure in this way so that distributing one huge file becomes part of that whole mess. The goal is of course to keep costs down to a minimum. You can actually do a lot with little money using cloud services.
But the careful balance is that you can easily miss little details. But how does that differ from any systems administration? The details are just in new areas that didn't exist 5-10 years ago.
And the details you miss are more likely to increase cost. And when you process a lot of traffic - when you're popular - that can add up really fast.
20 years ago in hosting we might get a porn stash on a hacked NT4 server that would draw bandwidth. And back then a whole company might have 100Mbit fiber so you'd notice.
For such a (relatively) simple architecture: I agree. Easy dedicated server, make a point to watch security updates.
The reason to use cloud-style services is so you can focus on building the product quickly instead of building and maintaining architecture. But once the product is stable, a cost-reduction pass is in order.
I don't understand why anyone would sign up for services that have an unknown future cost. This is exactly why I avoid Amazon's S3 and prefer something like Digital Ocean (or Hetzner). I would much rather have my service shut down than spend many thousands of dollars because some cache failed.
Agreed. I've had large bills from cloud providers - forgetting to terminate a GPU instance, or not realizing that keeping a disk image around (even when nothing is running) costs money.
>> why anyone would sign up
It happens more often than you think: people sign up for credit cards and forget to pay the monthly bill in full. They sign up for a cell phone plan and get charged large bills for international roaming. People sign up for monthly subscriptions and exceed the usage limits.
The entire ecosystem has been herded into complex deployment patterns that make it labor intensive to manage infrastructure without using managed cloud services.
> I just don't understand why people use cloud services.
To handle that day of getting 1 million customers, which you've been forever optimising for.
Any.. day.. now...
Where did you get 600 GB per day? That only would’ve cost $8.40 per day. It looks like it was actually 25 TB per day which is over 40x what you said.
From the article:
This was about AU$350 a day for a month… priced at AU$0.014 per GB
A company could not stay in business if every one of their “unlimited 1 Gbps” customers for €40 per month actually used that bandwidth.
Hype, HiPPOs, FOMO, buzzword-driven resumes.
Scalability, reliability, provided maintenance for every aspect (hardware, software, backups).
I'm a fan of cheap VPSes too, but I'd like to have things like metrics out of the box.
€40 gives you a dedicated server, not just a VPS.
Getting metrics on that is not a hard problem, there are various projects that are relatively simple to set up.
If you want to make it easier to manage resources, get metrics out of the box, and avoid (hoster) lock-in, then I'd use a hypervisor distro like Proxmox VE (disclaimer: am a dev there) or the like, and you can migrate (or backup/restore) VMs or containers easily to other providers. That gives you a (relatively) simple web interface to manage most things and also opens the possibility of just adding a second or third dedicated host down the line to scale out; if those new hosts are in the same DC or have a good interconnect (latency-wise) you could even cluster the nodes.
To make a fair comparison you need to consider the time cost of setting all of that other stuff up compared to having it out of the box. If an engineer on 100k takes a week to get it up and running, then your VPS cost 2k to set up and 40/month going forward.
If an engineer needs that much time, you have serious technical debt in your software's setup or an inexperienced or inept engineer. The developers, or at least the operators, need to be able to set up your software far more frequently for testing anyway; if they cannot do that rather quickly, you've got other problems.
I can set up Proxmox VE as a hypervisor, a container for each DB, a load balancer in front and some app in about an hour max from scratch, with good testing and some bells and whistles. And I really don't want to brag here; such operations are not my job at all, I only know because I do it occasionally for tests and for some private infra I maintain out of interest. So I really want to say: if some operations dork can do that, the engineer you hired should be able to do it at least as quickly.
But yes, you're right on the general point: upfront setup and ongoing maintenance are naturally something you need to price in. I just think that if you have so many different parts with such complex coupling that a huge maintenance effort is required to keep your product running, the cloud offering may not really be your salvation - it may just delay the fall while costing all the more.
> If an engineer needs that much time, you have serious technical debt in your software's setup or an inexperienced or inept engineer.
Everyone does something for the first time once. Just because someone has not set up a hypervisor before doesn't mean they're inexperienced.
> I can set up Proxmox VE as a hypervisor, a container for each DB, a load balancer in front and some app in about an hour max from scratch,
And I can spin up containers + load balancer on AWS in less than five minutes. That doesn't mean that it's just an easy thing to do. (although, this specific example is).
> upfront setup and frequent maintenance is naturally something you need to price in. <...> the cloud offer may not really be your salvation and just delay the fall while costing all the more.
Agreed 100% on both counts.
>Just because someone has not set up a hypervisor before doesn't mean they're inexperienced.
Wait, what? If you've never done something, then you're inexperienced at it, aren't you?
You can be an experienced $job without having ever done $one-particular-thing-related-to-job.
Experienced != Knows 100% of things.
The difference with a PaaS/serverless system is that I don't need to hire you, or have someone on my team learn to be you.
I'm sorry, but all that stuff you describe doesn't bring any business value. My customers don't care what hypervisor I'm running, so I don't care either. PaaS means someone else deals with it, forever. The last time I had to employ an ops (or devops) person was 2007.
True, but setting things up on AWS isn’t free either.
That's true - if you want to make a fair comparison between the two you need to consider the costs of the setup on AWS vs the cost of setting it up on whatever your platform of choice is. For a small team with no/development only loads, then a $5 digital ocean droplet would likely work for them, maybe even 10 of them. It's not worth managing a VPS for deploying 5 containers when you can have DO do it for $25 behind a load balancer. For a small team with moderate load, the question is "is it worth spending X on setup to save Y but potentially spend Z on maintenance of the systems on Hetzner/whoever, vs spending A on setup, B on compute and C on maintenance". If the difference is < 6 months salary, you go with whatever your current team is comfortable with and reevaluate in a year.
For a large company, it's not about $ cost, it's about risk management and avoiding cost centers.
Good point. If you are using Linux on a daily basis, it's easier to set up a server than to configure AWS.
You'd only have to figure out the setup once, though, and then automate it. It's not like you have to go through this for every additional server you add in the future or when you have to rebuild one.
Also, it doesn't take a week.
> You'd only have to figure out the setup once, though, and then automate it.
You're assuming that this is for recurringly set up infrastructure. Sometimes infra is set up once and maintained, other times it's set up and spun down. It's also not always automated. The time spent automating something like that might not be worth it in the medium or even long term.
> Also, it doesn't take a week.
The actual amount of time it takes doesn't matter, whether it's a day or a month. What matters is costing the time spent setting it up and maintaining it, and pricing that against AWS costs.
You can use Hetzner's cloud. You get metrics and still have a lot of free traffic with very low cost above that.
He should and did use torrents.
Good luck getting gigabit speeds from a Hetzner box with any kind of consistency.
Wouldn't riding a horse prevent that car crash?
Luckily you can avoid both by just cycling everywhere. Lower CO2 output and lower cost, too.
I use rented dedicated servers for everything, and always travel by bicycle or transit. It's not as ridiculous as you make it seem.
If I ride a bike from work to home, my fat ass will be terribly unhappy. If I ride a horse, the horse would be. I could drive, but driving in London is not fun. Luckily, there is a decent public transport system that fits my needs. The point is, there is a context for everything and it matters.
You might like installing and configuring software, I don't. I'm more than capable of doing so myself, but I'd rather build things on top of other things. I'd rather use a battle tested Secrets Manager and have db replication set up for me. I'm grateful to people that like doing these things I don't and I'm expressing my gratitude by contributing to their paychecks via my cloud bill.
To go back to my initial reply: if you change the context, e.g. the context is driving a car, you can't possibly crash the car you are not driving. If the context is "get home after a few too many pints at the pub", then riding a horse is much better than driving a car (and crashing it). Context.
Horse carriage accidents were surprisingly common and deadly for the low speeds they traveled at - but, I did enjoy the analogy [=
Not much to explain, you're absolutely right. Hetzner would have been a much wiser choice here, but advocating any cloud provider at this scale probably has its perks too, or he wouldn't be burning his money. Then again, perks only go so far and at some point come to an end, which may be why he's writing about costs right now.
Take a look at their datacenter in Germany: https://www.youtube.com/watch?v=5eo8nz_niiM
Wow, that video is fascinating.
Love how they are totally not ashamed to kick off the video with their collection of 14,000 mini-tower desktop PCs. Not rackmounted. Mini-towers.
Also totally ultra-curious about the PS/2 kvm. All those machines are from an era when USB keyboards had been around a long time already. Wondering if this is a security measure...
Perhaps for the same reason that the vast majority of the readers of this site don’t use Hetzner: they are not European and neither are their users.
Hetzner is just an example - you can get cheap dedicated boxes with gigabit uplinks all over the world. And in this example it's not even important what latency the server has, since it was only feeding Cloudflare's CDN with data.
Can you? I have not actually been able to identify a cheap, reputable dedicated server provider in the US. Ten years ago there were a few.
The reason Europeans tend to favor European service providers generally has to do with strong data protection guarantees and some level of protection against foreign surveillance. In practice a lot of European companies still use US services or at least services provided by US companies -- Troy Hunt is Australian and uses Azure from Microsoft, so this isn't just a thing Europeans do either.
I'd love to hear your reasoning why people who aren't European would prefer to avoid European service providers.
> I'd love to hear your reasoning why people who aren't European would prefer to avoid European service providers.
I'm generally a Hetzner fan as well for global services, but I can see the point in avoiding Hetzner (for example) if all of your users are in the US, since Hetzner only offers dedicated servers located in Europe (Germany and Finland if I'm not mistaken). Generally you want users to hit servers that are close to them, so something like Vultr would be better if the scenario mentioned before applies.
Exactly. The speed of light is too slow. The speed of light in cable is about 200,000 km/s, so if you are 10,000 km away, your minimum round-trip ping time is 100ms (+ server time).
https://www.hetzner.com/?country=us
They also have dc's in the US
Are we talking about the same thing? I know Hetzner offers Cloud servers (VMs) in the US since recently, but I don't think they offer Dedicated Servers in the US (yet?).
You are correct.
I didn't realize it was limited to their cloud offering. Nice one!
When deploying pet projects I could not care less about privacy.
OVH then? They have similar offerings, unlimited traffic, multiple datacenters to pick from.
I guess that's out of the question too if it's a problem that the company is European, since OVH is French.
Hetzner has launched in the US by now.