Ahrefs Saved US$400M in 3 Years by Not Going to the Cloud (2023)

tech.ahrefs.com

80 points by ianwehba 2 years ago · 65 comments

_xivi 2 years ago

Previous discussions:

- Ahrefs saved $400m in 3 years by not going to the cloud (2023): https://news.ycombinator.com/item?id=35094407 - (163 comments)

- Ahrefs Saved US$400M in 3 Years by Not Going to the Cloud (2023): https://news.ycombinator.com/item?id=35108813 - (44 comments)

---

Similar sentiment:

- X celebrates 60% savings from cloud exit: https://news.ycombinator.com/item?id=38041181 (18 comments)

- Leaving the Cloud: https://news.ycombinator.com/item?id=33301078 (195 comments)

- We stand to save $7M over five years from our cloud exit: https://news.ycombinator.com/item?id=34878140 (18 comments)

- Our cloud exit has already yielded $1M/year in savings: https://news.ycombinator.com/item?id=37530011 (3 comments)

sourcecodeplz 2 years ago

They crawl all the time, their instances could go down and no problem, there are still hundreds doing the same task. They consume waaaay too much traffic for the cloud to make sense financially.

Hybrid approach is best in cases like this. Use the cloud for client facing interfaces and rent dedicated servers for the spiders.

edit: even better, build your own data center instead of renting.

  • sph 2 years ago

    On a much smaller scale, I'm working on a web crawler as well, and renting a dedicated server at Hetzner with unlimited traffic is cheaper than any VPS or cloud offering.

    8 cores, 32 GB RAM, 2x 500 GB SSD for ~€40/month. It's an older CPU, but web crawlers don't spend much time crunching numbers anyway.

    • bomewish 2 years ago

      What crawling framework are you using?

      • sph 2 years ago

        In-house, written in Elixir.

        20% of a crawler is fetching and parsing pages; the remaining 80% is dealing with misconfigured, broken and non-standard web servers and HTML, and with Cloudflare, Akamai and random bot-busting tools that cause more false positives than a chaos monkey. It's better to write one yourself that you can control, monitor and operate as you need, instead of relying on third-party logic. Makes sense for my business, at least.
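
That 80% is hard to convey without an example; here is a rough sketch in Python of the kind of defensive logic involved (the commenter's crawler is in Elixir, and every name and number below is hypothetical):

```python
from html.parser import HTMLParser

# Hypothetical retry policy: some failures are worth retrying, with capped
# exponential backoff so misbehaving servers slow the crawler down rather
# than crash it.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-indexed), capped."""
    return min(cap, base * (2 ** attempt))

class LinkExtractor(HTMLParser):
    """Tolerant link extraction: html.parser never raises on malformed
    markup, which matters more than strictness on the open web."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    parser = LinkExtractor()
    parser.feed(html_text)  # broken/unclosed tags are tolerated, not fatal
    return parser.links
```

Even this toy version tolerates unquoted attributes and unclosed tags, which is the everyday reality of crawled HTML.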

        • bomewish 2 years ago

          Ah. Have so been there. But don’t really have the resources to spin something from 0. Good luck!!

moltar 2 years ago

Yes, but this is truly an exceptional case. Their workload is basically scraping (crawling) at a massive scale. Just as Google does, it makes more sense to have cheap, throw-away hardware for this use case.

There are no permission issues or ACLs.

There’s no need to auto scale and the traffic is very predictable.

There is no serious need to orchestrate deployments. I imagine it's mostly just workers reading URLs from a queue and crawling a page, so it's very easy to deploy new servers.
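
That queue-driven worker pattern can be sketched in a few lines (a toy illustration in Python, not Ahrefs' actual stack; `fetch` stands in for a real HTTP client):

```python
import queue

def crawl_worker(url_queue, results, fetch):
    """Minimal crawl worker: pull a URL, fetch it, record the outcome.
    If a worker's machine dies, its URLs simply get picked up by another
    worker - no orchestration or failover logic required."""
    while True:
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return  # queue drained; worker exits
        try:
            results.append((url, fetch(url)))
        except Exception:
            # Transient failure: re-queue for another worker.
            # A real crawler would cap retries per URL.
            url_queue.put(url)

# Usage sketch with a fake fetcher:
urls = queue.Queue()
for u in ["http://a.example", "http://b.example"]:
    urls.put(u)
results = []
crawl_worker(urls, results, fetch=lambda u: f"fetched {u}")
```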

This is just an edge-case scenario that's specifically great for self-hosting.

  • m1keil 2 years ago

    Hardly an exceptional case. A lot of web shops use auto-scaling as a means to save money, not to respond to traffic spikes like Black Friday's.

    What is easier: having a bunch of powerful servers that give you enough headroom, or fighting your auto-scaling group to keep just enough capacity and, at the end of the day, still paying more?

  • iforgotpassword 2 years ago

    It's not that much of an edge case. Sure, their load is super steady, but most other workloads are predictable enough; or rather, it is still cheaper to over-provision for your typical peak load (and then some) than to run the same thing entirely in the cloud. You might still get slashdotted if you have some overnight success, and whether that's acceptable depends on your business model. You might also take the hybrid approach, where you can spin up additional resources in some cloud.

politelemon 2 years ago

This is a weird read. The analysis makes the classic mistake of assuming a lift-and-shift calculation. Of course that's going to be more expensive. You save money by re-architecting and using more managed services.

Which makes me scratch my head at the concluding statement:

> A cloud is convenient and locked in.

Everything is a lock-in. But in the case they've described, which is just shifting from their own VMs to EC2, it is the exact same thing; there is no lock-in from their perspective, other than using the phrase as a boogeyman.

  • sgarland 2 years ago

    I hear that all the time, but I don’t see how it’s true. Have you priced RDS vs. an EC2? It isn’t even close. You’re paying for the convenience. Not to mention the massive speed loss you take with the increased latency of EBS.

    • deadfece 2 years ago

      I've worked places where they had so many databases from M&A that half of their FTEs were wholly preoccupied all year with outstanding DB maintenance: fixing backups, managing storage, applying patches, and performing moves/adds/changes.

      For them, managed DBMS was a life-changing event. As soon as they had RDS or even Azure SQL MI, they were begging the cloud team for more, so they could get their team back.

      In some businesses, it's definitely not a big loss to lose agility by having a large portion of your team tangled up in infrastructure management, but for some businesses, that constraint is an impediment to their line of business. Some businesses are missing opportunities for want of infrastructure being able to move fast enough.

      • sgarland 2 years ago

        Much of that can be automated (how do you think AWS does it?), but I get the point. Still, the OP said: "You save money by re-architecting and using more managed services." To me, that isn't implying "save money from personnel costs," it's discussing pure service cost.

        • deadfece 2 years ago

          That maintenance problem is sometimes also expensive in terms of service costs. Which is more of a tech debt problem than a cloud comparison, honestly, but they're often inseparable.

          Extended support costs on old hardware and software are sometimes astronomical. Then you're paying all of that to get something that, compared to modern gear, performs like ass and just breaks all the time.

          I've also had C-suites tell me I have two months to start a project and have all the gear live, but I find we are only allowed to order direct from the manufacturer, and our orders will take 90 days to get to our door. Oh boy. That scenario is difficult to put into terms of money. Sometimes having that one-hop instant supply chain into a cloud service is a huge business enabler, sometimes it's not.

          • sgarland 2 years ago

            I regret to inform you that you can also incur extended support costs with managed services.

  • bcaxis 2 years ago

    > You save money by re-architecting and using more managed services.

    Managed services are more expensive, not less, per unit compute, per operation, etc. How exactly does that make things magically cheaper?

  • silverquiet 2 years ago

    Re-architecting is also very far from free and the result of spending all that developer time is that you are indeed then hard locked into your provider's managed offerings. But even lift-and-shift will have some lock in. The classic is the egress bandwidth roach motel model; free to enter but expensive to leave.

skywhopper 2 years ago

Really misleading numbers in a lot of ways. Notably, they dismiss the risk of inflation, ignore longer term maintenance, choose really poor cloud analogs for their architecture, and ignore cost-saving options like spot instances and pre-pay discounts.

All that said, yes, cloud is often more expensive for simple applications with stable 24/7 workloads that don’t evolve over time. Do the research and choose the right infrastructure platform for your business.

addicted 2 years ago

The new mantra with the cloud champions in my company is that cloud was never meant to save money. It’s a premium experience that’s about saving time.

This did not sound right, so I dug into the emails our leadership sent us between six and three years ago ramping up our "cloud transformation". And yup, saving money was part of it.

It's only over the last year or so, once it became obvious we didn't save any money and in fact spent a lot more, that the pitch has become about functionality and quality rather than cost.

The cloud may beat on-prem on functionality. For example, a global presence is much easier with the cloud. But don't f'ing gaslight me and tell me the cloud providers hadn't been selling cost as a benefit, and even the primary benefit, for at least the first 10 years.

  • ghaff 2 years ago

    Whether or not using a cloud for a given workload today makes sense, the original mantra was absolutely around cost. The basic narrative was that computing was a utility and--especially before local solar was really a thing--how could an individual company compute/generate electricity competitively when they could just consume it off a grid?

    • jart 2 years ago

      I thought the original mantra was social mania, after an order came down from the prince of darkness.

  • eloisant 2 years ago

    > The new mantra with the cloud champions in my company is that cloud was never meant to save money. It’s a premium experience that’s about saving time.

    But that's true. AWS really took off when people were developing Facebook apps and seeing exponential growth over a few days.

    They wanted servers right now, but their supplier had a week of lead time for a new machine, plus setup time. On the other hand, you could spin up a new EC2 machine in a few seconds.

    • joepie91_ 2 years ago

      During the time that this became a frequently-repeated marketing talking point, 'regular' hosting providers with one-day turnaround or even instant provisioning were already widespread. It was never really the revolutionary feature it was pitched as, honestly.

  • ptrhvns 2 years ago

    Indeed. Cost optimization is one of the “6 Pillars of the AWS Well-Architected Framework” as promoted by AWS themselves:

    “The Cost Optimization pillar includes the ability to run systems to deliver business value at the lowest price point.”

    https://docs.aws.amazon.com/wellarchitected/latest/framework...

orf 2 years ago

120TB of storage costs about US$3k per month on S3 in Singapore, and S3 can sustain a much higher aggregate read/write speed than their existing setup.
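
Sanity-checking that figure with back-of-envelope arithmetic (the per-GB price is an assumption, roughly S3 Standard in ap-southeast-1 circa 2023; real pricing is tiered and excludes request and transfer charges):

```python
# Storage-only estimate with an assumed flat price; ignores request,
# retrieval and egress costs entirely.
storage_tb = 120
price_per_gb_month = 0.025  # assumed USD/GB-month, S3 Standard, Singapore

monthly_cost = storage_tb * 1000 * price_per_gb_month
print(f"~${monthly_cost:,.0f}/month")  # in the ballpark of the US$3k claim
```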

Like many have said, a lift-and-shift is never great, and imagining you need 120TB of EBS per instance and then being surprised that it costs a lot says a lot about the accuracy of such estimates.

Nothing was mentioned about utilisation. Like basically everything, services follow a utilisation trend across a given time period, yet this estimate assumes 100% used capacity at all times.

Moving to S3 and being able to scale down to 50% capacity at non-peak hours seems to nearly equal the cost, aside from the human+time cost savings. Using spot instances would also save even more.

Lock in also takes many forms. If you’re locked in to an infrastructure that only supports a certain type of system with big bulky servers and big bulky disks, then you’re going to build that kind of system. You can’t take advantage of something like a lambda for specific parts of your scraping pipeline, or SQS or S3. These are useful things to have at your disposal when designing systems.

jerrygenser 2 years ago

At this scale the negotiated price with AWS could be a lot cheaper even than the sticker price on 3 year reserved instances.

nunez 2 years ago

i don't buy it.

first of all, ahrefs discounts the "people" cost, but that's a huge cost to ignore!

the biggest advantage that AWS and the like confer is being able to reduce interactions with literally every piece of infra you consume from them down to APIs.

having physical hardware means you need a team who knows how to rack/provision/configure/update hardware *along with* administrating operating systems and everything that comes with *along with* the automation needed to hold everything together.

finding people who had all of those skillsets was super challenging before The Cloud appeared, and is especially hard now since everyone who would have those skillsets prefers to work with cloudy things (because everything's an API).

second of all, they made the classic mistake of doing a one-to-one comparison of running their business on EC2. ofc that's going to cost a ton! you're basically just renting VMs from them at a huge premium. that can be done anywhere else (Hetzner is popular) for much cheaper.

that's not why you move to the cloud.

when AWS or Azure says they help companies save money, they usually mean taking an app that runs really well on-premise on a fixed set of compute that's a whole process to scale and making it run even better on smaller, but more distributed, compute that should be less expensive due to economies of scale.

Do web crawlers like these _need_ to run entirely on huge EC2 instances that run hot all of the time? Could they take advantage of more fractional compute from things like EC2 spot autoscaling groups or "serverless" compute? Ahrefs uses local NVMe storage for everything, which is definitely cheaper than EBS. Could they use data archival pipelines to compact and move less-used data onto slow networked storage? Could they benefit from using more aggressive caching for sites that don't change very often?

finally, for every company like Ahrefs who runs lots of compute hot 24x7, there are at least 20,000 companies who spend big money operating datacenters for apps that don't justify the cost. they _could_ save significant amounts of money by moving to the cloud AND re-architecting their apps to spend compute more efficiently.

  • bcaxis 2 years ago

    > first of all, ahrefs discounts the "people" cost, but that's a huge cost to ignore!

    It's the hardest thing to compare. Cloud has a significant people cost too, particularly in the complexity: EC2, Lambda, S3, RDS, IAM, Cognito, Glacier, Lightsail, EBS, Fargate, CloudFront, SNS, DynamoDB, ElastiCache, etc.

    If you ignore this, you fail to completely understand a service you start to use. That's the quickest way to a surprise mammoth bill or insecure service.

    > they _could_ save significant amounts of money by moving to the cloud AND re-architecting their apps to spend compute more efficiently.

    This is a bad argument in general. If an org, institutionally, can't properly size services, or leaves a lot of fat in its services, why would a move to the cloud fundamentally change that? You can over-provision and overspend even more easily when it's a mere button click away.

    I fail to see how a move to the cloud creates an institutional ability to trim fat and optimize that just wasn't there before.

    And I just love how re-architecting is so casually thrown in, like it's a walk in the park... Yikes. Rewrites can and do kill companies. Not a casual thing to undertake just because a new vendor comes along with a shiny new tool.

  • sgarland 2 years ago

    > having physical hardware means you need a team who knows how to rack/provision/configure/update hardware

    Not with a colo. They all offer that as a service. If by "provision" you mean installing an OS via PXE or the like, then no, but everything else, yes.

    > along with administrating operating systems and everything that comes with along with the automation needed to hold everything together.

    If you have a team managing Terraform for cloud infra, congratulations – you have a team who can manage physical infra. Seriously, it’s the same tooling. Pick your favorite configuration management tool (Ansible, Chef, Puppet…) and you’re off to the races.

scns 2 years ago

Ahrefs uses ReasonML on the frontend and OCaml on the backend.

Not affiliated.

mobilio 2 years ago

Original post: https://news.ycombinator.com/item?id=35094407

asjkaehauisa 2 years ago

How much do servers cost nowadays? According to the article, it works out to about US$61,500 per server, for a spec with 2TB RAM, 100Gbps networking, 16x 15TB drives, and a 64-core CPU (per "We use high core-count CPUs").

Is this accurate? Any tips on how to acquire a server with those specifications for around US$60,000?
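
As a rough plausibility check, here is a component-level estimate; every price is an assumed approximate 2023 street price, and chassis integration, support, and spares are heavily simplified:

```python
# All prices are illustrative assumptions, not quotes.
parts_usd = {
    "2 TB DDR4 RAM (32 x 64 GB @ ~$250)": 32 * 250,
    "16 x 15.36 TB NVMe SSD (~$1,800 each)": 16 * 1800,
    "64-core server CPU": 7000,
    "Chassis, motherboard, 100G NIC, PSUs": 6000,
}
total = sum(parts_usd.values())
print(f"~${total:,}")  # comfortably under the article's ~$61,500
```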

  • pinkgolem 2 years ago

    In that range you usually have to talk to a salesperson. If you're not into that, Supermicro has a nice online configurator with realistic pricing.

    You can configure a server with 192 cores, 24x 96GB RAM, and 16 drives for $60k, no problem.

  • tracker1 2 years ago

    Totally doable, as a peer response mentions... You can get a lot of performance in a couple of 1-2U servers these days, literally replacing in 8-12U what took a full rack 6-8 years ago.

michelb 2 years ago

Not directly related, but how much energy is being wasted by thousands of companies continuously scraping the internet and each storing roughly the same information in their own datacenters? I understand the commercial reasons for it, but it all seems very inefficient.

  • skywhopper 2 years ago

    Nowhere near what is currently being wasted on LLMs generating the content those scrapers will soon be copying to their datacenters, or the cryptomining propping up Bitcoin.

  • mike_hearn 2 years ago

    Easy enough to fix if there's enough demand, just sell crawl logs. But maybe there's too much diversity amongst potential customers to make that business viable.

  • tracker1 2 years ago

    When I was at a modest-sized public-facing site, roughly 3/4 of requests were from bots. Most painful: over half of those hit search-result pages, which were much more costly to serve.

  • hasty_pudding 2 years ago

    The semantic web was a beautiful idea at one point

speedgoose 2 years ago

850 servers with 240TB of NVMe SSD storage and 2TB of RAM each; SEO pays much better than I thought.
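
The fleet-level totals implied by those per-server figures are striking; quick arithmetic:

```python
servers = 850
nvme_tb_each = 240
ram_tb_each = 2

total_nvme_pb = servers * nvme_tb_each / 1000  # petabytes of NVMe fleet-wide
total_ram_tb = servers * ram_tb_each           # terabytes of RAM fleet-wide
print(total_nvme_pb, "PB NVMe,", total_ram_tb, "TB RAM")  # 204.0 PB NVMe, 1700 TB RAM
```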

visitor4712 2 years ago

if you give the heart of your business (your data) to an alien company, you make TWO colossal mistakes:

1) you transfer the core of your business to somebody else

2) you can be blackmailed on service and costs

outsourcing in general is a deadly management fashion.

deadfece 2 years ago

The article honestly reads as if written by a very smart sysadmin with zero cloud experience.

1:1 lift and shift is always obscenely more expensive. In this case, if the author had been in charge of the migration, then yes, the services would have cost them dearly to operate in the cloud.

I'm sure if I was personally put in charge of moving some aspect of IT into an unfamiliar mode of operation, my inexperience there would make my approach insanely expensive as well.

That says nothing about the target, except that having undertrained and inexperienced staff in charge of its design and implementation is probably foolish from a financial perspective.

There are obviously thousands on thousands of scenarios where moving to commodity cloud is an absolute slam dunk in aspects that are important to the subject business.

Unfortunately we really get no insight into what the workload truly is in the article's comparison. There's no mention of solution aspects like app architecture, security, HA/DR, SLA, RTO/RPO, security or backups [1]. We only get what is plainly a tunnel-vision view of a comparison.

It's almost like the author doesn't make solutions for a living.

Maybe the author actually realizes their blind spot, and is secretly utilizing Cunningham's law to crowd-source a relatively free solution from the professionals and amateurs in the internet comments sections.

The good architects don't work for free. There's a reason why Troy Hunt's web services cost him vanishingly little to operate, and it's certainly not by running IaaS VMs 24x7x365.

[1] I mentioned security twice as part of an ongoing effort to make up for all the times CyberSec/Infosec teams have been forgotten in the planning process. =P

  • pinkgolem 2 years ago

    >There's a reason why Troy Hunt's web services cost him vanishingly little to operate

    And here I thought that's because he is on a Cloudflare premium plan with a workload where 99.8% of requests are cached.

  • bcaxis 2 years ago

    > 1:1 lift and shift is always obscenely more expensive.

    Is it? Managed services cost a lot more than a VM. Rewriting software costs a lot more.

    Where are the savings?

    • tracker1 2 years ago

      Fewer IT staff for systems management. Reduced costs in off-peak hours with on-demand instances. Right-sizing resources to application needs.

      There are wins that one can have, but nothing is guaranteed. It will vary by application, size and staff.

      • bcaxis 2 years ago

        > Fewer IT staff for systems mgt

        This hasn't been my experience. Replace sysadmin with cloud engineer/architect, add a salary bump, and there's no reduction in headcount. This assumes you are mildly competent as an organization.

        On managed services, say the database: my experience is that the extra cost of the service is larger (usually much, much larger) than any salary or headcount savings. I'd rather employ more people and actually control my data, given the choice, particularly when the savings are questionable or illusory.

        I generally prefer a lower dependency count, both code and vendor. Even at a modest immediate cost increase, you gain better flexibility and there are fewer things to bite you.

        > Reduced costs in off peak hours with on-demand instances.

        Agreed. You do increase system complexity to accomplish it. But there are actual cost savings here.

        > Right sizing resources to application needs.

        This isn't unique to cloud, you can do this in any hypervisor. This is a basic feature.

        > There are wins that one can have, but nothing is guaranteed

        It does not "always" hold. That is the critical nuance missing from the original claim.

akouri 2 years ago

> Also, we pay for IP Transit and dark fiber between the data center and our point of presence.

IP noob here- can someone explain what this means?

  • GreyStache 2 years ago

    Broadly speaking: IP transit is paying an upstream network to carry your traffic to and from the rest of the internet. Dark fiber is a fiber-optic link you light yourself: you shine light in one end and it comes out the other, with no other party on the physical strand.

    (Both are generalizations.)

viraptor 2 years ago

I just realised that (ignoring the other weird price-calculation aspects) they compare the processing power 1:1 with AWS... Does that mean they have no second site / failover? They keep referring to just one datacentre, and the "x% chance of a thermal event taking the company down for (colocation + hardware lead time)" is not something you can sweep under the rug.

codethief 2 years ago

(2023)

Dunedan 2 years ago

dupe of https://news.ycombinator.com/item?id=35094407

  • supriyo-biswas 2 years ago

    It has generally been the moderators' position that reposts are fine after an interval of 6 months to a year.

creshal 2 years ago

>by not going to the cloud

...in the worst way imaginable

Doing a direct lift-and-shift with 1:1 replacement of instances is, intentionally, prohibitively expensive, precisely so you stop and think.

  • anonymoushn 2 years ago

    I don't really get it. You can't buy the same stuff, so reorganize your business to produce the same value by doing less stuff? It seems like advice that you could follow to achieve massive cost savings even without involving clouds.

    • viraptor 2 years ago

      Not necessarily doing less stuff. There are basic things like Singapore being more expensive than the US, so why not host in the US? Maybe analyse whether you really use that much storage all the time, or whether some mix of EBS/S3 would be better. How much utilisation is there, really? Maybe you can often scale down? Or use bare-metal instances rather than EC2? Also, you can negotiate way lower pricing than what's published.

      This post is basically "see how bad a knife is at unscrewing screws". They're showing off how unfamiliar they are with AWS's offerings. And that's unrelated to which solution wins for this scenario.

  • markonen 2 years ago

    Do you mean to imply that cloud services at higher levels of abstraction are cheaper per unit of compute than simple VMs? I believe you’ll find that the opposite is true.

    At the scale discussed here, there are no free lunches.

    • eloisant 2 years ago

      It depends on the scale, but running containers on a k8s cluster means your load will be distributed among the nodes according to capacity.

      Managing VMs with dedicated resources directly means you have to distribute the load manually, leading to unused and wasted resources.

      • sgarland 2 years ago

        You absolutely do not have to distribute VMs manually. This [0] is a tiny Python script, run as a cron job, that migrates VMs in a Proxmox (also free) cluster according to CPU utilization. You could extend it to other parameters.

        While I don’t personally have experience with more enterprise-y solutions like VMWare, I have to imagine they have more complete solutions already baked-in.

        [0]: https://gitlab.com/tokalanz/proxmox-drs
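
The core rebalancing idea behind a script like that is small enough to sketch (illustrative Python only; the function name and threshold are assumptions, not the linked project's API):

```python
def plan_migration(nodes, threshold=0.2):
    """Decide one VM migration from the busiest node to the least busy one.
    nodes: {node_name: {vm_name: cpu_fraction}}. Returns (vm, src, dst),
    or None if the cluster is already balanced within `threshold`."""
    load = {name: sum(vms.values()) for name, vms in nodes.items()}
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    if load[src] - load[dst] <= threshold or not nodes[src]:
        return None
    # Move the smallest VM on the busiest node; small moves avoid ping-ponging.
    vm = min(nodes[src], key=nodes[src].get)
    return (vm, src, dst)
```

A cron job would call this periodically and issue the actual live-migration command for the returned VM; repeated runs converge the cluster toward balance.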

  • addicted 2 years ago

    Right, now factor in the cost of architecting software not because that’s the best architecture for your use case but to save money on the specific infrastructure choice you’ve made.
