Why is Multi-Cloud a Hard Problem?
From the article (planetscale.com):
> Why use true multi-cloud clusters?
> Two reasons: disaster recovery and freedom from vendor lock-in.
In my experience, those two reasons are almost never sufficient to warrant a multi-cloud solution. The costs for multi-cloud are enormous. Another commenter mentioned egress costs, but there are numerous other costs:
1. You've added a lot of complexity on top of existing cloud solutions. That complexity can make things fail in unique ways that may make some of your cherished reliability benefits moot.
2. You are always coding to the "lowest common denominator" of any cloud service, meaning you're missing out on a ton of productivity by forgoing useful services.
I'm curious whether anyone with actual multi-cloud experience can comment: was it worth it?
I was one of the people who helped put this together, so hopefully I can answer your question, specifically around the complexity and productivity of going multi-cloud.
Here is what we found. Previously, when people talked about going to the cloud, the state of the art was to target a specific cloud provider: you put your software in immutable AMIs, you use ASGs and ELBs along with S3 and EBS to build really robust systems, you instrument everything with CloudWatch, and you make sure everything is locked down with IAM and security groups.
What we have seen lately is that Kubernetes has changed all of that. Most systems being designed today are very much provider agnostic, and the only time you want to be locked into a specific technology is when the vendor-provided solution doesn't really have an alternative in a truly vendor-agnostic stack. Part of what this service does is take the last true bit of gravity a cloud provider has and remove it: you can now run in both clouds just as easily as if you were all in on one of them. There are some additional costs if you are transferring all your data across the wire, but that is where the power of Vitess's sharding comes in. You can run your service across two clouds while minimizing the amount of cross-talk, until you want to migrate off.
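To illustrate the cross-talk point, here is a deliberately simplified sketch (with made-up shard homes; this is not how Vitess itself routes queries, which go through vindexes over keyspace IDs): if each shard is pinned to one cloud and requests are routed by shard key, most queries never leave the cloud that owns the shard.

```python
# Illustrative only: pin shards to clouds and route by shard key so that most
# queries stay inside one cloud. Shard count and shard homes are made-up examples.
import hashlib

SHARD_HOMES = {0: "aws-us-east-1", 1: "aws-us-east-1",
               2: "gcp-us-central1", 3: "gcp-us-central1"}

def shard_for(customer_id: int, num_shards: int = 4) -> int:
    # Hash the shard key and map it onto a fixed number of shards.
    digest = hashlib.md5(str(customer_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for cid in (101, 202, 303):
    shard = shard_for(cid)
    print(f"customer {cid} -> shard {shard} homed in {SHARD_HOMES[shard]}")
```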
Also, while this post makes a big deal about being multi-cloud, this also gives you true multi-region databases. That's something that was previously only available with Spanner or CosmosDB, both of which require you to target them explicitly. PlanetScaleDB lets you use your existing MySQL-compatible software.
Thanks for the response. While I certainly see the value of being provider agnostic, I just don't see the value of being multi-cloud within the same app or service.
I worked at a company that wrote an internal infrastructure/deploy management tool based on Kubernetes. You could deploy an app either to our colo facility (we were an old business transitioning to the cloud), or you could deploy it to AWS. As a developer I never interacted with the AWS console, this internal tool just hid it all from me. However, while I had the option of deploying to our colo or to the cloud, it was one or the other; the service was only running in one platform in prod.
And after a multi year push to the cloud, the company actually had to stop that huge push because costs were spiralling out of control. Managing all the costs across a huge enterprise of many services (some micro, some not) became a huge challenge. Can't even imagine the additional cost or complexity if some of those services spanned multiple cloud providers.
What does this mean: "PlanetScaleDB lets you use your existing MySQL compatible software"?
I thought you were offering Vitess, not a "custom" solution, or is that marketing speak?
They are providing a managed Vitess service, which allows you to use almost any software that is compatible with MySQL.
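To make "MySQL compatible" concrete: the client just speaks the MySQL wire protocol to a Vitess VTGate endpoint, so a stock MySQL driver works unchanged. A minimal sketch below; the hostname, credentials, and table are hypothetical placeholders, not anything PlanetScaleDB actually hands you.

```python
# Minimal sketch: talking to a Vitess/VTGate endpoint with a stock MySQL driver.
# The host, credentials, and schema below are hypothetical placeholders.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="vtgate.example-cluster.planetscale.example",  # hypothetical endpoint
    port=3306,
    user="app_user",
    password="app_password",
    database="commerce",  # a Vitess keyspace looks like an ordinary database to the client
)

cur = conn.cursor()
cur.execute("SELECT id, email FROM customers WHERE id = %s", (42,))
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```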
If you wouldn't buy the license key from Oracle, don't rent the network API from Amazon.
Proprietary software can be good in contexts where the integration surface is small. The denominator you code against so deeply that you become intertwined with it, and would have a hard time backing out of, should be open source.
The "lowest common denominator" being Linux, Postgres, Kubernetes, etc. is not such a terrible thing.
Multi-cloud deployments often happen organically in large organizations: there are always legacy systems, different procurement processes in sub-companies, different legal requirements in national branches leading to different vendors, or acquisitions of companies with different tech stacks. And while consolidating all IT on a single platform sounds really tempting, it can quickly become a disaster and fail spectacularly as well, so multi-cloud deployments are probably here to stay.
You're describing fragmentation, not multi-cloud resiliency or diversification. Think "something catastrophic has happened to AWS but the business can still ship orders because the required services are also running in Azure."
I think the parent topic is about deploying a single application across multiple cloud vendors, not having application A on Azure and application B on AWS.
We in the DoD are very intent on ensuring that we don't have single points of failure, or give an adversary the ability to cut off access to our capabilities. That especially includes compute/storage.
So global high availability (hybrid multi-cloud) and no vendor lock-in are pretty important.
I agree on the importance of resiliency and avoiding lock-in, but I thought the whole recent cloud contract was the choice of a single vendor for all the DoD's cloud infrastructure?
There is a reason: quite a few SaaS providers run on only one cloud. So if you want or need to use one of their services and want a good connection to it, you need to move at least a few things to that other cloud provider.
Agree with the sentiment.
Many enterprises got really excited about multi-cloud really fast but then gave up once they faced the harsh reality of increased complexity (and, as a consequence, cost and time).
People tend to grossly overestimate the actual size of the ‘common denominator’. On the face of it services look almost identical (every cloud has compute, blob storage, block storage, etc.), in reality, there are so many subtle differences between analogous services (API, pricing model, failure modes, performance, security models) that the support and operation cost easily more than doubles. So even in large enterprises where there is a top-down edict to ‘be multi-cloud’ actual BU/team silos tend to stick to one or the other provider.
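To make the "subtle differences" point concrete, here is a toy sketch of the same logical operation (upload a small object to blob storage) against the three providers' Python SDKs. Bucket and container names and the connection string are placeholders, and auth, retries, and failure modes differ even more than the call shapes suggest.

```python
# Toy illustration: one logical operation, three SDKs, three auth models.
data = b"hello"

# AWS S3 (boto3): credentials typically come from the environment or an IAM role.
import boto3
boto3.client("s3").put_object(Bucket="my-bucket", Key="greeting.txt", Body=data)

# Azure Blob Storage: auth is commonly a connection string or an AAD credential.
from azure.storage.blob import BlobServiceClient
svc = BlobServiceClient.from_connection_string("<connection-string>")
svc.get_blob_client(container="my-container", blob="greeting.txt").upload_blob(
    data, overwrite=True
)

# Google Cloud Storage: auth usually via application default credentials.
from google.cloud import storage
storage.Client().bucket("my-bucket").blob("greeting.txt").upload_from_string(data)
```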
The other often overlooked aspect is that clouds offer bulk discounts (I’ve seen up to 40%) for customers that spend many tens or hundreds of millions of dollars, but one can only get this if they stick to one provider. In other words, the economics of multi-cloud doesn’t scale well.
Kubernetes strives to be the common layer, but cross-cloud deployments tend to be very convoluted and non-trivial. They require a lot of manual work, tons of expertise in different domains (networking, security), greatly increase management overhead, and introduce funny new failure modes. In addition, Kubernetes is just one piece of the puzzle. After one is done with Kubernetes, they still need to figure out their user-facing services for launching containers, provisioning and managing databases, message queues, analytics, and machine learning pipelines. Now compare that entire ordeal with clicking a few buttons (or writing several pages of Terraform) and having everything set up and ready to go in an instant and on demand. Yes, you do have to tie yourself to a single cloud, give up some degrees of freedom, and use extremely high-margin proprietary services (like Kinesis instead of Kafka), but it's so, so much easier, faster and cheaper (at least in the short to medium term).
To answer the original question (and sorry for going off on a tangent), I haven’t seen any company actually succeed with multi-cloud (my sample is 100+ SMBs and large enterprises). Even extremely tech-savvy and sophisticated companies like Twitter (with their shift to GCP) tend to think of all this as something that doesn’t necessarily need to be built anymore and should be bought instead.
What I do see all the time, is companies trying to figure out a way to marry their existing on-prem and (single) cloud provider setups. It is still a struggle though, and will probably always be to a certain extent.
One key missing factor: live replication across three clouds is not just a technical problem, but a cost problem, because the egress costs will be murderous.
Not everyone rips you off on egress fees the way Amazon/MS/Google do. Quite a few of the second-tier providers (e.g. Vultr, Linode, Digital Ocean, Upcloud) offer $0.01/GB for public outbound bandwidth, with a free allowance each month (usually anywhere from 1-10+ TB/month per instance, depending on what you deployed). Some even waive fees for companies they're partnered with (e.g. if you use Wasabi or Backblaze B2 instead of S3, you won't be charged transfer fees to a number of cloud providers or Cloudflare, thanks to the Bandwidth Alliance).
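A back-of-the-envelope sketch of what that gap means for cross-cloud replication, using an assumed list price of roughly $0.09/GB at the big three versus the $0.01/GB figure above (real bills depend on tiers, regions, discounts, and free allowances):

```python
# Rough sketch of why cross-cloud replication egress adds up. Prices are
# illustrative assumptions, not quotes from any provider's current price list.
replicated_gb_per_month = 5 * 1024   # say, 5 TB of replication traffic per month
rates = {"big three (assumed ~$0.09/GB)": 0.09,
         "second tier (~$0.01/GB as above)": 0.01}

for name, rate in rates.items():
    cost = replicated_gb_per_month * rate
    print(f"{name}: ~${cost:,.0f}/month per replication path")
# big three (assumed ~$0.09/GB): ~$461/month per replication path
# second tier (~$0.01/GB as above): ~$51/month per replication path
```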
The article is specifically talking about their DB solution "across the three major cloud providers" = Amazon/MS/Google.
It's not mentioned in the blog, but we are actively working with other providers to bring them on board. Hopefully we will have Digital Ocean soon, and I am hoping for Packet after that.
The article was a good read, but I wanted to try to answer this from the perspective of a fortune 100 enterprise (I work for one, in their cloud team).
We're starting a journey on Azure and AWS at once, with limited financial resources and limited talent (it's tough to hire people with cloud skills to work for us, and our stack is so old that it's not an easy transition for people who only know that stack). Operating AWS and Azure requires different skill sets and different approaches, and they're far from transferable. All the tools and techniques we develop or acquire for managing AWS are not applicable to Azure and vice versa, and because we're splitting our effort between the two, everything takes twice as long.
I think the right way for a company like ours to approach this would be to go "all in" on one, build expertise and deliver a lot of value back to the business, and then look to build out the second cloud to meet our BCP/cost-savings goals.
Funny thing is, if those multi-cloud proponents went 100% in on one provider, things would go much more smoothly and they would have fewer reasons to go multi-cloud in the end.
But yeah, when I look at the rate that some companies sunset their products, I understand the fear a bit.
Multi-anything is a hard problem.
Multi-person arguing is pretty easy.
But winning an argument against multiple people is a lot harder than winning an argument against yourself
I'm dealing with similar problems - trying to set up a direct connection between AWS and Azure.
How does PlanetScale handle the complexity of a DIY classic VPN, ensure high availability on those VPN links, and ensure that a certain amount of throughput can be sustained?
Is there a requirement for PlanetScale to create a full network mesh between all cloud providers and all regions? I'm assuming it's more selective, because it becomes untenable as more cloud regions pop up, requiring n(n-1)/2 VPN links, where n is the number of cloud regions.
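For reference, a quick sketch of how fast a full mesh grows as regions are added (pure arithmetic, nothing provider-specific):

```python
# A full VPN mesh over n regions needs n*(n-1)/2 point-to-point links.
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (3, 6, 10, 20):
    print(f"{n:2d} regions -> {full_mesh_links(n):3d} VPN links")
#  3 regions ->   3 VPN links
#  6 regions ->  15 VPN links
# 10 regions ->  45 VPN links
# 20 regions -> 190 VPN links
```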
Happy to learn anything I can here. Thanks for the blog post.
Yes, there is a requirement for a full mesh, and generally it has to span all regions. GCP will route for you at the network level, so we could get away with not peering every GCP region to every AWS and Azure region, but all AWS and Azure regions need to be peered to each other.
For HA of the VPN links, with most providers it's handled automatically: AWS <-> Azure and AWS <-> GCP are both HA links offered by the provider. Azure <-> GCP is a Classic VPN, so we need two of them and we have to manage the routes ourselves to make sure they fail over in the event of a loss of one system.
Throughput is another story; we are very much limited by the throughput of the various VPNs. We haven't pushed GCP or Azure to the max to see what they can do, but according to the documentation we should expect around 300 Mbps across each link in the mesh before we start to see throttling. At that point it makes sense for us to move to a colocated exchange and peer with dedicated connections.
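To put that figure in perspective, a rough back-of-the-envelope sketch (the 300 Mbps number is the documented expectation mentioned above, not a measured benchmark):

```python
# Rough sketch of what ~300 Mbps per VPN link means in practice.
mbps = 300                                 # assumed per-link throughput
gb_per_hour = mbps / 8 * 3600 / 1024       # Mbps -> MB/s -> GB/hour
print(f"~{gb_per_hour:.0f} GB/hour per link")            # ~132 GB/hour
print(f"~{1024 / gb_per_hour:.1f} hours to move 1 TB")   # ~7.8 hours
```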
Finally, for other providers, as we start to bring them on board we will be looking at using the transit gateways of the various providers to reduce the total number of links needed, or standing up virtual routers to act as exchanges.
Hopefully we will be doing another post with more technical details and some benchmarks!
> Abhi: Hi team! On the level of our Kubernetes operator, what do you think was the hardest challenge in making multi-cloud databases work?
Multi-cloud is a network problem. Ask anybody who knows what they're doing: is it the best idea to have dependencies over WAN? No. Can it be a solution to a problem? Yes, but what's your problem? PlanetScale might have a case if their product sells.
Only then come the platform problems.
This is an advertisement
Square peg, round hole.