Autoscaling for Digital Ocean?

29 points by leemalmac 10 years ago · 44 comments


I've started a thread at Digital Ocean. I want to implement such a system (I could not find any existing one). Any questions, recommendations, or "don't reinvent the wheel" comments? Does any DO customer need such a tool?

I am not a DevOps guru, but it seems like an interesting project. AWS costs almost twice as much as analogous DO servers, but AWS has way more features.

DO community thread: https://www.digitalocean.com/community/questions/autoscaling-solutions-for-digital-ocean-are-there-existing-solutions

Of course it will be open source.

weddpros 10 years ago

At clanofthecloud.com (Gaming backend as a service), we needed to implement autoscaling for DO, in node.js. We used consul.io for service discovery. CPU load was used as the central metric, and each service instance would store it in consul.io. We then triggered the creation/destruction of instances with thresholds.

In the end, we only used it for our sandbox environment, as the production env runs on bare metal (more capacity, cheaper at scale, easier on admin).

So I'd say, from our experience:

- DO's API was quite easy to work with

- consul.io was used as a reliable distributed source of information, for leader elections and health monitoring... Changing the autoscaler configuration in consul.io produced immediate results like starting/stopping new instances... Cool "remote control" effect ;-)

- haproxy/nginx load balancers use consul.io templates to update their configuration

- our autoscaler was HA, through a leader election. The instances managed themselves (no single point of failure). There were at least 2 instances running.

- you should expect a few "surprises" if you're running consul.io on Digital Ocean, heartbeats are delayed quite often (depends on datacenters), which makes failure detection hard

- and of course, we used DO custom images to start new instances
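The threshold mechanism described above (instances report CPU load into a KV store, and the elected leader compares the average against thresholds) can be sketched roughly like this. The function name, thresholds, and minimum are illustrative assumptions, not their actual code:

```python
# Hypothetical sketch of the CPU-threshold logic: each service instance
# stores its CPU load in a KV store (consul.io in the comment above), and
# the leader decides whether to create or destroy instances.

def scaling_decision(cpu_loads, scale_up_at=75.0, scale_down_at=25.0,
                     min_instances=2):
    """Return 'up', 'down', or 'hold' from per-instance CPU percentages."""
    if not cpu_loads:
        return "up"  # nothing is reporting: assume we need capacity
    avg = sum(cpu_loads) / len(cpu_loads)
    if avg > scale_up_at:
        return "up"
    if avg < scale_down_at and len(cpu_loads) > min_instances:
        return "down"
    return "hold"
```

The `min_instances` floor mirrors their "at least 2 instances running" rule, so scale-down never removes the redundancy needed for leader election.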

  • vruiz 10 years ago

    > you should expect a few "surprises" if you're running consul.io on Digital Ocean, heartbeats are delayed quite often (depends on datacenters), which makes failure detection hard

    Isn't that true for any CP store on any "cloud" provider?

    • weddpros 10 years ago

      I'd say it's more true of cloud providers who don't schedule your VM for 15+ seconds, from time to time.

      As I said, we've deployed our production environment on bare metal (in reaction to this exact problem...)

  • leemalmacOP 10 years ago

    Thank you. I'll look at consul.io closer

pjuu 10 years ago

Everything you need is there. Floating IP's to do HA at the front. Use something like HA proxy and monitor whatever it is you need such as response time from the backends. Once that passes a limit call a script to poke the API and provision you a new node. You could use something like Ansible to provision the node and then place it in to load balance.
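The "poke the API and provision a new node" step might look something like this, using the DigitalOcean v2 droplet-create endpoint. The endpoint and fields follow the public API docs as I recall them; the token, names, image slug, and threshold are placeholders, so check the current API reference before relying on this:

```python
# Rough sketch: when monitored response time passes a limit, create a
# droplet via the DigitalOcean v2 API. All concrete values are assumptions.
import json
import urllib.request

API = "https://api.digitalocean.com/v2/droplets"

def droplet_request(token, name, region="nyc3", size="512mb",
                    image="my-app-snapshot"):
    """Build the droplet-create request; `image` would be your snapshot."""
    body = json.dumps({"name": name, "region": region,
                       "size": size, "image": image}).encode()
    return urllib.request.Request(
        API, data=body, method="POST",
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"})

def maybe_scale(avg_response_ms, token, limit_ms=500):
    """Called from the monitoring loop: provision a node past the limit."""
    if avg_response_ms <= limit_ms:
        return None
    req = droplet_request(token, "web-autoscaled")
    return json.load(urllib.request.urlopen(req))
```

From there, a provisioner such as Ansible would configure the new node and add it to the HAProxy backend list, as the comment describes.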

brightball 10 years ago

One thing to consider is that DO is really, really good at vertical scaling. You can have about 1 minute or less downtime on an instance (basically boot time) to restart it as a bigger instance (RAM/CPU) as long as you don't grow the HD at the same time.

This is counter to most horizontal scaling strategies but it's really about the same. When you add more servers you're essentially just adding more CPUs and RAM via VMs. Being able to do it on the same machines minus any configuration time or provisioning time is really slick (especially for DBs).

Setting up a load balancer in front of a few instances that could take advantage of rolling vertical scaling would be a spin on autoscaling that played to one of DO's real strengths.
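A rolling vertical resize along these lines would issue a CPU/RAM-only resize action per droplet (leaving the disk untouched is what keeps it reversible). The endpoint and payload follow the DigitalOcean v2 API as I understand it, and the size ladder is illustrative:

```python
# Hedged sketch of the rolling vertical-scaling idea: pick the adjacent
# droplet size and build the resize action that leaves the disk alone.
import json

SIZES = ["512mb", "1gb", "2gb", "4gb", "8gb"]  # illustrative size slugs

def next_size(current, up=True):
    """Pick the adjacent size slug for a rolling up/down resize."""
    i = SIZES.index(current)
    i = min(i + 1, len(SIZES) - 1) if up else max(i - 1, 0)
    return SIZES[i]

def resize_action(droplet_id, new_size):
    """Build the v2 action payload for a CPU/RAM-only resize."""
    url = "https://api.digitalocean.com/v2/droplets/%d/actions" % droplet_id
    # disk=False keeps the filesystem size, which is what makes it
    # possible to scale back down later
    payload = {"type": "resize", "size": new_size, "disk": False}
    return url, json.dumps(payload)
```

The autoscaler would power the droplet off, POST this action, wait for it to complete, and power back on, one droplet at a time behind the load balancer.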

  • Ambroos 10 years ago

    That looks nice until it doesn't work. We regularly see 30 minute event processing when doing flexible upgrades of Droplets (and the same when scaling back down). It's not safe, since the resize isn't guaranteed to be fast and you have no way to abort it once it's started. Sounds good in theory, not reliable enough to use in practice.

    • hackerboos 10 years ago

      Isn't this what the flexible IP is for?

      You create a new droplet concurrently with the one running and then flip the IP to point to the new droplet. No downtime.

      • leemalmacOP 10 years ago

        You have to have a server with the same configuration, or a snapshot, or provision a new one...

  • leemalmacOP 10 years ago

    Using a snapshot to create a new droplet and updating some configuration files should be fast too. Probably longer than 1 minute, but not by much, I guess.

gedrap 10 years ago

Sounds like a fun project (and totally doable)! :)

If I were you, I'd make heavy use of ansible, or something similar, for provisioning:

1) folks are familiar with it
2) it could make the tool cross-platform more easily
3) well, no reinventing the wheel.

For example, Ansible has ec2 module http://docs.ansible.com/ansible/ec2_module.html where you describe an instance and the number of them that should be running. So if you have 3 instances running and wish to have 5, it does the magic and spins up new ones. Then, you can add them to a load balancer. Maybe there's something similar for DO already?

The way I see it is that it would poll if scaling conditions are met and execute ansible playbooks if they are, and then some web interface to set the conditions / view the scaling logs / current status.

It can turn out to be a very entertaining and educational side project :)

If you decide to do it, drop me an email - something I would be happy to brainstorm and discuss about :)

EDIT: It could also be used not only for autoscaling but also for self-healing. If some instance crashed and is not responding anymore, then spin up a new one.
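The poll-then-execute design sketched above could be wired up like this; the playbook name, metric source, and thresholds are invented placeholders:

```python
# Sketch of the polling autoscaler: decide a target instance count, and
# converge by running a (hypothetical) Ansible playbook when it changes.
import subprocess
import time

def desired_count(load, count, high=0.75, low=0.25, floor=2):
    """Target instance count given current average load (0.0-1.0)."""
    if load > high:
        return count + 1
    if load < low and count > floor:
        return count - 1
    return count

def autoscale_loop(get_load, get_count, interval=60):
    """Poll the metrics, and let the playbook do the provisioning magic."""
    while True:
        count = get_count()
        target = desired_count(get_load(), count)
        if target != count:
            subprocess.run(["ansible-playbook", "scale.yml",
                            "-e", "instance_count=%d" % target], check=True)
        time.sleep(interval)
```

The same loop covers the self-healing case from the edit: if an instance stops responding, it drops out of the count and the playbook converges back to the target.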

  • vidarh 10 years ago

    I've built setups like this with Ansible, but the more I use Ansible, the less likely I am to consider it for future projects. I find it a lot less painful to write code to do these things directly than to wrestle with Ansible. E.g. the amount of pain I went through before realising that the EC2 module at the time messed with whitespace in the user-data (great when you're transferring YAML... one more reason I detest significant whitespace; the irony of this breaking in a tool written in Python was not lost on us).

    • gedrap 10 years ago

      I guess it depends on the use case. If all you want to do is to make sure that, for example, all web servers have the same version of nginx, php and sync the configs of these services, I think ansible is a great tool and writing code to do that would definitely qualify as 'reinventing the wheel'.

      I wouldn't disqualify the tool for a problem in one of the modules either. Bugs happen :)

beekums 10 years ago

It wasn't clear to me how to autoscale with DO either. That's why I ended up using Google Compute Engine. It's not exactly cheaper than AWS, but the pricing model for discounts is WAY simpler at least. It also doesn't have the feature set AWS does, but it does have most of what you'll need, like a load balancing service.

  • vidarh 10 years ago

    The pricing for GCE is the most convoluted I've ever worked with... And it's still 2-3 times more expensive than alternatives like DO or dedicated hosting.

    A lot of the time when people want auto-scaling, the thing that strikes me is that most of them wouldn't have needed auto-scaling if they'd picked a cheaper provider to begin with. Often they could pick a dedicated provider, spin up 3 times as much capacity, and still pay less.

  • brianwawok 10 years ago

    Ditto. If you need to autoscale, you've outgrown DO. On top of DO prices not dropping in 4 years, I see little reason to use them. GCE has $5 instances now with big-boy features.

brudgers 10 years ago

My 2 cents: Build an independent autoscaling tool that abstracts over the differences between clouds because:

1. Even if DigitalOcean is the best cloud provider today -- I'm not saying it is or isn't -- the probability it is the best that will ever be is approximately zero. The landscape changes. Heterogeneity among cloud providers is increasing and anyway latency will always be a function of the physical location of data centers where the data is sharded.

2. It's the right alignment for an open source project because it is driven by the broad interests of developers rather than the narrow needs of a single company. DigitalOcean may change its pricing policy. It may cease to exist. It may make breaking API changes. All for legitimate business reasons orthogonal to those of particular developers. If their autoscaling code is platform independent, then that's not a crisis.
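The provider-agnostic core argued for above could start from an interface like this: the autoscaler talks only to the interface, and each cloud gets a thin adapter. The method names are invented for illustration:

```python
# Sketch of a cloud-agnostic provider interface, plus an in-memory toy
# adapter that lets the scaling logic be tested without any real cloud.
from abc import ABC, abstractmethod

class CloudProvider(ABC):
    @abstractmethod
    def create_instance(self, spec): ...

    @abstractmethod
    def destroy_instance(self, instance_id): ...

    @abstractmethod
    def list_instances(self): ...

class InMemoryProvider(CloudProvider):
    """Toy adapter: tracks instances in a dict instead of calling an API."""
    def __init__(self):
        self.instances = {}
        self._next = 0

    def create_instance(self, spec):
        self._next += 1
        self.instances[self._next] = spec
        return self._next

    def destroy_instance(self, instance_id):
        del self.instances[instance_id]

    def list_instances(self):
        return list(self.instances)
```

A DigitalOcean adapter would then be one small class over their API, and a breaking API change or a provider switch stays contained in that one adapter.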

Good luck.

sytse 10 years ago

We made GitLab Runner Autoscale https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/... that can automatically spin up more droplets via Docker Machine to run your CI tests. We use it ourselves for GitLab builds (up to 100 machines) and enabled it for all GitLab.com users. For more information see the announcement https://about.gitlab.com/2016/03/29/gitlab-runner-1-1-releas... Feel free to ask questions.

giancarlostoro 10 years ago

I too have wondered more or less the same, though I have yet to build any projects that need to scale, so I haven't turned it into an actual concern for myself as of yet. Hoping you find a solution and share it in here so those of us who are curious can learn. It might end up being a mixture of solutions. I can only remember an open source interface that managed multiple cloud providers; not sure if it did any autoscaling. It was posted on HN and that's all I remember: somebody open sourced it, but its description was vague as hell.

Edit:

Commented more.

djkrul 10 years ago

We have an auto-scaling feature in private beta over at https://www.tilaa.com. Drop us a line if you want to test! Ask for me (Dennis) if you need implementation assistance.

You can either use VM snapshots or our metadata service (cloud-init compatible) to provision new instances automatically. We also have an API, but you don't need it for autoscaling.

We're a European cloud provider (based in the Netherlands) and don't have PoPs in the US though.
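The snapshot-or-metadata bootstrap mentioned above usually comes down to passing cloud-init user-data at instance creation. A minimal illustrative example (the package and registration hook are placeholders, not Tilaa specifics):

```yaml
#cloud-config
# Hypothetical user-data for bootstrapping a fresh instance on any
# cloud-init compatible metadata service.
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx
  - curl -fsS https://example.com/register-with-lb   # placeholder hook
```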

lxfontes 10 years ago

I wrote this last week, might be relevant

https://github.com/lxfontes/droplet-lb

tgeek 10 years ago

"AWS costs almost twice more than analogous DO servers, but AWS has way more features."

When you use AWS you are paying for more than just a cheap VPS, which is essentially all DO is. It's like comparing fast food to a fancy steak house. You can get meat at both, but at one of them it's microwaved. Not to say DO isn't great; it gets the job done and provides a valued service, but your money gets you what your money gets you.

  • vidarh 10 years ago

    When you use AWS you pay extra for the fancy steak. But you pay steak prices even if what you buy from them is a cheap burger.

    It's really hard to find scenarios where AWS isn't ridiculously overpriced.

    Consider that Digital Ocean is also an expensive alternative, but for some clients who for various reasons insist on using AWS, I deploy caching proxies on DO, because you can save a lot by deploying droplets to cache rather than paying AWS bandwidth costs for all your traffic (if you serve more than 1-2 TB a month out of AWS, you can start saving money that way).
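The bandwidth arithmetic behind that comment looks roughly like this. The prices are rough assumptions for the era (AWS egress around $0.09/GB; a $20 DO droplet bundling about 3 TB of transfer), used purely to show the shape of the comparison:

```python
# Back-of-envelope comparison: AWS egress billed per GB vs. caching
# droplets with bundled transfer. All prices are illustrative assumptions.
import math

def aws_egress_cost(tb_per_month, per_gb=0.09):
    """Monthly egress bill if all traffic leaves AWS directly."""
    return tb_per_month * 1000 * per_gb

def do_cache_cost(tb_per_month, droplet_price=20.0, tb_per_droplet=3.0):
    """Monthly cost of enough caching droplets to serve that traffic."""
    return math.ceil(tb_per_month / tb_per_droplet) * droplet_price
```

Under these assumptions, 2 TB/month is about $180 of AWS egress versus one $20 droplet, which is where the "more than 1-2 TB a month" break-even in the comment comes from.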

  • drakonka 10 years ago

    Not everyone needs or wants a fancy steak, which I guess is why the OP is asking specifically about DO even while acknowledging that "AWS has way more features".

bogdan_r 10 years ago

The cloudhero CLI can be a very good option for this. https://cloudhero.io/cli/ (pip install hero)

It would allow you to add nodes with the packages that you need extremely easily. In one or two hours a cron job could be made that adds and removes nodes as needed.

It is not a magic bullet solution but it fixes 80% of the problem with 20% of the effort.

  • andy_wolf 10 years ago

    I've used their API to deploy a scalable docker cluster.

    Basically, my docker cluster starts from 3 nodes and as the cluster gets filled with containers I automatically add new nodes on Digital Ocean to increase the cluster capacity.

    Our team worked a day to make it happen but we got excellent support from the team.
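The capacity rule described (add nodes as the cluster fills with containers) might reduce to something like this; the per-node capacity, headroom, and minimum are guesses for illustration:

```python
# Guesswork sketch of container-driven cluster sizing: keep enough nodes
# that there is always some headroom of free container slots.

def nodes_needed(containers, per_node=10, headroom=2, min_nodes=3):
    """Smallest node count that leaves `headroom` free container slots."""
    needed = -(-(containers + headroom) // per_node)  # ceiling division
    return max(needed, min_nodes)
```

A scheduler hook would compare this against the current node count and call the provider API to add or remove Droplets accordingly.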

txutxu 10 years ago

Varnish and the d-o API (?)

There are more options, but I think I would go with that one.

You can scale compute power... the main issue is: how do you scale bandwidth at Digital Ocean?

siscia 10 years ago

I would look into Kubernetes.

I believe you will need just a very thin layer on top of it to add droplets and you can call it done.

hackerboos 10 years ago

Cloud66 and Tutum (now Docker Cloud) work with DO. Not open source and not free, but a solution nonetheless.

davidhariri 10 years ago

Need this as well. Go for it! Make it open source and paid for turnkey! I'd pay for that.

iqonik 10 years ago

I would pay for this - email in my profile, let me know when you launch.

nanocom 10 years ago

Need it too!!!

tobltobs 10 years ago

I do not believe that a price difference of 100% would be enough to make such a project interesting. Why not build something like this for dedicated boxes from Hetzner, OVH, etc.?

I don't have a clue how much harder this would be, but a price difference of about 300% looks much more interesting.

  • tobltobs 10 years ago

    Could those downvoters explain their downvotes please? He asked for an opinion.

    Seriously, building an AWS clone on DO infrastructure is just not possible. Even if it were, it would actually end up more expensive than AWS for small to medium projects. If you're using AWS you're not only paying for one virtual server, you're paying for an infrastructure. You would have to replicate parts of that infrastructure, which would require a few separate servers, and you couldn't share those costs with other users.

    Using DO instead of dedicated servers for this would be like building a Uber competitor using the Uber API.

    Edited: I overlooked the fact that neither Hetzner nor OVH provides a useful (for this project) way to order a dedicated server via an API. They do offer an order API, but the spin-up time is too long.

  • vidarh 10 years ago

    I've used Hetzner and OVH a lot, but I think DO is much easier from an API / speed-to-spin-up-an-instance perspective.

    That said, one of the reasons to use providers like Hetzner and OVH is that as I've pointed out elsewhere, the price difference is so large that most people who "need" auto-scaling at places like AWS would pay less if they just ordered 2x to 3x as much capacity at a place like Hetzner and left it on 24/7.

    Very few people have loads that are genuinely spiky enough to save enough from auto-scaling to make up for the massive cost difference.

  • leemalmacOP 10 years ago

    Bare metal providers like Hetzner are different beasts. In my opinion, you use bare metal when your server load doesn't change much over time. It's too hard to scale fast with bare metal providers, because they can need up to 2-3 days (not always) to set up a server. Virtual machines are managed more simply.
