Ask HN: Are you using AWS ECS in production?

9 points by skyisblue 8 years ago · 10 comments · 1 min read


We're thinking of migrating to ECS and wondering what the state of it is currently.

Are there still issues with agents disconnecting?

Should we not bother and go straight to Kubernetes?

bdcravens 8 years ago

We are. Agent seems pretty solid. Biggest issue I've seen is when doing a new deploy, sometimes old tasks keep running.

Biggest gotcha: tasks restarting over and over because of bad load balancer config on my part (for instance, expecting a 200 status code when the health check endpoint returns a 302).
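A minimal sketch of that mismatch, assuming the health check success codes are written as a matcher string such as "200", "200,302", or "200-399". The function name `status_matches` is hypothetical and only illustrates why a 302 fails a matcher of "200":

```python
def status_matches(matcher: str, status: int) -> bool:
    """Return True if `status` satisfies a matcher string.

    Supports a single code ("200"), a list ("200,302"),
    and a range ("200-399"). Illustration only.
    """
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if int(low) <= status <= int(high):
                return True
        elif part and int(part) == status:
            return True
    return False


if __name__ == "__main__":
    # The endpoint redirects (302) but the matcher only accepts 200,
    # so the target is judged unhealthy and the task gets replaced.
    print(status_matches("200", 302))      # False
    print(status_matches("200,302", 302))  # True
```

Either the matcher can be widened to include the redirect, or the health check can point at a path that answers 200 directly.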

Some of what won me over:

* IAM role integration at both instance and task level

* ecs-cli can use docker-compose.yml (with minor revision)

* easy use of spot fleets

* cron support for tasks

* easy to script control of clusters from your app with the AWS SDK
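As a rough illustration of the last point, here is a sketch that builds the request for the ECS RunTask API as a plain dictionary. The helper name `build_run_task_request` and the cluster, task definition, and container names in the usage note are made up for this example; nothing is sent to AWS here.

```python
from typing import Any, Dict, List, Optional


def build_run_task_request(
    cluster: str,
    task_definition: str,
    container_name: str,
    command: Optional[List[str]] = None,
    count: int = 1,
) -> Dict[str, Any]:
    """Build keyword arguments for an ECS RunTask call (hypothetical helper)."""
    request: Dict[str, Any] = {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": count,
    }
    if command is not None:
        # Override the container's command for this one run only.
        request["overrides"] = {
            "containerOverrides": [
                {"name": container_name, "command": list(command)}
            ]
        }
    return request
```

With boto3 installed and credentials configured, the result could be passed along as `boto3.client("ecs").run_task(**build_run_task_request("my-cluster", "my-task:1", "app", ["rake", "db:migrate"]))`.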

I evaluated Kubernetes, and may give it another look soon, but ECS was pretty easy to get going.

NathanKP 8 years ago

I'm currently a developer advocate for ECS at AWS, so I'm pro ECS as you'd expect. But before I worked at AWS I used ECS in production (since the early beta).

At the time we ran a microservices deployment of ~15 services on ~20 hosts. ECS made orchestrating the services easy for a couple reasons:

Unlike with self-managed Kubernetes on AWS, we could have high availability with a simple cluster of just two machines. Running the Kubernetes control plane in high availability requires a lot of setup, and while tools like kops are helping with setup now, it's still a lot of extra administration. (See https://kubernetes.io/docs/admin/high-availability/) The advantage of ECS here is that you just start two or three instances in different availability zones that run an agent, and that is all it takes to have high availability. You don't have to pay anything extra for the control plane resources, or worry about monitoring or maintaining it.

Also, AWS ECS integrates really well with all the other AWS services. For example, metrics from your services automatically get piped to CloudWatch, where you can set up an alarm that triggers a Lambda function, or publishes to an SNS topic that triggers a PagerDuty notification. Or you can use the metrics to make a CloudWatch Dashboard for a custom overview of your cluster. Logs likewise go to CloudWatch, where you can set up triggers that execute a Lambda function. You can give each service its own IAM role to control which resources (DynamoDB tables, S3 buckets, etc.) that specific service has access to. ECS integrates really well with Application Load Balancer, which allows you to easily set up a mixed architecture, where some traffic is routed to services that are running as containers under ECS, and other traffic is served by older applications running directly on hosts with no container.
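The alarm setup described above could look roughly like this. The sketch only builds the parameters for CloudWatch's PutMetricAlarm call; the helper name `build_cpu_alarm`, the 80% threshold, and the names and ARN in the usage note are assumptions for illustration, not values from this thread.

```python
from typing import Any, Dict


def build_cpu_alarm(
    cluster: str,
    service: str,
    sns_topic_arn: str,
    threshold_percent: float = 80.0,
) -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for an ECS service's CPU usage."""
    return {
        "AlarmName": f"{cluster}-{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint
        "EvaluationPeriods": 5,    # 5 consecutive breaching minutes
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # The SNS topic can fan out to a pager integration or a Lambda.
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3 available, this could be applied with `boto3.client("cloudwatch").put_metric_alarm(**build_cpu_alarm("prod", "api", "arn:aws:sns:us-east-1:123456789012:alerts"))`, where the ARN is a placeholder.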

If you are looking for more info as you evaluate whether or not AWS ECS is right for you please check out this list of ECS resources, most of which are created by the developer community: https://github.com/nathanpeck/awesome-ecs

And feel free to reach out using the Twitter handle or email on my profile if you have any questions or feedback on ECS.

mmontagna9 8 years ago

We've experienced agent crashes in the past but those seem to have been resolved now. Occasionally we will find a docker container from an old task which is still running, but about which ECS knows nothing. Definitely can make for an interesting troubleshooting adventure.

And it seems like maybe the ECS team is trying to move a little too fast recently. They released this blog post, which claims the run-task API supports several new override parameters, but the backend still doesn't actually do anything with them; it just silently ignores them.

https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-ec... https://github.com/boto/boto3/issues/1184

  • cpufry 8 years ago

    i think they were talking about the cli, cause that's what they link to from the blog post. they should be clearer though.

Sevii 8 years ago

Don't use it. It's crapware. We have 100+ hosts. Problems include the scheduler assigning tasks to nodes that report their ECS agent as crashed. Just use Kubernetes; it's going to be more stable and have more support. I wish execs didn't take AWS seriously when they promised features 6 months down the line.

  • NathanKP 8 years ago

    I'd love to hear more about this problem and see if we can get to a root cause and help you resolve it, because it does not sound like standard ECS behavior. Please email peckn@amazon.com and I can connect you to the right people to figure out what is going on.

  • Sevii 8 years ago

    ECS doesn't stop trying to schedule tasks, ever. So you can DDoS and crash your entire cluster if one of your containers fails on startup.
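For contrast, a common way to keep a crash loop from hammering a host is capped exponential backoff between restart attempts. This is a generic sketch of that technique, not a description of how ECS schedules tasks; `restart_delays` and its defaults are hypothetical.

```python
from typing import List


def restart_delays(
    attempts: int,
    base_seconds: float = 1.0,
    cap_seconds: float = 60.0,
) -> List[float]:
    """Delay before each restart attempt: doubles each time, up to a cap."""
    return [min(cap_seconds, base_seconds * 2 ** n) for n in range(attempts)]


if __name__ == "__main__":
    # A container that fails on startup is retried less and less often,
    # instead of being relaunched immediately in a tight loop.
    print(restart_delays(5, base_seconds=1.0, cap_seconds=10.0))
    # [1.0, 2.0, 4.0, 8.0, 10.0]
```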

    • Sevii 8 years ago

      The issue is still open on GitHub. You can easily blow through your entire IOPS budget and have CPU pegged at 95% iowait.
