AWS Glue/Athena/Redshift outage

status.aws.amazon.com

82 points by reinhardt 6 years ago · 16 comments

nnx 6 years ago

Should edit title to add “in us-east-1”. Other regions are unimpacted.

  • thinkingkong 6 years ago

    There are so many outages in us-east-1. I've heard the reason is because that's where they roll out maintenance first or something along those lines. Just look at this list of outages on Wikipedia [1] and scan for us-east-1, Northern Virginia, or "Northeast" (all the same place).

    Just don't use US-EAST-1 as your region.

    1. https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...

    • discodave 6 years ago

      It's the oldest region, which means:

      * It's the largest region (ever had an unexpected scaling bug?).

      * It has more legacy stuff lying around. For example, old regions have EC2 Classic, while new regions are VPC only.

      * There are more customers there. More whales, more use cases.

      Most AWS teams explicitly try not to deploy to us-east-1 first, but because us-east-1 is so different on so many dimensions, it is more likely to have issues that don't manifest elsewhere.

      (Source: An AWS Engineer)

    • Johnny555 6 years ago

      > I've heard the reason is because that's where they roll out maintenance first

      That doesn't make sense - why would they do maintenance in their largest (and oldest) region first? I'd expect them to roll out changes to smaller regions first so problems will affect fewer users.

      I think the more likely explanation is that it's their largest (and oldest) region.

      • swasheck 6 years ago

        An AWS TAM once told me the same thing: us-east-1a gets the new stuff first. I never validated it against anything other than this one person's statement.

        • ecnahc515 6 years ago

          "1a" in this context means nothing. The AZ assignments each account gets are random. us-east-1a is probably a different data center for you than for me.
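To illustrate the point about per-account AZ names, here is a minimal sketch, assuming boto3 and the EC2 `describe_availability_zones` call, which returns both a `ZoneName` (account-specific) and a `ZoneId` (the same physical zone for every account). The helper names and the two sample responses are made up for illustration; real mappings will differ.

```python
from typing import Dict


def zone_names_to_ids(response: dict) -> Dict[str, str]:
    """Map account-specific AZ names (us-east-1a) to account-independent AZ IDs."""
    return {z["ZoneName"]: z["ZoneId"] for z in response["AvailabilityZones"]}


def fetch_zone_mapping(region: str = "us-east-1") -> Dict[str, str]:
    """Hypothetical helper: needs boto3 and AWS credentials to actually run."""
    import boto3  # imported lazily so the pure helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    return zone_names_to_ids(ec2.describe_availability_zones())


# Made-up responses for two hypothetical accounts (not real data).
account_a = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az4"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az6"},
]}
account_b = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az2"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az4"},
]}

map_a = zone_names_to_ids(account_a)
map_b = zone_names_to_ids(account_b)
# Same name, different physical zone in this made-up example.
same_zone = map_a["us-east-1a"] == map_b["us-east-1a"]
```

Comparing zone IDs rather than zone names is the way to tell whether two accounts are really in the same data center.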

    • empath75 6 years ago

      It’s also full of legacy infrastructure since it was the first region.

    • chocolatkey 6 years ago

      It is usually the cheapest region though. Maybe this is why.

      • jermops 6 years ago

        Source? I see price parity across us-east-* and us-west-2 for every service I've looked at.

      • karavelov 6 years ago

        It's the biggest region; if something breaks, it breaks in us-east-1.

    • teej 6 years ago

      Redshift changes roll out in us-east-1 after other regions though, so I imagine the root cause is something else.

wgjordan 6 years ago

Summary:

> Between 9:21 AM and 2:36 PM PDT we experienced increased query failures and latency in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.

> The issue with the Data Catalog APIs started with a software update in the US-EAST-1 Region that completed at 9:21 AM PDT. The software update was immediately rolled back[...]

nknealk 6 years ago

Thankfully the Redshift outage was just on APIs, not existing machines. Our cluster was fine today, but external schemas that rely on Glue/Athena did time out.

nullwasamistake 6 years ago

Cloud services go down more often than my old WordPress sites. Avoiding vendor lock-in and doing multi-provider deployments should be par for the course.

kache_ 6 years ago

Failovers, man.
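
A rough sketch of what region failover could look like on the client side, assuming the caller supplies an ordered list of regions and a function that runs the request against one of them. `call_with_failover` and `flaky_query` are hypothetical names for illustration, not part of any AWS SDK, and a real setup would also need data replicated to the fallback region.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(regions: Sequence[str], operation: Callable[[str], T]) -> T:
    """Try the operation in each region in order and return the first success."""
    errors: List[Tuple[str, Exception]] = []
    for region in regions:
        try:
            return operation(region)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append((region, exc))
    raise RuntimeError(f"all regions failed: {errors}")


def flaky_query(region: str) -> str:
    """Stand-in for a real query; pretends us-east-1 is having an outage."""
    if region == "us-east-1":
        raise ConnectionError("increased query failures")
    return f"result from {region}"


result = call_with_failover(["us-east-1", "us-west-2"], flaky_query)
```

The same shape works across providers if the operation hides the provider-specific client behind one function.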
