Redshift outage

status.aws.amazon.com

82 points by reinhardt 7 years ago · 16 comments

nnx 7 years ago

Should edit title to add “in us-east-1”. Other regions are unimpacted.

thinkingkong 7 years ago

There are so many outages in us-east-1. I've heard the reason is because that's where they roll out maintenance first or something along those lines. Just look at this list of outages on Wikipedia [1] and scan for US-east-1, North Virginia, or "Northeast" (all the same places).
Just don't use US-EAST-1 as your region.
1. https://en.wikipedia.org/wiki/Timeline_of_Amazon_Web_Service...
- discodave 7 years ago
  
  It's the oldest region, which means:
  * It's the largest region (ever had an unexpected scaling bug?).
  * It has more legacy stuff lying around. For example, old regions have EC2 Classic, while new regions are VPC only.
  * There are more customers there. More whales, more use cases.
  Most AWS teams explicitly try not to deploy to us-east-1 first, but because us-east-1 is so different on so many dimensions, it is more likely to have issues that don't manifest elsewhere.
  (Source: An AWS Engineer)
- Johnny555 7 years ago
  
  > I've heard the reason is because that's where they roll out maintenance first
  That doesn't make sense - why would they do maintenance in their largest (and oldest) region first? I'd expect them to roll out changes to smaller regions first so problems will affect fewer users.
  I think the more likely explanation is that it's their largest (and oldest) region.
  - swasheck 7 years ago
    
    an aws tam once told me the same thing. us-east-1a gets the new stuff first. i never validated it against anything other than this one person's statement.
    
    - ecnahc515 7 years ago
      
      "1a" in this context means nothing. The AZ assignments each account gets are random. us-east-1a is probably a different data center for you than for me.
- empath75 7 years ago
  
  It’s also full of legacy infrastructure since it was the first region.
- chocolatkey 7 years ago
  
  It is usually the cheapest region though. Maybe this is why.
  - jermops 7 years ago
    
    Source? I see price parity across us-east-* and us-west-2 for every service i've looked at.
  - karavelov 7 years ago
    
    It's the biggest region; if something breaks, it breaks in us-east-1.
- teej 7 years ago
  
  Redshift changes roll out in us-east-1 after other regions though, so I imagine the root cause is something else.

Summary:

> Between 9:21 AM and 2:36 PM PDT we experienced increased query failures and latency in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.

> The issue with the Data Catalog APIs started with a software update in the US-EAST-1 Region that completed at 9:21 AM PDT. The software update was immediately rolled back[...]

nknealk 7 years ago

Thankfully the Redshift outage was just on APIs, not existing machines. Our cluster was fine today, but external schemas that rely on Glue/Athena did time out.

nullwasamistake 7 years ago

Cloud services go down more often than my old WordPress sites. Avoiding vendor lock-in and doing multi-provider deployments should be par for the course.

kache_ 7 years ago

Failovers, man.
