Hello admins of EU AWS systems – DNS outage in eu-west-1

9 points by danielhunt 9 years ago · 16 comments · 1 min read


https://status.aws.amazon.com

Looks like there's a DNS outage in AWS Dublin

All DNS requests coming from `eu-west-1b` fail completely if you're using the default AWS DNS resolution. If you switch to `8.8.8.8` it's all totally fine, but that's of no use if you have private services that need internal AWS resolution.

`1a` and `1c` are also affected, but to a much lower extent from what I can tell
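A quick way to confirm the symptom described here is to send the same query to two resolvers and compare the results. The sketch below hand-builds a minimal RFC 1035 A-record query over UDP using only the Python standard library. `build_query`, `parse_response_header` and `probe` are hypothetical helper names, and the resolver addresses in the usage note are assumptions (169.254.169.253 is the Amazon-provided resolver's link-local address inside a VPC).

```python
import secrets
import socket
import struct

QTYPE_A = 1
QCLASS_IN = 1
FLAG_RD = 0x0100  # "recursion desired" bit in the DNS header


def build_query(name: str, qid: int) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    header = struct.pack("!HHHHHH", qid, FLAG_RD, 1, 0, 0, 0)  # one question, no other records
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw  # length-prefixed label
    qname += b"\x00"  # the empty root label terminates the name
    return header + qname + struct.pack("!HH", QTYPE_A, QCLASS_IN)


def parse_response_header(data: bytes, qid: int) -> tuple[int, int]:
    """Return (rcode, answer_count) from a DNS response, after checking its ID."""
    if len(data) < 12:
        raise ValueError("truncated DNS response")
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if rid != qid:
        raise ValueError("response ID does not match query ID")
    return flags & 0x000F, ancount  # the low four bits of the flags word are the RCODE


def probe(resolver: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to `resolver` and summarise what came back."""
    qid = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (resolver, 53))
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return f"error: {exc}"
    rcode, ancount = parse_response_header(data, qid)
    return f"rcode={rcode} answers={ancount}"
```

From an affected instance, running `probe("169.254.169.253", "example.com")` next to `probe("8.8.8.8", "example.com")` should show the difference: a `timeout` or `rcode=2` (SERVFAIL) from the first alongside a clean answer from the second.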

aidos 9 years ago

Question for the more experienced in the group: is there a way of caching DNS locally within your VPC in case something like this happens again? Or does that just cause more issues than it solves?

When I discovered it I tried switching my resolv.conf to 8.8.8.8, but of course none of my internal stuff worked because of how my security group / subnet / IP restrictions are set up internally.
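One answer to the caching question is a local cache that keeps answering from expired entries when the upstream resolver fails, the idea later standardised as "serve-stale" in RFC 8767. Below is a minimal in-process sketch, not a real resolver. `StaleCache`, the `lookup` callback contract (return addresses plus a TTL in seconds, raise `OSError` on failure) and the one-hour default stale window are all assumptions.

```python
import time
from typing import Callable


class StaleCache:
    """TTL cache that falls back to expired answers when the upstream lookup fails."""

    def __init__(self, lookup: Callable[[str], tuple[list[str], float]],
                 max_stale: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup  # returns (addresses, ttl_seconds); raises OSError on failure
        self._max_stale = max_stale  # how long past expiry an answer may still be served
        self._clock = clock
        self._entries: dict[str, tuple[list[str], float]] = {}  # name -> (addresses, expires_at)

    def resolve(self, name: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached and now < cached[1]:
            return cached[0]  # fresh hit: no upstream query needed
        try:
            addrs, ttl = self._lookup(name)
        except OSError:
            # Upstream is failing: serve the expired answer if it is not too old.
            if cached and now < cached[1] + self._max_stale:
                return cached[0]
            raise
        self._entries[name] = (addrs, now + ttl)
        return addrs
```

The trade-off is the one the question hints at: stale answers keep internal names working through an outage, but they can point at addresses that have since changed, which is why the stale window is capped.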

  • jlgaddis 9 years ago

    I'm not a big user of AWS but I don't know that there's anything stopping you from running your own recursive resolver (in your VPC) that your other machines use (instead of using an external resolver or whatever Amazon provides).

    You could, for instance, stand up an instance or two running Unbound, forward to Amazon's own resolvers for internal stuff, and forward to external resolvers for external resolution (or just handle it yourself instead of using forwarders).
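The split described here comes down to choosing an upstream by the most specific matching zone, which is broadly how a forwarding resolver such as Unbound chooses between its `forward-zone` entries. This Python sketch only models that decision. The zone names and addresses are hypothetical: it assumes a VPC at 10.0.0.0/16, whose Amazon-provided resolver would be 10.0.0.2, and a made-up private zone `corp.example`.

```python
# Hypothetical forwarding table: zone (with trailing dot) -> upstream resolvers.
FORWARD_ZONES: dict[str, list[str]] = {
    "eu-west-1.compute.internal.": ["10.0.0.2"],  # EC2 internal hostnames -> Amazon-provided DNS
    "corp.example.": ["10.0.0.2"],                # made-up private zone, also via Amazon DNS
    ".": ["8.8.8.8", "8.8.4.4"],                  # everything else -> public resolvers
}


def pick_upstreams(qname: str, zones: dict[str, list[str]] = FORWARD_ZONES) -> list[str]:
    """Return the upstreams for the most specific zone that contains `qname`."""
    name = qname.lower().rstrip(".") + "."
    best = "."  # the root zone matches everything
    for zone in zones:
        if zone == ".":
            continue
        inside = name == zone or name.endswith("." + zone)  # match on label boundaries only
        if inside and len(zone) > len(best):
            best = zone
    return zones[best]
```

During an outage like this one, the internal zones would still depend on the Amazon resolver; what this layout buys you is that public names keep resolving.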

    • tomekit 9 years ago

      The main reason to use AWS is so you don't need to do anything except calculate profit from your app, at least in theory. If you need to set up your own resolvers or anything else within AWS, there's no reason to use AWS in the first place.

  • chatmasta 9 years ago

    Yes, you can specify the DNS servers to use in the VPC's DHCP options set. I've done this to point to internal DNS servers.
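With this approach, the DNS servers live in a DHCP options set that you create and then associate with the VPC (an existing set can't be edited in place). The sketch below builds the `DhcpConfigurations` argument for EC2's `CreateDhcpOptions` and applies it with boto3's `create_dhcp_options` and `associate_dhcp_options`. The helper names are hypothetical, the VPC ID and server IPs in the usage note are placeholders, and boto3 is imported lazily so the builder works without the AWS SDK installed.

```python
from typing import Any, Optional


def dhcp_configurations(dns_servers: list[str],
                        domain_name: Optional[str] = None) -> list[dict[str, Any]]:
    """Build the DhcpConfigurations list for EC2 CreateDhcpOptions."""
    if not 1 <= len(dns_servers) <= 4:
        raise ValueError("domain-name-servers takes one to four entries")
    configs: list[dict[str, Any]] = [{"Key": "domain-name-servers", "Values": list(dns_servers)}]
    if domain_name:
        configs.append({"Key": "domain-name", "Values": [domain_name]})
    return configs


def apply_to_vpc(vpc_id: str, dns_servers: list[str],
                 domain_name: Optional[str] = None) -> str:
    """Create a new DHCP options set and associate it with `vpc_id`; returns the new set's ID."""
    import boto3  # lazy import: only needed when actually talking to AWS

    ec2 = boto3.client("ec2")
    resp = ec2.create_dhcp_options(DhcpConfigurations=dhcp_configurations(dns_servers, domain_name))
    options_id = resp["DhcpOptions"]["DhcpOptionsId"]
    ec2.associate_dhcp_options(DhcpOptionsId=options_id, VpcId=vpc_id)
    return options_id
```

For example, `apply_to_vpc("vpc-0123456789abcdef0", ["10.0.0.10", "10.0.1.10"])` would point the VPC at two internal resolvers, with both the VPC ID and the addresses standing in for your own.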

fern4lvarez 9 years ago

It's weird: I'm actually seeing most of my issues coming from instances located in `eu-west-1a`, whereas `eu-west-1b` and `eu-west-1c` look fine.

  • aidos 9 years ago

    I recall reading somewhere many years ago that AWS randomises a, b, c, etc. per account. Otherwise people tend to pick A and be done with it.

    • danielhunt (OP) 9 years ago

      That's true - they load-balance instance creation unless you specifically ask for an instance in a zone.

      That doesn't apply here, though. Instances that are already up are experiencing DNS resolution issues because the Amazon-provided DNS service (which is distinct from Route53) is failing.
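The Amazon-provided resolver mentioned here sits at a predictable address: per AWS's VPC documentation, the base of the VPC's primary IPv4 CIDR plus two, and also the link-local 169.254.169.253. This small helper (a hypothetical name) computes the first; which address an instance actually queries depends on its DHCP options and resolv.conf.

```python
import ipaddress

# Link-local address at which the Amazon-provided DNS server is also reachable from a VPC.
AMAZON_DNS_LINK_LOCAL = ipaddress.ip_address("169.254.169.253")


def amazon_dns_for_vpc(vpc_cidr: str) -> ipaddress.IPv4Address:
    """Return the Amazon-provided DNS address for a VPC: its primary IPv4 CIDR base plus two."""
    network = ipaddress.ip_network(vpc_cidr, strict=True)  # rejects CIDRs with host bits set
    if network.version != 4:
        raise ValueError("expected an IPv4 VPC CIDR")
    return network.network_address + 2
```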

      • aidos 9 years ago

        I don't think I explained that very clearly.

        What you see as zone A is not what I see as zone A (maybe). When you sign up for an account, AWS assigns your zone A to one of the 3 available zones:

            zone X (real) : your A (virtual) : my C (virtual)
            zone Y (real) : your B (virtual) : my B (virtual)
            zone Z (real) : your C (virtual) : my A (virtual)
        
        That's why people see different outage characteristics between the zones.

        Edit: one of the few articles (other than HN comments) that explains it https://alestic.com/2009/07/ec2-availability-zones/
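aidos's table can be read as two private lookup tables onto the same physical zones, so translating "your A" into another account's letter means going through the physical zone. This sketch does that with the hypothetical X/Y/Z mapping from the comment; in practice you don't get to see another account's mapping, so it only illustrates why two people's "1a" can be different places.

```python
# aidos's illustrative mapping: account letter -> physical zone (X/Y/Z are placeholders).
YOUR_ACCOUNT = {"a": "X", "b": "Y", "c": "Z"}
MY_ACCOUNT = {"c": "X", "b": "Y", "a": "Z"}


def translate(letter: str, src: dict[str, str], dst: dict[str, str]) -> str:
    """Map a zone letter in account `src` to the letter naming the same zone in `dst`."""
    physical = src[letter]
    inverse = {zone: name for name, zone in dst.items()}  # physical zone -> dst's letter
    return inverse[physical]
```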

        • danielhunt (OP) 9 years ago

          I've been using AWS for years and this is the first I've ever heard of this.

          All I can say is "WHAAAAAAT?!"

          You've made me question my reality. Take an upvote for that alone.

          • misframer 9 years ago

            It's in the documentation [0].

                An Availability Zone is represented by a region code followed by a letter identifier;
                for example, us-east-1a. To ensure that resources are distributed across the Availability
                Zones for a region, we independently map Availability Zones to identifiers for each
                account. For example, your Availability Zone us-east-1a might not be the same location
                as us-east-1a for another account. There's no way for you to coordinate Availability
                Zones between accounts.
            
            
            [0] http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-reg...
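The documentation's rationale, spreading resources even if every customer picks the first letter, can be illustrated with a small simulation. Each hypothetical account gets an independent shuffle of three placeholder zones, seeded by its ID so the mapping stays stable for that account. This illustrates the idea only; it is not AWS's actual assignment algorithm.

```python
import random
from collections import Counter

PHYSICAL_ZONES = ["X", "Y", "Z"]  # hypothetical stand-ins for a region's real zones
LETTERS = ["a", "b", "c"]


def account_mapping(account_id: str) -> dict[str, str]:
    """Independently shuffle the physical zones for one account, stable per account ID."""
    zones = PHYSICAL_ZONES[:]
    random.Random(account_id).shuffle(zones)  # a str seed gives a deterministic stream
    return dict(zip(LETTERS, zones))


def placements_if_everyone_picks(letter: str, n_accounts: int) -> Counter:
    """Count which physical zone each account's `letter` lands in."""
    return Counter(account_mapping(f"acct-{i}")[letter] for i in range(n_accounts))
```

With these per-account shuffles, `placements_if_everyone_picks("a", 3000)` should put roughly a third of the accounts in each physical zone, whereas a fixed mapping would put all 3000 in one.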
          • aidos 9 years ago

            I'm glad I could make what is otherwise a bad time to be looking after servers a little more enjoyable.

            I myself was in the middle of a release when all this happened. "I'll just release this thing" I said, "will only take 20 minutes" I said....

  • danielhunt (OP) 9 years ago

    Interesting difference in experience there

    Complicated networks are complicated, I suppose

  • danielhunt (OP) 9 years ago

    Seeing an increase in errors in `1a` and `1c` now myself too

neo2001 9 years ago

6:29 PM PST We have identified the root cause of the DNS resolution issues in the EU-WEST-1 Region and continue working towards resolution.

paugay 9 years ago

same here :) let's see how long it takes

neo2001 9 years ago

sigh
