Error 404 (Not Found)

spotify.com

400 points by fairytale 4 years ago · 196 comments (195 loaded)

cronix 4 years ago

back up as of now (10:10a PDT)

This sure makes it easy to know who is hosted by google by going to downdetector.com.

  • judge2020 4 years ago

    Not exactly hard to do that in the first place

      whois $(dig +short spotify.com A)
      
      NetName:        GOOGLE-CLOUD
    • uvdn7 4 years ago

      That only says spotify.com is using GCP's DNS but not necessarily for hosting?

      EDIT: I was wrong, as pointed out by shizcakes.

      • shizcakes 4 years ago

        That's not what that says. It's doing a whois on the hosting IP address, not the DNS service.

      • uvdn7 4 years ago

        While we are at it, looks like spotify.com is using a mix of NS1 and GCP for DNS.

        spotify.com. 172800 IN NS dns1.p07.nsone.net.

        spotify.com. 172800 IN NS dns2.p07.nsone.net.

        spotify.com. 172800 IN NS dns3.p07.nsone.net.

        spotify.com. 172800 IN NS dns4.p07.nsone.net.

        spotify.com. 172800 IN NS ns-cloud-a1.googledomains.com.

        spotify.com. 172800 IN NS ns-cloud-a2.googledomains.com.

        spotify.com. 172800 IN NS ns-cloud-a3.googledomains.com.

        spotify.com. 172800 IN NS ns-cloud-a4.googledomains.com.
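
A rough way to tally those records by provider, as a minimal Python sketch. The `group_nameservers` helper and the choice to group on the last two DNS labels are assumptions made for illustration; the input is just the record list quoted above.

```python
from collections import defaultdict

NS_RECORDS = """\
spotify.com. 172800 IN NS dns1.p07.nsone.net.
spotify.com. 172800 IN NS dns2.p07.nsone.net.
spotify.com. 172800 IN NS dns3.p07.nsone.net.
spotify.com. 172800 IN NS dns4.p07.nsone.net.
spotify.com. 172800 IN NS ns-cloud-a1.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a2.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a3.googledomains.com.
spotify.com. 172800 IN NS ns-cloud-a4.googledomains.com.
"""


def group_nameservers(zone_text):
    """Group NS targets by the last two labels of each nameserver name."""
    groups = defaultdict(list)
    for line in zone_text.splitlines():
        fields = line.split()
        # Expected fields: owner, TTL, class, type, target.
        if len(fields) != 5 or fields[3] != "NS":
            continue
        target = fields[4].rstrip(".")
        provider = ".".join(target.split(".")[-2:])
        groups[provider].append(target)
    return dict(groups)


for provider, servers in group_nameservers(NS_RECORDS).items():
    print(provider, len(servers))
```

Grouping on the last two labels is a shortcut that happens to work for these names; it would mislabel nameservers under multi-part suffixes such as `.co.uk`.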

        • account42 4 years ago

          How can you name your service NS1 but then not use the ns1. subdomain for the first nameserver. SMH

  • crdrost 4 years ago

    Yeah, per the status page, there's a temporary mitigation rolled out while the team tries to figure out more... so no more 404s but load balancers will be locked down for a while while investigation continues.

  • jonnylangefeld 4 years ago

    Hmm it shows the same spike for instagram and aws as well. Would be funny to me if something on their end depends on GCP. https://downdetector.com/status/aws-amazon-web-services/ https://downdetector.com/status/instagram/

  • fairytaleOP 4 years ago

    Can confirm. https://spotify.com is now working fine.

    • corobo 4 years ago

      Can unconfirm. I can see their website but the app isn't playing nicely at all

      Any non-offline playlists/songs just sit there not playing or telling me I'm offline

      edit @ 37min: App also seems to be working again now

  • technologyvault 4 years ago

    BigCommerce is likely one of those. Our BigCommerce store was down for several hours today.

  • jakub_g 4 years ago

    Github graphql API seems to be throwing error responses for me still

terramex 4 years ago

Etsy is 404 too: https://www.etsy.com

Seems to be a bigger issue.

edit: Nest is down too: http://nest.com

Fitbit.com is 404 too: https://www.fitbit.com

Big GCP issue?

edit2: Downdetector.com shows multiple websites and services as down, including Pokemon GO and Rocket League.

GCP status page is still green all over the board: https://status.cloud.google.com

19:10 CET update: Some websites are coming back, including spotify.com, but their app still does not work for me.

Information about the outage was just added to the GCP status page, direct link: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...

Description: We are experiencing an issue with Cloud Networking beginning at Tuesday, 2021-11-16 09:53 US/Pacific.

Our engineering team continues to investigate the issue.

We will provide an update by Tuesday, 2021-11-16 10:40 US/Pacific with current details.

We apologize to all who are affected by the disruption.

19:20 CET update:

Description: We believe the issue with Cloud Networking is partially resolved.

Customers will be unable to apply changes to their load balancers until the issue is fully resolved.

We do not have an ETA for full resolution at this point.

We will provide an update by Tuesday, 2021-11-16 11:28 US/Pacific with current details.

Spotify desktop app still not working for me.

19:45 CET: Spotify app is back online for me.
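
The timeline above mixes CET and US/Pacific timestamps. A minimal sketch of the conversion, assuming both zones were on standard time on 2021-11-16 (PST is UTC-8, CET is UTC+1). The fixed offsets and the `to_cet` helper are illustrative only and would give wrong answers during daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time offsets (an assumption that holds in mid-November).
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def to_cet(year, month, day, hour, minute):
    """Convert a US/Pacific standard-time wall-clock time to CET."""
    pacific = datetime(year, month, day, hour, minute, tzinfo=PST)
    return pacific.astimezone(CET)


start = to_cet(2021, 11, 16, 9, 53)
print(start.strftime("%H:%M %Z"))
```

With these offsets the two zones are nine hours apart, so the reported start of 09:53 US/Pacific corresponds to 18:53 CET.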

jaredeklee13 4 years ago

Global: Experiencing Issue with Cloud networking

Incident began at 2021-11-16 10:10 (all times are US/Pacific).

https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...

mixedbit 4 years ago

Looks like perhaps an issue with Google Load Balancer. We have a load balancer in front of Google Storage Buckets, and can access resources directly from the buckets, but getting 404 when going through the load balancer.
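
That comparison can be turned into a quick check. A minimal sketch, assuming you can fetch the same object both directly and through the load balancer; `localize_fault` and its labels are hypothetical, and the status codes passed in are whatever your own HTTP client returned.

```python
def localize_fault(direct_status, lb_status):
    """Guess where a failure sits from two HTTP status codes for one object."""
    direct_ok = 200 <= direct_status < 400
    lb_ok = 200 <= lb_status < 400
    if direct_ok and lb_ok:
        return "healthy"
    if direct_ok:
        # The backend answers, but the path through the load balancer fails.
        return "load balancer path"
    if lb_ok:
        return "direct path only"
    return "backend or both paths"


print(localize_fault(200, 404))
```

A 200 from the bucket combined with a 404 through the load balancer is the pattern described above, and the sketch labels it `load balancer path`.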

  • te_chris 4 years ago

    Yep, it's the GLB. Went down for us at 5:46 GMT. Just responding 404 and logs reporting an internal error.

  • neom 4 years ago

    Non-engineer here - Is there an easy way to get multi-provider redundancy around this? Can you have LBs on multiple clouds and use DNS to move traffic around or something? Or does your LB have to be at the provider the app is at? Sorry if this makes no sense. :o

    • sparrc 4 years ago

      yes it's 100% technically possible, the main issue is it would be significantly more expensive.
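
A minimal sketch of the DNS side of that idea, assuming the same app runs behind a load balancer at two providers and something outside both clouds runs the health checks. `pick_endpoints`, the provider labels, and the addresses (taken from the reserved documentation ranges) are all hypothetical.

```python
def pick_endpoints(endpoints, is_healthy):
    """Return the endpoints that DNS should currently advertise.

    endpoints: list of (provider, address) pairs.
    is_healthy: callable taking an address and returning True or False.
    """
    healthy = [(p, a) for p, a in endpoints if is_healthy(a)]
    # If every health check fails, keep advertising everything rather than
    # publishing an empty record set.
    return healthy or list(endpoints)


ENDPOINTS = [("gcp", "203.0.113.10"), ("aws", "198.51.100.20")]
down = {"203.0.113.10"}
print(pick_endpoints(ENDPOINTS, lambda addr: addr not in down))
```

Cached DNS answers mean clients keep hitting the old address until the record's TTL expires, so the switch is not instant, and running the second stack is where the extra cost comes from.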

  • htrp 4 years ago

    I can confirm this on my side too

ksajadi 4 years ago

Half the internet is down because of a Google Cloud global issue on their load balancers, including Spotify and Etsy and GCP status is all green: https://status.cloud.google.com If you ever wondered why GCP is a distant third runner in the enterprise cloud space, here is your answer.

  • yuliyp 4 years ago

    Expecting an instant public post is a bit unrealistic. They had a post up just over 20 minutes after the incident start, which is not that crazy, given that they needed time to triage all of the alarms and understand which component was actually breaking and confirm some technically correct information around it, even if the actual internal incident response can run without the public post.

    • corobo 4 years ago

      It's Google though.. did they stop automating everything?

      At least make the screen not show all green or something automatically

      • yuliyp 4 years ago

        Which things should not be green?

        If the automation is working the services will be up. When an incident is happening it's because something is significantly broken, and automation won't properly understand what is and is not working.

        For instance, lots of follow-on alarms might be firing for what are not actually issues with the things being monitored: As an example, I would imagine that datacenter temperatures and fan speeds dropped due to the incident, which might cause automation to suspect a facilities issue, but announcing a facilities issue would be misleading.

        Or metrics around instances live might be tanking as autoscaling groups start downsizing. This would not be an issue with the autoscaling service, and automatically announcing an autoscaling outage would again be misleading.

        In an incident, taking the available data and reaching a conclusion about what is broken and what are effects is something which requires skilled manual effort and is error-prone.

        • corobo 4 years ago

          > Which things should not be green?

          The broken ones is how I usually do it.

          The automation doesn't need to do that, it doesn't need to analyse the situation. It needs to communicate "Hey. Our systems have seen this and have pinged humans, bear with" rather than "nope even though half the internet is down rn, it's all good baby"

          Make a green tick a blue questionmark or something. It doesn't even need to admit fault, it just needs to not be useless. My goal visiting the page is to get a link I can send clients "Updates will be posted here". Nothing more.

          Also if you're hosting your monitoring system on the same system it's monitoring you've just completely missed the point. At least use a different region within your cloud provider, better would be completely different provider. I'd even go as far as using different domains/TLDs to host the page if I was Google sized

        • gowld 4 years ago

          Monitoring should be on a different system, unaffected by an outage in the monitored system.

          • yuliyp 4 years ago

            I think that's tangential to my point. The concerns in the post you replied to are about system interdependence making it hard for a monitoring system to separate cause and effect, even if that monitoring system is itself working properly.

  • yupper32 4 years ago

    The big three cloud platforms all have this issue of delayed status updates. Why do you think it's just GCP?

    • mostdataisnice 4 years ago

      ...and there's a reason for that. Automations to update the status page are rarely acceptable, since the status page statuses have legal and financial implications. Therefore, the IM usually has to update it (or tell someone to update it). But, realistically, when you get paged, you first need to figure out what exactly is wrong and at least a vague idea of why. Then, you need to tell someone to update the page. Then, it gets updated.

      The status page will always lag the outage. It's not a conspiracy.

      • deathanatos 4 years ago

        Status pages should be driven that way, though. "legal and financial" implications and "It's not a conspiracy" is a poor excuse.

        Now, I'm on Azure, but it seems like from the comments the situations are similar. So, instead of an automatically updated status page that would help engineers do their jobs, we get a status page that isn't accurate, and customers have to pull teeth to get a service credit where/when one is due. And it seems like you can have the cake and eat it too here: while IANAL, a footnote in the SLA or the status page that "this is a machine estimate and not reflective of what goes into the SLA" should do it, no?

      • gowld 4 years ago

        Not updating the status page, to avoid the legal and financial implications, is fraud -- taking money on false pretenses.

        • yuliyp 4 years ago

          fraud? how? what guarantees do they make about timeliness of status updates on their services?

      • kazen44 4 years ago

        Also, in most teams, people who do external communication are different from those doing triage and troubleshooting.

        • cheeze 4 years ago

          Yeah, but they are still people who are responding to a page, working on wording and getting it approved, and then updating.

          20 minutes seems pretty reasonable to me.

  • brown9-2 4 years ago

    AWS typically has the same issue

  • deathanatos 4 years ago

    Azure has the same issues with updating their status page. Sometimes it never happens.

    I might at least hold out some chance that Google Cloud will write an interesting PM, which is something Azure would never do IME.

  • iso1631 4 years ago

    I'm old enough to remember the old claim that the internet was designed to cope with a nuclear attack.

MarcScott 4 years ago

Ironically it appears that IsItDownRightNow? is also down, although that could be because they're experiencing what is basically going to be the equivalent of a DDoS.

fsflover 4 years ago

Time to switch to https://bandcamp.com.

  • casi18 4 years ago
    • dymk 4 years ago

      Cool, how do I stream the newest Taylor Swift album on that?

      • benbristow 4 years ago

        You write the domain (ipfs.io) on a blackboard with a chalk pen then take your nails and scratch the board.

        (jk.)

      • iso1631 4 years ago

        My colleague (who loves Taylor Swift) bought the mp3s from somewhere (amazon?) and uploaded them onto her plex server the hour it came out.

        That server continues to work just fine.

        • cheeze 4 years ago

          As a heavy plex user, I can't imagine using it as my default music player. CX isn't great for music, IMO.

cinericius 4 years ago

I wonder if a legal discovery will ever find internal status dashboards that reflect reality rather than fictitious SLA liability-aware status pages.

  • paxys 4 years ago

    You don't need legal discovery for that. Every "X as a service" contract you sign will explicitly say that SLAs aren't dependent on dashboards/ping tests but rather a mostly subjective measure of "availability".

  • hmrr 4 years ago

    Your cynicism is justified and clearly based on the same experience I have :)

algorithm_dk 4 years ago

This is clearly the hottest thing on HN right now, and it was bumped from #1 to #6. Anyone know why? Is it some kind of bot protection?

  • floatingatoll 4 years ago

    User flags, because outages are a fact of everyday life.

    • mbesto 4 years ago

      Which is dumb, linking to status pages shouldn't be on HN. A blog that has analysis and explanations of outages or post mortems should.

      knock-knock dang

      • floatingatoll 4 years ago

        Dang doesn't see messages like that unless you use the footer Contact link, but I remember a comment from him a while back that I would summarize as "Some site users think it's a good use of HN, and other site users disagree and flag it, and we downweight/dedupe them sometimes and/or if someone emails us with the Contact link". I just didn't want you to wait for a reply that'll never come unless you Contact them.

        • mbesto 4 years ago

          hehe I know, I was just saying it more for fun, but I appreciate the comment nonetheless.

deforciant 4 years ago

https://linear.app/ is also down

uubk 4 years ago

We found extra rules in our GCLB routing config - removing them restored our service.

al_james 4 years ago

Netlify is also failing for us, and reporting bad TLS certs. Not sure if they use GCP https://www.netlifystatus.com/

jacobkg 4 years ago

We bypassed our Google load balancer and pointed DNS directly at the IP addresses of our servers and that seems to have helped

humanistbot 4 years ago

The title of the 404 page on all the down sites has an extra "1" after the exclamation points: "Error 404 (Not Found)!!1"

humanistbot 4 years ago

Sites that are down according to https://downdetector.com include Spotify, Discord, Snapchat, Etsy, Pokemon Go, Epic Games, Target, Paramount+, Evernote

jaredeklee13 4 years ago

Global: Experiencing Issue with Cloud networking Incident began at 2021-11-16 10:10 (all times are US/Pacific). https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...

deberon 4 years ago

GCP outage? Status page shows green but a bunch of sites seem down (Rocket League most importantly).

profmonocle 4 years ago

I wondered why our alerts started going nuts. Seems like basically every global Google Cloud load balancer went down. Doesn't seem to affect single-region network load balancers.

Edit: All of ours are back up. Some other services still seem down though.

soheil 4 years ago

Funny thing is when you google Home Depot or Paramount Plus you get ads served by Google as the first result. When you click on it, Google then shows you a 404 page. I wonder if they'll get a refund on their AdWords campaign.

  • makecheck 4 years ago

    One of my pet peeves with so many services! Their obnoxious pre-ads can play flawlessly (stealing your time/eyeballs and giving them benefit), and they can still fail to give you the content you exchanged your time/eyeballs to see. Worse, they can repeatedly fail and repeatedly drill the same ads into your brain.

    There ought to be a law that essentially says if ads are “paying” for content, there must be a flawless link between ads and content such that the system can tell if the content is available (or detect after the fact that something was not delivered properly). And then, based on that, it either is required to ensure the ad never plays (since the content cannot be delivered), or that the user must be compensated in some way (e.g. we see you were forced to see an ad but got nothing so we are crediting $1 to your account).

  • progbits 4 years ago

    Why? That's not part of the ad contract. They will get refunded for GCP if it goes out of SLO.

soheil 4 years ago

I also wonder how many companies didn't want to admit they were using Google for their infrastructure. Downdetector shows AWS being affected, it'd be embarrassing if they were caught using Google Cloud Platform.

  • grumple 4 years ago

    Seriously doubt that AWS, Facebook are using Google for infra. There's probably some other effect at play, like people using a Google service to connect to these things. Also don't see any effects on those services personally.

artembugara 4 years ago

ok, so our API is down. We're on GCP...

https://api.newscatcherapi.com/v2/search

collinmanderson 4 years ago

See also https://news.ycombinator.com/item?id=29243740

markbnj 4 years ago

We were down. Just came back. Things seem to be resolving.

IceWreck 4 years ago

https://www.navidrome.org/

Self hosted Spotify. Compatible with subsonic clients.

  • NaughtyShiba 4 years ago

    On one hand, Spotify is much cheaper; on the other, perhaps artists get paid more (assuming you acquire music legally)

caffeinated_me 4 years ago

My company was seeing GCP Airflow environments not responding, but they seem to have recovered in the past few minutes.

johanam 4 years ago

https://overleaf.com/ is also 404 now

hs86 4 years ago

https://toggl.com/ is also affected.

dustinmoris 4 years ago

I have a few services running from the same GKE cluster, same ingress controller, same nodes, same GLB, same everything.

Some are 404ing at the moment and others work just fine. Feels like a GLB issue.

Nothing in my GCP dashboard seems to be aware of the issue however.

Only reason I found out is because I use an external service to ping me if a site is down.
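
A minimal sketch of that kind of outside-in check, meant to run from somewhere other than the cloud being monitored. The `fetch_status` callable, the default threshold, and the `Monitor` class are placeholders for illustration, not any particular monitoring service's API.

```python
class Monitor:
    """Alert after several consecutive failed probes of one URL."""

    def __init__(self, url, fetch_status, threshold=3):
        self.url = url
        self.fetch_status = fetch_status  # callable: url -> HTTP status code
        self.threshold = threshold
        self.failures = 0

    def probe(self):
        """Run one probe; return True when an alert should fire."""
        try:
            ok = 200 <= self.fetch_status(self.url) < 400
        except OSError:
            ok = False  # connection errors count as failures
        self.failures = 0 if ok else self.failures + 1
        # Fire once, on the probe where the threshold is first reached.
        return self.failures == self.threshold


# Simulated responses standing in for real HTTP requests.
responses = iter([200, 404, 404, 404, 200])
monitor = Monitor("https://example.com/", lambda url: next(responses))
alerts = [monitor.probe() for _ in range(5)]
print(alerts)
```

Requiring a few consecutive failures keeps a single dropped request from paging anyone, at the cost of a slightly later alert.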

trillic 4 years ago

https://www.windy.com down, same issue.

arjan_sch 4 years ago

The 404's changed into 502's.. I guess that's progress. Fingers crossed it's back up soon

Borrible 4 years ago

Yes, but we're still on DEFCON 5.

contrahax 4 years ago

Seeing the same - I have projects in us-east1 that went offline first, then us-west1 went offline a few minutes after. Everything green on their status page and nothing in the dashboard - everything returns a 404 so I'm assuming a really high level LB just took a dump.

  • contrahax 4 years ago

    Seems to have just resolved itself in us-east1 so I'm hoping us-west1 follows a few minutes after.

kadomony 4 years ago

I don’t understand why people post website outages.

Do you think the DevOps teams at these billion dollar streaming companies are so clueless that they don’t have monitoring in place?

Do you think that people who go to a site when it’s down don’t see the same thing?

So whose awareness does this serve?

  • blamazon 4 years ago

    In general, people post things on HN to discuss them. This includes high profile web outages.

mcintyre1994 4 years ago

Looks like it's made it to the Google status page: https://status.cloud.google.com/incidents/6PM5mNd43NbMqjCZ5R...

1cvmask 4 years ago

Is it regional? Surprised Spotify is not active-active on other platforms like AWS and Azure.

dave_aiello 4 years ago

Right now homedepot.com and the APIs that drive their mobile app are down too.

mkl95 4 years ago

Not Google's best week.

iampliny 4 years ago

Might be Google Cloud outage: https://news.ycombinator.com/item?id=29243753

cglace 4 years ago

Everything seems to be working on our end as of 1:08 PM EST.

te_chris 4 years ago

Affecting us. Busiest time of the year and now down 20 min. It's the Global Load Balancer, so god knows what bit of the global edge has been taken out.

nagisa 4 years ago

One of the websites I've noticed this on is back up.

timdaub 4 years ago

- discord doesn't allow me to connect either.

_nickwhite 4 years ago

1:10PM - either it has resolved itself, or a regional issue, but I'm not seeing anything being down from the East coast USA.

lukeschlather 4 years ago

As far as I can tell everything is up; it's just that our load balancers aren't routing traffic and are just returning 404s.

gassius 4 years ago

Funny enough, Datadog, which I was using to investigate one of my Vercel sites, is down too

Yeah, Vercel is running some GCP services it seems

  • jeffbee 4 years ago

    Are you able to access your data in the AWS-hosted datadog instance?

    • gassius 4 years ago

      Well, it's not like I know how to switch to it, but Datadog came back for me, probably because of that

pdenton 4 years ago

I knew Google would one day take control of the web, just thought they'd have a more clever way of doing it.

DiFronzo 4 years ago

Oh okay, thought something was on my end.

Jansin3 4 years ago

Rocket league perhaps epic games even

scame-miv 4 years ago

I experienced this issue with my Spotify app. Initially I thought it was my internet lol.

etimberg 4 years ago

Seeing this across the board with providers on GCP. Firestore however does not appear to be down

pcbro141 4 years ago

https://downdetector.com/

yup

authed 4 years ago

That's why I like the clouds.

cdiddy2 4 years ago

Discord down

Jugurtha 4 years ago

My stuff is running OK on GCP, with GKE usage. Maybe it's related to nameservers?

  • deforciant 4 years ago

    if you are using regional load balancers or serving traffic directly from nodes then you would be fine :) "only" global LB failed

addcninblue 4 years ago

It looks like everything is back now. That was a short outage by recent standards...

thegranderson 4 years ago

Seems like another DNS issue - switching to Cloudflare 1.1.1.1 got me back online...

oussama-gmd 4 years ago

Seems to be an issue with the Google Cloud load balancer. Our website is down too

Wingy 4 years ago

Seems to be coming back up.

lenniez 4 years ago

Also experiencing that with all my GCP-related infrastructure (Europe)

mey 4 years ago

Our infra in us-central1 behind gcp lb is impacted but not us-west1

davidkuennen 4 years ago

My app was down too. Can confirm it is most likely the GCloud LB

saranshk 4 years ago

Our instances started working again, so seems to be fixed

icecoldfire 4 years ago

Pokémon GO also down

oussama-gmd 4 years ago

Same here, seems to be an issue with load balancers

nicebill8 4 years ago

GCP - my Cloud Run containers are giving 500's

plg 4 years ago

overleaf.com also

crackercrews 4 years ago

CBS.com is down, and some NYT pages as well.

christophclarke 4 years ago

Snapchat also having issues refreshing

melling 4 years ago

I noticed Discord went down for me.

sabbakeynejad 4 years ago

veed.io is down too! Same problem

ishikawa 4 years ago

It was back in a few minutes. But that was pretty weird. App Engine was affected.

dustinmoris 4 years ago

Things are back online again!

aalbertson 4 years ago

Lowes search was also down.

dyeje 4 years ago

Discord seems to be down.

ukd1 4 years ago

We're also affected.

NaughtyShiba 4 years ago

Seems to be back already

dfxt8 4 years ago

Discord is down too.

cronix 4 years ago

Seems to be fixed.

milesward 4 years ago

It's back up

Thaxll 4 years ago

It's back!

colewilson 4 years ago

looks like it's back up?

ruined 4 years ago

nice
