Heroku Seems to Be Down
A Heroku app of mine went down starting at ~7:43pm Eastern US (I KNOW, I need to migrate it elsewhere), and in a separate, weird "thing is down" instance, some pages in the Epic healthcare app seem to be offline. Also seeing that YouTube issue.
Related comments on reddit:
* https://www.reddit.com/r/Heroku/comments/1r7o1hk/outage/
* https://www.reddit.com/r/youtube/comments/1r7onh1/youtube_we...
The fact that your database is reachable but the app isn't after restart points to the routing layer being toast. If you can ssh into the dyno or check logs, look for anything related to the router handshake timing out. Usually when deploys work but traffic doesn't route, it's something between the load balancer and your instances.
The correlation with scheduled restarts is interesting though. Makes me wonder if there's a cert validation issue on boot that's causing new instances to fail health checks.
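If you can pull logs, here is a rough way to tally router errors. It is only a sketch: it assumes the usual `heroku[router]: at=error code=H.. desc="..."` line shape, and the sample lines are made up for illustration, not taken from this incident.

```python
import re
from collections import Counter

# Pulls the error code and description out of a router log line.
# Assumes the key=value layout the Heroku router normally writes.
ROUTER_ERROR = re.compile(r'at=error\s+code=(?P<code>H\d+)\s+desc="(?P<desc>[^"]*)"')


def count_router_errors(lines):
    """Count (code, description) pairs across router error lines."""
    counts = Counter()
    for line in lines:
        if "heroku[router]" not in line:
            continue  # skip app and platform lines
        match = ROUTER_ERROR.search(line)
        if match:
            counts[(match.group("code"), match.group("desc"))] += 1
    return counts


# Illustrative lines only (hypothetical host, timestamps, and timings).
SAMPLE_LINES = [
    '2024-01-01T00:00:01+00:00 heroku[router]: at=error code=H19 desc="Backend connection timeout" method=GET path="/" host=example.herokuapp.com dyno=web.1 connect=5001ms service= status=503 bytes=',
    '2024-01-01T00:00:02+00:00 heroku[router]: at=error code=H19 desc="Backend connection timeout" method=GET path="/health" host=example.herokuapp.com dyno=web.1 connect=5001ms service= status=503 bytes=',
    '2024-01-01T00:00:03+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/" host=example.herokuapp.com dyno=web.1 connect=1ms service=30000ms status=503 bytes=0',
    '2024-01-01T00:00:04+00:00 heroku[router]: at=info method=GET path="/" host=example.herokuapp.com dyno=web.1 connect=1ms service=12ms status=200 bytes=512',
    '2024-01-01T00:00:05+00:00 app[web.1]: Listening on port 5000',
]

if __name__ == "__main__":
    for (code, desc), n in count_router_errors(SAMPLE_LINES).most_common():
        print(code, desc, n)
```

A pile of connection-level errors while the app's own log lines look healthy would fit the "router can't reach the dyno" theory, though it wouldn't prove it.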
I've been using DigitalOcean's App Platform as an alternative. It's not identical, but it's really close with the git-based deployments, etc. Anyone else is welcome to share their thoughts too.
Perhaps related: on Downdetector right now I see outages reported in the past few minutes for GCP, Cloudflare, and AWS. YouTube is also down.
Unusual to see all three blowing up at the same time.
One of my Heroku apps is down, and seems to have gone down immediately after a scheduled daily restart. The others all work, and I can still access my database directly. Kinda fun trying to reverse engineer what's going on with the limited information we have!
The status page lag is the worst part. Hard to debug when you're questioning whether it's your config or their infrastructure. At least when multiple providers go down at once you know it's not just you going insane.
Nice of Heroku/Salesforce to join the party, an hour late: https://status.salesforce.com/incidents/20003708
Love the backdating, like it has been there the whole time.
As the other commenter said, pretty sure it's at the routing layer. Deploys and startups are all working just fine for our apps, but the router can't reach them. Related to the Google CA outage maybe?
Yeah, we have several applications on Heroku. The only failures are the ones that had platform-initiated restarts after 5pm MST. They likely tried to get a new cert on restart and failed. Don't deploy or do manual restarts right now.
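For anyone with a lot of apps, a quick sketch of that check: given each app's last restart time, list the ones restarted after the cutoff. The app names, the date, and the idea that you already have the timestamps in hand are all assumptions here; only the 5pm MST cutoff comes from what we saw.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset MST (UTC-7); the calendar date below is a placeholder.
MST = timezone(timedelta(hours=-7), "MST")
CUTOFF = datetime(2024, 1, 1, 17, 0, tzinfo=MST)


def restarted_after(last_restarts, cutoff):
    """Return the names of apps whose last restart is at or after the cutoff."""
    return sorted(app for app, ts in last_restarts.items() if ts >= cutoff)


# Hypothetical apps; timestamps must be timezone-aware so they compare correctly.
LAST_RESTARTS = {
    "billing-web": datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc),  # 17:30 MST
    "admin-web": datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc),    # 16:00 MST
    "reports-web": datetime(2024, 1, 1, 18, 5, tzinfo=MST),
}

if __name__ == "__main__":
    print(restarted_after(LAST_RESTARTS, CUTOFF))
```

Anything not on that list is probably still serving traffic, which is the reason to leave it alone for now.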
Looks like we had platform-initiated restarts that ended up taking down the whole app. Yay!
Not sure if this is related, but for the past couple of minutes the YouTube front page has been empty of videos: just the top menu and a black page, and the recommended videos sidebar is completely empty too.
No sign on the status page. But two separate / unrelated accounts I know about are fully down right now. Data services / backend workers seem fine, web/routing layer seems to be dead.
The routing layer being down while workers/data services stay up is such a specific failure mode. Usually means the load balancers or edge routing got corrupted somehow, not the actual compute infrastructure.
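That reasoning boils down to a small decision table. Here is a sketch of it; the three probe names and the labels are mine, and how you actually collect each signal (HTTP request, process list, direct database connection) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probes:
    """Three yes/no signals, gathered however you can (hypothetical names)."""
    http_ok: bool      # public URL answers through the router
    process_up: bool   # web process booted and is running
    database_ok: bool  # database reachable directly


def likely_fault(p: Probes) -> str:
    """Guess which layer to look at first. A heuristic, not a diagnosis."""
    if p.http_ok:
        return "none"
    if not p.process_up:
        return "compute"
    if not p.database_ok:
        return "data services"
    # Everything behind the router looks fine, yet traffic does not arrive.
    return "routing layer"


if __name__ == "__main__":
    # The pattern reported in this thread: app and database up, web traffic dead.
    print(likely_fault(Probes(http_ok=False, process_up=True, database_ok=True)))
```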
If you're serious about migrating off (and not just saying it in the heat of the moment), the main thing is having a plan for the database migration. That's always the painful part. Everything else is just Docker containers that run anywhere.