Incident with Issues and Pull Requests

205 points by longwave 3 years ago · 142 comments

GitHub outages are very reliable: I live in Europe and they always come in the afternoon, so they're a great reminder to go get lunch

It's a feature, not a bug!

agos 3 years ago

in CET they're more around afternoon coffee break time
willsmith72 3 years ago

How I miss the euro lifestyle

dbingham 3 years ago

What is going on over there? Third day in a row is... kind of impressive.

aranw 3 years ago

Lots of Copilot-generated code failing
- practice9 3 years ago
  
  Not only Copilot, seems like some Microsoft services like Bing AI and Bing Image Creator have some issues today as well with 4xx / 5xx, and incorrect region authorization (had to switch the account region from a European country to US to make it work again on mobile)
  - tesin 3 years ago
    
    I think you may have missed the joke - they were implying Github was using Copilot internally, causing the outages, due to poor output. Not that Copilot itself was unavailable (although that may be true, also)
iepathos 3 years ago

They blamed the March and April outages on some database query that was changed due to an infrastructure change they rolled out. I'm guessing their infrastructure change caused some other race condition that they are only seeing after a major production failure, because they didn't load test enough in their staging environment https://github.blog/2023-05-03-github-availability-report-ap...
- edgyquant 3 years ago
  
  As much as I’ve been frustrated by these outrages, we’ve all been there
  - qmacro 3 years ago
    
    Now that is a good typo
    
    robofanatic 3 years ago
    
    could be on purpose
frde 3 years ago

My money is on some significant backend architecture migration gone wrong without a viable way to roll back the time machine :)
- capableweb 3 years ago
  
  Feels like an organization as big as Microsoft must surely have some sort of contingency plan in place before doing such a large migration. Right?
  - isbvhodnvemrwvn 3 years ago
    
    Sales handing out Office365 discounts and trying to convince people that AWS and GCP are going to steal their data, judging by the companies I worked for that used Azure.
    
    hardware2win 3 years ago
    
    Wasn't it true? That Amazon abused their AWS position and stole their competitors' data, so that's why Germany's retail businesses are building their own clouds
    
    jacooper 3 years ago
    
    Any links? Interested in reading more about this
    
    belmont_sup 3 years ago
    
    Here’s one
    https://www.wsj.com/articles/amazon-scooped-up-data-from-its...
    The gist is that yes it’s true. They’d come out with their own Amazon Basics branded stuff and push it to the top.
    
    jacooper 3 years ago
    
    But that's amazon, not AWS.
  - xeromal 3 years ago
    
    I worked on the volume licensing part of Microsoft years ago and deployments were stressful. They'd start late on Friday, like 8pm or so, and go until 8am in the morning. Everyone was on a long call the entire time. I hated it.
- whynotmaybe 3 years ago
  
  At this point, I'm wondering if it's not me, just because I updated some libraries on my local build agent.
- lallysingh 3 years ago
  
  Azure?
zamalek 3 years ago

Just another day doing DevOps for a Ruby on Rails product.
candiddevmike 3 years ago

Today seems worse than yesterday. I'm getting wildly inconsistent results when viewing repositories after a push. Hard to tell if my push actually went through, and it's not triggering actions.
- robofanatic 3 years ago
  
  exact same issue
SideburnsOfDoom 3 years ago

It comes in around 09:30 on US east coast.
I suspect that it's related to high load.
pera 3 years ago

Maybe they fired too many people this time? https://news.ycombinator.com/item?id=35334705
tonyhb 3 years ago

From an SRE, one of their DB clusters failed. They use Vitess which is great, but it can be prone to hotspots and doesn't auto-shard. Heavy usage (esp. from large customers, rogue jobs) can take down the cluster. When it goes down, it's a PITA to resolve.
- samlambert 3 years ago
  
  This literally isn't true and looks awfully like the talking points of one of our competitors.
  - tonyhb 3 years ago
    
    Ah, unbalanced shards via wrong sharding keys was an issue at one point, IIRC. I remember talking with an SRE there when something bad happened at GitHub last year, and I know that this time the current DB cluster failed.
    To be clear, I _was_ mapping previous incidents with this year's incident — no competitor or hard feelings involved. I really like Vitess, fwiw. And the only thing I really love is FoundationDB :)
    
    samlambert 3 years ago
    
    That wasn't clear.
    Side note: "Autosharding" is largely a myth that unproven databases are touting. Sharding is complex and requires planning and control. Databases that start shuffling data round without oversight produce nasty surprises. Trying to be too magic is normally always a mistake with databases.
    
    tonyhb 3 years ago
    
    Yeah, fair, totally get it. Wasn't aiming to spread FUD, and I know that FDB is a little hard to compare against... it is pretty magic with how it routes and shards :D (https://forums.foundationdb.org/t/keyspace-partitions-perfor...)
    
    tonyhb 3 years ago
    
    For posterity: https://github.blog/2023-05-16-addressing-githubs-recent-ava...
    It was the DB, and it was rogue usage on May 10, so I'm standing by my original comment
  - cheshire137 3 years ago
    
    What would you know, random Hacker News commen--oh. Hi Sam, carry on.
    
    samlambert 3 years ago
    
    <3 Hey Sarah!

buglungtung 3 years ago

I'm considering hosting a Gitea instance to back up all of my repos.

I have an important fix that needs to be deployed right now, but there is no way to deploy it the normal way through our CI, which was set up with GitHub Actions. Fortunately I have instructions to bypass CI and build the source myself.

But again, GitHub defeats me, because our release workflows depend on GitOps, which is affected by the GitHub issue. Ahhhhhhhhhh, I have to build the Docker image, push it to ECR, then update a YAML template to make EKS apply the new changes.

It's 9PM in my timezone and I'm waiting for my patches to go up. A frustrating incident.

galleywest200 3 years ago

Gitea's ability to create a local repository as a mirror of a remote repository is great for this. You can stay on GitHub and have your code regularly mirrored locally.
- jmkni 3 years ago
  
  If only Git had this sort of functionality built in…
  - silverwind 3 years ago
    
    Pretty sure git can not continuously pull/push the mirrored repo like this feature does, by default every 10 minutes.
    
    edgyquant 3 years ago
    
    This is a one line cron job
    
    ijustlovemath 3 years ago
    
    Just a mounted drive over sshfs...
  - dboreham 3 years ago
    
    Gitea can run your CI actions too, and host your releases.
- sklarsa 3 years ago
  
  I have this setup running on a Synology NAS at home. I'm currently syncing all of my starred github repos to local storage using a short bash script that runs once a week. Once a repo is in gitea, it pulls any new updates from github every 6 hours or so. It's mostly for archival purposes, just in case something majorly bad happens to github.
  - pdimitar 3 years ago
    
    Would you share your setup?
    I'm interested in building the same but if yours already works well then I see no point in duplicating important work.
    
    sklarsa 3 years ago
    
    Sure!
    Here's a gist to the script: https://gist.github.com/sklarsa/845152721ee9292eb01f70756b89...
    As for gitea, I'm just hosting it using Docker and orchestrating using a simple docker-compose file that maps the gitea data directory to a Volume on the synology: https://gist.github.com/sklarsa/0dd6d6094dac6bf6e7bf61df9ca5...
    It's all hosted on my private network at home.
- buglungtung 3 years ago
  
  Or use GitLab. I remember setting up mirror syncing from GitHub for one of my GitLab repositories. The main reason I did it is that GitLab offered free built-in CI at that time. In my case one of the most important things is the GitOps workflow. It's the single source of truth, so it's also a single point of failure ;(
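
For anyone who wants the periodic mirroring discussed in this subthread without running Gitea or GitLab, here is a minimal sketch in Python. It only wraps two real git commands (`git clone --mirror` and `git remote update --prune`); the function names, the remote URL and the mirror path are all hypothetical, and the scheduling (cron, systemd timer, whatever) is left to you.

```python
import subprocess
from pathlib import Path


def mirror_command(remote_url: str, mirror_dir: Path) -> list[str]:
    """Return the git command that creates or refreshes a bare mirror.

    Hypothetical helper: the name and the HEAD-file check are assumptions
    made for this sketch, not part of any tool mentioned in the thread.
    """
    if (mirror_dir / "HEAD").exists():
        # A bare repository keeps HEAD at its top level, so treat this as
        # an existing mirror and just fetch updates, pruning deleted refs.
        return ["git", "-C", str(mirror_dir), "remote", "update", "--prune"]
    # First run: clone every ref into a new bare mirror.
    return ["git", "clone", "--mirror", remote_url, str(mirror_dir)]


def sync_mirror(remote_url: str, mirror_dir: Path) -> None:
    """Run the command; raises CalledProcessError if git fails (e.g. mid-outage)."""
    subprocess.run(mirror_command(remote_url, mirror_dir), check=True)
```

Calling `sync_mirror(...)` from a scheduled job every 10 minutes would roughly match the Gitea default mentioned above. Note this only mirrors the git data, not issues, PRs or Actions.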

michaelmure 3 years ago

Daily reminder that https://github.com/MichaelMure/git-bug could use some help :-D

isaacdl 3 years ago

Unfortunately, I can't see what this is because GitHub is down.
- michaelmure 3 years ago
  
  ... of course. Mirror there: https://gitlab.com/MichaelMure/git-bug
joostlek 3 years ago

I was greeted with a "Please give us this UUID when you report this bug" twice and I thought it was a GitHub-breaking repo. Love it

Vasniktel 3 years ago

So nice to meet you all here again, gents - this is becoming a regular thing.

gtrax 3 years ago

We are back to Windows 95, which reliably crashed after 49.7 days:

https://www.cnet.com/culture/windows-may-crash-after-49-7-da...
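
For context, the 49.7-day figure is what you get when a 32-bit counter of milliseconds since boot wraps around. A quick check of the arithmetic (the function name is just for illustration):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_until_wrap(counter_bits: int = 32) -> float:
    """Days until an unsigned millisecond counter of this width overflows."""
    return 2 ** counter_bits / MS_PER_DAY


print(f"{days_until_wrap():.1f} days")  # prints "49.7 days"
```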

VWWHFSfQ 3 years ago

> Codespaces is experiencing degraded performance. We are continuing to investigate.

Imagine not only not being able to push your code, but also not even being able to _write your code_ at all. And so many orgs rely on Actions to even be able to deploy. Geez. I personally believe that the cloud sucks.

gabrielgio 3 years ago

If you're putting all your eggs in one basket you are the only one to blame I guess.
It's still weird to me how many companies rely on GitHub infra, and how much.

zomglings 3 years ago

Not just issues and PRs - I had to try multiple times before I was able to successfully push code to a repo over SSH.

This is the error I was seeing:

    ERROR:
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.

candiddevmike 3 years ago

Everyone thinking they've been laid off and notified via git push. Just GitHub being GitHub.
- izdochr 3 years ago
  
  Exactly what I thought! Maybe I'm too paranoid
  - ta988 3 years ago
    
    The first time it did that to me as well. Now I'm used to it.
timvdalen 3 years ago

Yeah, same. I've since been able to push my branch, but now I can't open a PR.

cloudking 3 years ago

Anecdotal - I've been using Gitlab for a few years on some projects, and haven't experienced any downtime issues with them of this magnitude.

dbingham 3 years ago

Yeah, Gitlab has had its share of downtime. Github has had shockingly little until the last year or two. These things come and go.
Of course, if this continues happening it may be time for the OSS community to seriously consider migrating en masse.
[Edit to fix grammar - thanks for the corrections!]
- dimgl 3 years ago
  
  Apologies to be pedantic, I think it's en masse.
- tnorthcutt 3 years ago
  
  FWIW the typical phrase is "en masse" (which is French). The English translation is "in mass".
  - CharlesW 3 years ago
    
    The English translation would be something like, “all together, at the same time”. “In mass” isn’t an English idiom.
    
    krapp 3 years ago
    
    "en masse" is the term used in English as well.
    
    CharlesW 3 years ago
    
    Yep, just noting for anyone who may not realize that "in mass" isn't really a thing in English.
    As you note, en masse is common — enough so that you don't even need to italicize it (although you can).
    
    capableweb 3 years ago
    
    If you're brave enough, you can italicize whatever you want.
    
    CharlesW 3 years ago
    
    Indeed. Tips on when you should italicize foreign words in English writing: https://proofed.com/writing-tips/should-you-italicize-foreig...
jgadelange 3 years ago

I can still remember https://about.gitlab.com/blog/2017/02/10/postmortem-of-datab... (though I can't remember reading of any big problems since then)
goodoldneon 3 years ago

Same. Overall, I like GitLab a lot more than GitHub. I wonder how much of GitHub's popularity is buoyed by its status as the de facto home for OSS
- cloudking 3 years ago
  
  It has good parity with GitHub on core functionality and a generous free tier.
PestoDiRucola 3 years ago

To be fair, Gitlab accidentally deleted 6 hours worth of data from their database some time ago.

c16 3 years ago

The temptation to create `isgithubup.com` and return "surprisingly yes." on the rare occasion it actually is up.

ta1243 3 years ago

host it on githubpages

rvz 3 years ago

Three days in a row of outages, in less than a week of unreliability after yesterday's downtime of GitHub Actions [0].

Really, at this point you might as well consider self-hosting. The problems are looking very chronic, with GitHub falling apart, and self-hosting was indeed the sensible idea, just as other open source projects have done for years.

GitHub is going just great, and centralizing everything to GitHub really was a good idea wasn't it? [1] /s

[0] https://news.ycombinator.com/item?id=35887029

[1] https://news.ycombinator.com/item?id=22867803

ta1243 3 years ago

There's no way a company could run its own source management system and have an uptime approaching 23 hours a day -- that's like one nine of uptime!
(/s of course)
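
To put a number on the joke: 23 hours a day works out to about 95.8% availability, which is indeed only "one nine". A small sketch, using the common convention that the number of nines is -log10(1 - availability); the function names are just for illustration:

```python
import math


def availability(up_hours_per_day: float) -> float:
    """Fraction of the day the service is up."""
    return up_hours_per_day / 24


def nines(avail: float) -> float:
    """Number of nines, e.g. 0.999 is about 3 nines."""
    return -math.log10(1 - avail)


a = availability(23)
print(f"{a:.2%} available, {nines(a):.2f} nines")  # prints "95.83% available, 1.38 nines"
```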

melx 3 years ago

Shout out to all devs that put the deps of their programs on GH...

Right now I can ignore that my PRs show a 500 error or, best case scenario, old code.

But... I cannot build and ship because a project dependency depends on some stuff hosted on GH.

dijit 3 years ago

Which is like 99.99% of all the Go code out there.
- melx 3 years ago
  
  You're right! I remember I stopped Go programming when they added this "feature". (I mostly left due to syntax "overflow" tho).

samwillis 3 years ago

Hugs to the GitHub Ops and SRE teams right now!

Also hugs to any Devs, Ops or SREs directly affected by this outside GitHub.

Looking forward to a post-mortem on the last few days, I'm sure it will be a really interesting read.

jamespetercook 3 years ago

Third day in a row this has affected my productivity

booleanbetrayal 3 years ago

500's on every repo page + CLI errors on pull / push. Completely useless at the moment.

martiuk 3 years ago

Anyone experienced with asking for service credits for enterprise customers? It looks like you have to prove that they have missed their SLAs

dugmartin 3 years ago

I've had some actions queued for multiple days now on certain repos, but not others. I've cancelled them and restarted them during the green status intervals but they all go back to "Queued". I've also cancelled them and then made slight documentation tweaks to get new commit hashes on the branches and it still goes to queued.

atl4s 3 years ago

Just lost a merge commit to dev/null. This is getting tiresome

capableweb 3 years ago

How??? You do the merge, which either creates a new commit for the change, or appends the commits to your existing tree. Then you push that to the remote. If the push fails, you can just push again, it's not lost. And if the merge failed, you didn't have any merge commit to begin with.
- misnome 3 years ago
  
  There used to be a pretty consistent bug that if an on-site PR merge failed but you clicked "Retry", that it just did a basic non-squash full-merge discarding all your commit message work, often requiring a revert to tidy things up. It could be similar to that.
- edgyquant 3 years ago
  
  May have been using the GitHub interface
- snapcaster 3 years ago
  
  You can do a lot of operations on the github web UI nowadays. Could have been that
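
The "just push again" point only holds for commits that already exist in a local clone, not for merges done in the web UI. For the local case, a minimal retry sketch in Python: `push_with_retry`, the backoff values and the injected `run` callable are all assumptions for illustration, while `git push origin HEAD` is the only real command involved.

```python
import subprocess
import time
from typing import Callable, Sequence


def push_with_retry(
    run: Callable[[Sequence[str]], int],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry `git push` with exponential backoff; True once a push succeeds."""
    cmd = ["git", "push", "origin", "HEAD"]
    for attempt in range(attempts):
        if run(cmd) == 0:
            return True  # the local commit was never at risk, it just wasn't uploaded yet
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
    return False


def run_git(cmd: Sequence[str]) -> int:
    """Real runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode
```

Usage would be `push_with_retry(run_git)` from inside the repository. The runner is injected so the loop can be exercised without touching a real remote.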

melx 3 years ago

I'm going to self-host my git repos. Any recommendations?

git+nginx would suffice but it does not offer a GUI. I need one to see the changes proposed (aka PRs).

Gitea is nice, but a bit overkill for my needs. I don't need CI, file hosting, issues, team members, releases, wiki, forking/watching/starring, etc.

justinclift 3 years ago

Gitea is pretty light on resources, so even though it has extra functionality you don't need, it's not really a resource pig. Unlike, say, GitLab.
- melx 3 years ago
  
  Gitea requires a database, which is an unwanted feature for a "git server" in my little world.
  I'm just looking for a "website" (read: interface) that lists files over HTTPS, with the ability to show nice-looking diffs. Some sort of ssh keys(?) to prevent unauthorized access etc.
  - JCWasmx86 3 years ago
    
    Maybe something like gitweb? https://git-scm.com/docs/gitweb
    I have not used it before, but it seems like it follows your requirements (Except auth using SSH keys maybe, but that could be a task for e.g. Nginx)
    
    melx 3 years ago
    
    Thanks, I discovered "git instaweb" which is based on gitweb. It's nice and offers mostly what I'm after, but the 100% layout width is terrible (no option to configure it).
  - justinclift 3 years ago
    
    Maybe something like gitolite then?
    https://github.com/sitaramc/gitolite
    I've not used it personally though, so no idea how well it works in practise.
    
    melx 3 years ago
    
    Thanks! Gitolite doesn't offer any UI from what I see. Also, setting it up is too complicated.
    I just found out about "git instaweb" (developed by the Git authors, but it's an extra OS package) - it works locally, probably what I was looking for (minus the 100% width page layout).
    BTW, there's Gitblit if someone fancies hosting a Java app.
yjftsjthsd-h 3 years ago

> going to self-host my git repos. Any recommendations?
Depending on your needs, this can be as simple as sticking repos on any server you have and cloning/pulling/pushing over ssh. If you want something more sophisticated, though, there's a handful of nice applications (gitea is being suggested further up-thread).
dboreham 3 years ago

imho Gitea is so smooth to deploy and manage that it's worthwhile even if you don't need its advanced features.

greenie_beans 3 years ago

this is the third day in a row this is a problem yet they're framing it as a new outage.

bluehatbrit 3 years ago

I'd say it's reasonable to list it as separate outages on the status page as it's really a representation of "is github available and working as expected". Even if it is the same issue, when they manage to mitigate it (or it goes away) I'd want to see that everything is now available from a user perspective.
That said, they're getting to the point where they really need to make some larger post about this. It seems reasonable to assume it is all from one root cause.

MattIPv4 3 years ago

Loading github.com is returning a 500 for me currently, so seems like more than just issues/pull requests. Also seeing actions fail with 500s on assorted steps.

zachallaun 3 years ago

Similar issues for me. I can load github.com and my profile, but visiting a repository (or trying to git pull a repo with the https origin) returns a 500.

mostafah 3 years ago

It started for me a few minutes before the status page showed something. Which is understandable, of course.

But strange that it keeps happening almost every day now.

darrenkopp 3 years ago

I can confirm this as well. I started seeing 500 errors intermittently when trying to view pages, so I checked the status page and saw everything was green. The status page started showing the incident within about 3 minutes of when I started seeing issues. Clearly that's all based on the happenstance of when I was landing on GitHub's website, but I have found that of all the status pages by large companies, GitHub's is almost always showing an incident as soon as I start noticing issues myself.
longwaveOP 3 years ago

Yeah, I was getting 500s for about three minutes before they posted the status update. I guess it's good that they at least update the status page in a timely fashion, but the third day in a row of downtime is not exactly good service.
- zomglings 3 years ago
  
  Have to give it to them for how useful their status page is. Other products we use play all kinds of word games to downplay issues so they don't have to show them on the status page, which is extremely annoying.

acyou 3 years ago

Does anyone host their own git repos in an enterprise environment? How do you do it and what are some good resources for learning how to do this?

tommy_axle 3 years ago

Yes, we do this using https://gitea.io/en-us/ on a private server (firewall, backups and a replica) for most projects. Github is only used when it's required by a stakeholder.
- dboreham 3 years ago
  
  Gitea here. Should disclose that we ended up contributing to the project and developing an understanding of its code in order to integrate into our org, so ymmv. (but at least you can see the source).
- trollied 3 years ago
  
  +1 for gitea. Lightweight, easy to set up.
shagie 3 years ago

Are you interested in spinning up an entire CI environment? or something where anyone can push a branch to a file based mirror?
There are aspects of permissioning that the cloud git repo providers have that become more challenging to implement as a home grown solution and unless you have the resources to maintain it, it also becomes interesting.
On one hand, you can do `git clone --mirror` and you'll have a copy of the repo and put that on a file share... though there's no permissions or automatic syncs for it (or CI). If you want those, then you get into some development (and maintenance) of the git hooks.
Going to things like a locally hosted gitlab instance means that you need to have a local docker hosted environment running, and someone to maintain that, and the storage for it, and all of the other fun that comes with administering a complex 3rd party application on prem. When things are going well, it's an hour or two a week... when something breaks it's several hours with calls to support (you're using a paid / licensed version to get support... right?) from someone who has a sysadmin skill set rather than a developer skill set. And don't forget about DR.
q3k 3 years ago

Yes, primary on Gerrit, backup replica to $wherever. Bonus: actually usable code review platform.
cube2222 3 years ago

You can just use GitHub Enterprise if you want GitHub, but self-hosted.
aeyes 3 years ago

Yes using gitolite, it just works.
In the past I worked at a company which used the commercial solution from JFrog, I don't remember ever having problems with git availability as a user.
c12 3 years ago

At a previous employer we used gitea along with jenkins.
trollied 3 years ago

Yes, gitlab. Self-hosted, behind a VPN.
- Kelteseth 3 years ago
  
  This. Rock solid for the last 4 years.

est 3 years ago

This gives me some fail-whale vibes. But the blue bird doesn't fail like this these days, strangely.

nickthesick 3 years ago

Have they said what has been up lately?

Vermyndax 3 years ago

They are far overdue for issuing a statement.
- capableweb 3 years ago
  
  They're having so many outages that they've moved from making statements after each event to aggregating them all and writing about all of them on a month-by-month basis.
  I'm not joking: https://github.blog/tag/github-availability-report/
  - KomoD 3 years ago
    
    They've had that for years.
    
    capableweb 3 years ago
    
    I don't remember when it started getting worse, but if I do a guess, I'd say around 2018 sometime, which I'd also guess is when they started pushing more changes too.

WesolyKubeczek 3 years ago

I feel that with a dynamic like this, someone could make a page showing the number of days since the last GitHub incident.

It would show a very prominent zero and be a static page with no logic whatsoever.

bhouston 3 years ago

Yup, hitting me and all over twitter: https://twitter.com/search?q=github

daniaal 3 years ago

Trying to access repos is returning 500 for me also

megadopechos 3 years ago

Very helpful 500 page: "In the meantime, try refreshing."

ryandvm 3 years ago

Maybe they should put Copilot in charge of ops...

yeck 3 years ago

I wasn't even able to get to repos or users/orgs for a while (though as I write this it seems like that is coming back).

SettembreNero 3 years ago

and i thought i broke `master`...

mminer237 3 years ago

git says my code pushed fine and everything is synced, but Github is not showing any of my changes.

Edit: 10 minutes later, GitHub finally shows the push, but triggers still aren't working.

Edit #2: Things are working normally now.

rhymeswithjazz 3 years ago

Another day, another outage. I'm getting 500 for any repo.

moltar 3 years ago

Three days in a row!! Is this the result of all the job cuts?

talboren 3 years ago

Is it time to move back to Jenkins? :X

dboreham 3 years ago

Now that gitea supports act_runner (and hence a reasonably usable clone of GH actions), Jenkins is dead.
- justinclift 3 years ago
  
  Oh, that's really good news. :)

remorses 3 years ago

I cannot even push code

gabrielizaias 3 years ago

Here we go again…

talboren 3 years ago

Back to biz

mr90210 3 years ago

On the bright side, maybe, just maybe the open source community realises that such centralisation might not be the best solution for hosting code.
