Incident with GitHub Actions (githubstatus.com)
Previous GitHub incidents in March 2022:
* on 16th, https://www.githubstatus.com/incidents/fpk08rxnqjz2
* on 17th, https://www.githubstatus.com/incidents/sksd097hm0y5
* on 22nd, https://www.githubstatus.com/incidents/83lq7ftk19r5
* on 23rd, https://www.githubstatus.com/incidents/tyc8wpsgr2r8
* on 24th, https://www.githubstatus.com/incidents/y5hdmv0p49x3
scaling is a hard problem to solve, so props to them for doing it without too many issues overall..
but damn, since the Microsoft acquisition they are having more issues, more frequently
Product development has also accelerated since the acquisition. Lots of interesting stuff has been released since then.
Flops have been released too (Copilot I'm looking at you)
Copilot is amazingly good. I find it particularly helpful for filling out enum unions or building other kinds of conceptually linked types that don’t have enough semantic linkage for existing autocomplete systems to figure out my intention.
Simmer down, it hasn't even been _released_ yet.
Actions runs on Azure; Actions was only possible because of Microsoft.
Ah yes. No one else on the internet will rent you servers.
Action Runners are basically NodeJS applications, packaged self-contained. You can run Action Runners on an RPi if you want.
GitHub.com (not GHE) uses Action Runners deployed using K8S. This can be done anywhere that supports K8S.
Obviously MS used their own server infra, but to imply GitHub Actions needs Azure is plain false.
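To make that concrete, a minimal sketch (the workflow name, labels, and build command here are made up): once a runner is registered against a repo or org, pointing a job at it is a one-line `runs-on` change, regardless of whether the box is an Azure VM, a K8S pod, or an RPi.

```yaml
# .github/workflows/build.yml -- hypothetical example
name: build
on: [push]
jobs:
  test:
    # Any registered runner carrying these labels picks the job up --
    # a Raspberry Pi tagged ARM64 works the same way as a cloud VM.
    runs-on: [self-hosted, linux, ARM64]
    steps:
      - uses: actions/checkout@v3
      - run: make test
```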
I meant more in the sense that they're eating that cost. Much harder to do when you're not making huge cash from another segment of your offerings.
financially.
it was created to make good use of Azure. before Azure, GitHub would not have been able to afford free CI infra. someone has to pay for all those servers running for free.
As the person who gets pinged when our CI isn't working, I'm hoping my next employer doesn't have github in their stack.
We've reached the stage where architectural decisions are being questioned. Nobody should need to ask me "well why are we using github?" but here we are.
They needed to fix this two weeks ago. The next time I'm in charge of stack decisions I will be evaluating competitors. This is exactly how Slack beat Hipchat.
Don't let weekends dilute your view of the situation. There are only 23 weekdays in March and Github has not been reliable for 5 of them.
Host on-premise? Or use Gitlab as a service or, again, on-premise. Outsourcing in general, and literal "outsourcing", should be done for things you cannot do properly yourself (no competence, too much work, no feasible resources).
Long read but describing it neatly: https://danluu.com/nothing-works/
A decision between: do you pay your own personnel to do it, or do you give others money and hope they do it in a way that is good enough. That outsourcing as a trend failed because it was driven by capitalism isn't a new finding. And as convenient as remote web services are, the cloud is outsourcing.
I'm curious, what's stopping you from hosting on-premise and/or using your own runners?
Using your own runners doesn't completely solve the problem, because GitHub won't queue jobs to them during these outages.
We aren't a big enough team. The resources to do such a thing would mean less time focused on customers.
Maybe after another doubling we will have these options available. We should be a perfect fit at this stage for a hosted solution. Github is making themselves unviable for a company at this stage (low $XX million ARR)
"We aren't big enough to self-host" is an example of the effectiveness of SaaS sales pitches.
Don't get me wrong, there's a lot of good reasons to go cloud.
There's not a lot of good reasons for a build system to be completely down with no backup plans when your SaaS provider has an outage.
What does even just Github Actions' availability look like for March? 90%? That's pretty brutal for what should be a five nines service.
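For a sense of scale, rough back-of-the-envelope for a 31-day month:

$$31 \times 24 \times 3600 \times (1 - 0.99999) \approx 27\ \text{s of allowed downtime at five nines}, \qquad 31 \times 24 \times (1 - 0.90) \approx 74\ \text{h} \approx 3\ \text{days at } 90\%$$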
We have other tools that are self-hosted and they require regular maintenance: security patching, writing storage retention policies when some stupid log fills up, backups, migrating cloud images, dropping plugins with deprecated APIs, etc. This stuff can also be unplanned, and even the scheduled work becomes a regular chore taking time away from customers.
I know the on-prem sharks smell blood in the water but on-prem is not all upside and not the default answer to hosted services having downtime.
For example, we could be on another DevOps SaaS and have had zero downtime in March.
We can run the CI by hand or move it to another platform, but it takes some time and effort. This uptime is abysmal and has already stalled my workday twice in the last few weeks, so maybe it's worth the change.
This is getting out of hand. I had a call with github enterprise sales yesterday about this. Best they could offer is a blog post link and taking more of our money.
Hoping to migrate away our little chunk of github to a private island by end of next week.
Hackernews talks a lot of shit about how big boys can run their computers better than us lowly startup peasants. This is a fortunate situation where that stale argument starts to fall apart really quickly. I have many other systems with far higher availability than what github offers in their public cloud or even as part of the contractual 99.5% enterprise SLA.
Sure, github is really complicated and hard. Yes, it's an incredible tool. No, it's not OK to rest on your laurels and let COTS database technology kill your product when you are a billion dollar technology company with the ability to write your own systems from scratch several times per year using parallel teams and other investor money furnaces.
I'm glad we have a self-hosted Gitlab. Sure, you need to do a bit of setup and configuration, but it's worth it in the long run.
It's a bit of a weird thing, as the whole world tries (or tried) to move away from on-prem stuff. Like Jira stopping support for non-cloud versions etc.
Probably Atlassian was sick of people never upgrading their old installations and getting hacked for it, and people did not upgrade because it is quite a hassle in the first place, not to mention plugins breaking allll the time.
Oh, and because cloud forces continuous payment, whereas before many customers simply bought a one-year license and went on without renewing support.
We've been running a self-hosted Gitlab Premium since 2019. The only two issues in the last 3 years were artifacts not being deleted (causing nightly backups to grow to 500 GB; will be fixed in the next version) and some out-of-date apt certs needed to run "apt update". Otherwise, I update Gitlab every month without problems.
Gitlab is a breeze to upgrade when using the Docker distribution. Swap the version number in Kubernetes or the systemd unit file (if you're using naked Docker), restart the service, that's it...
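For anyone who hasn't seen it, a rough sketch of what that looks like with Compose (the version tag, hostname, ports and paths are just examples, not a recommendation):

```yaml
# docker-compose.yml -- upgrading means bumping the image tag and restarting
version: "3"
services:
  gitlab:
    image: gitlab/gitlab-ee:14.9.2-ee.0   # example tag; pin whatever release you run
    restart: always
    hostname: gitlab.example.com          # placeholder hostname
    ports:
      - "443:443"
      - "2222:22"
    volumes:
      - ./config:/etc/gitlab
      - ./logs:/var/log/gitlab
      - ./data:/var/opt/gitlab
```

Pinning an explicit tag rather than :latest also keeps the rollback path obvious: revert the tag and restart (and restore the backup if a migration already ran).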
Atlassian's docker images are similarly easy to use, but with everything Atlassian you have a veritable ecosystem of plugins of which almost none are open source so you are out of luck if there are incompatibilities.
Doesn't it cause more headaches?
I've had experiences where devops-minded people I've worked with have wanted to self-host services such as Bitwarden. Sure, it will be cheaper and you will have no one to blame but yourself, but once things go bad they go really bad. It's also another thing to keep eyes on.
I guess similar argument could also be extended to self-hosted clouds. Seems like it could take away a lot of focus and energy from working on the product itself.
> Doesn't it cause more headaches?
No. You update it on patch day (or when a big CVE comes out that you are actually impacted by) and know exactly what goes wrong and when. If you can't solve it, you roll back. When Github (or a part of it) goes down, you know nothing and with persistent issues there's no way to solve it either.
A company of the size where "but we have to scale" is an actual issue should self-host. A SaaS solution is a risk that you cannot mitigate.
A lot of large web services outages (such as GitHub, Azure Active Directory, Slack, etc) are purely caused by these services having to scale to the entire world, with all the complexity and moving parts it entails.
Self-hosting inherently mitigates that problem because you now need to support less than 0.1% of the load of the worldwide service.
It also puts you in control of maintenance and updates - you can choose to make changes outside of business hours so that nobody is affected if you screw up. Developers at SaaS services can't easily do that because it's always business hours in some parts of the world, and may not be motivated to do it anyway even if it was possible with some effort.
Not even sure how to talk about this anymore with how frequently this has gone down.
Our company uses GitHub actions and other features for deployments so every one of these outages stops us from putting any work on production.
I think it's easy to pile-on and say "GitHub is down again! Should've self hosted lol!".
When, in reality, it's one service having issues and not the whole site. These incidents also seem to be resolved quickly.
Downtime is not the end of the world.
Judging from last month [0] and my own comment chain [1], it is not looking good; it is as if it is guaranteed to go down every month.
> Downtime is not the end of the world.
What if you needed to push that critical change and it is down and all you could do is wait?
What if you hosted your website on GitHub Pages? Maybe you use GitHub Actions (I assume most do here and are paying for it for their teams). Surely people use it for pull requests and issue management as well as for the webhooks and basic git operations.
There are those that went 'all in' on GitHub and use everything on it and are now crying that it is unreliable. This is where going 'all in' makes no sense. (Especially without a backup/self-hosted system somewhere.) Or centralizing everything on it as predicted years ago. [2]
[0] https://news.ycombinator.com/item?id=30841070
You always need a break-glass solution for these kinds of events, SaaS or self-hosted; something will break at an inopportune time and you need to be able to move forward.
Until you yourself have an issue you cannot fix because GitHub is 'having issues', so you cannot build & deploy. I'm not saying self-host everything, even though I prefer it for many things, but do make sure you have redundancy in these processes. When Github has issues, our company stops. That is not an acceptable dependency imo.
> Downtime is not the end of the world.
Tell that to our customers & support...
no but in a team of thousands of developers it costs a hell of a lot of money in lost productivity.
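Quick napkin math; headcount, loaded rate and hours blocked are all assumptions of mine:

$$1{,}000\ \text{devs} \times \$100/\text{h} \times 2\ \text{h blocked} \approx \$200{,}000\ \text{per incident}$$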
At this point in time it really is a gamble whether Actions will work or not. Even if there is no “Incident”, the success rate has probably been 70-80% over the last three weeks.
I would maybe start migrating repos away from GitHub, since they have proven to be quite unreliable. Nonetheless, I must say I do appreciate GitHub's UI and even their CLI is quite nice. Is there a service out there that basically provides the same good user experience, CI/CD and has no costs for public repos? Apart from the unreliability I can't complain about GitHub.
github is still kind of friendly to noscript/basic (x)html browsers. I dunno how long it will last before $soft turns github into gitlab, namely explicitly hostile to noscript/basic (x)html browsers.
I did move my repos to "free" basic git hosting services on the net.
What would you need for today's "development" (broad definition) of "popular" components: a web front end, an issue tracker, a mailing list, a git server (ssh-ed write, http/https read).
The web front end is static html.
noscript/basic (x)html is more than enough for an "issue tracker" (cf. bugzilla), including account creation (google's javascript-only captcha is really nasty).
The mailing list would need to implement greylisting for subscription (which would be disabled once the subscription process is done). The naive use of spamhaus lists by many email-server sysadmins is really toxic for self-hosted ppl or users of small email services.
The git server, well, the git server.
This is significant upfront work, not to mention the maintenance, with permanent "attacks" from corpo/state-sponsored paid/brain-washed saboteurs to deal with.
I understand why many devs are caving in: they end up using those gitlab-like free and ready development hosting services because of the "path of least resistance". This is a very toxic pitfall, because those hosting services force ppl to use the absurdly huge and grotesquely complex google(blink/geeko)- or apple(webkit)-based browsers, where noscript/basic (x)html browsers should be more than enough for most, if not all, core online development functions.
> Is there a service out there that basically provides the same good user experience, CI/CD and has no costs for public repos?
The answer is no. Other UIs are much worse in my experience. I haven't given Gitlab a good shake in a while though. That said, the network effects of Github are pretty significant. Everybody's on it, lots of major repos and orgs use it.
GitHub the source store is fine and very stable. It's all the new features that seem half-baked. Unfortunately we're using Actions because it was a faster workflow for getting builds out. Not so much right now.
https://Gitlab.com seems to be pretty good
Oh it's just the daily GitHub incident.
I had a lot of issues with GitHub actions failing yesterday (the infamous "Request failed with status code 502"), but there were no service issues listed on their status page. It just started working late yesterday and seems to be fine today though.
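One crude mitigation (a sketch only; the script path is a placeholder) is to wrap the flaky call in a shell retry loop so a transient 502 doesn't fail the whole run. It only helps when the failure is in something the step itself calls (artifact uploads, registry pushes, API requests); it does nothing when the runner can't reach GitHub at all.

```yaml
# Hypothetical workflow fragment: retry a flaky step a few times before failing
name: deploy
on: [push]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy with retries
        shell: bash
        run: |
          for attempt in 1 2 3; do
            ./scripts/deploy.sh && exit 0    # placeholder script name
            echo "attempt $attempt failed, retrying in 30s" >&2
            sleep 30
          done
          exit 1
```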
Again? The last time this happened was 4 days ago, as I also said then:
>>> Hopefully GitHub won't have another outage in a week's (or even a month's) time. [0]
Yet another incident with GitHub Actions less than a week (4 days) later, if one was to go 'all in' on GitHub. Would have to reset the counter once again.
You can see in the whole comment chain [0] and in [1] why I was totally right in the 'long term' about not 'centralizing everything' on GitHub since 2020.
Is there a reason you’re continuously posting a link to a previous comment like anyone wants to read it? If this is a passion of yours, may I suggest posting about multi cloud strategies and not being entirely dependent on any of the providers. You’ll have more opportunities to point out you’re right each time any of them go down.
Attacks or incompetence?
Hello Microsoft executives, we would like some transparency.
I think this is still the latest they've written: https://github.blog/2022-03-23-an-update-on-recent-service-d...
Was discussed on HN too last week: https://news.ycombinator.com/item?id=30783051
Like Okta, I pay them to not do this.
Anyone skimmed their enterprise SLA docs to see how much we are due back?
Developers ain't free.