GitHub Actions Incident 29.3

githubstatus.com

97 points by rethab 3 years ago · 55 comments

temp_account_32 3 years ago

Funnily enough, GitLab is also melting down at the moment, with pipelines not running and pull requests not functioning:

https://status.gitlab.com/

  • bencevans 3 years ago

    Also causing issues: a change has altered the release source tarballs [1] (and therefore their hashes), so build systems are rejecting them [2].

    1: https://gitlab.com/gitlab-org/gitlab/-/issues/402616
    2: https://github.com/microsoft/vcpkg/issues/30481
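
To illustrate why a regenerated tarball trips a pinned checksum, here is a minimal sketch. It is not how GitLab, GitHub, or vcpkg actually build or verify archives. The helper names (`make_tarball`, `archive_digest`, `content_digest`) are hypothetical, and the changed `mtime` only stands in for whatever archive-level detail the host altered. The digest of the archive bytes changes even though every file inside is identical, while a digest over the file contents stays the same.

```python
import hashlib
import io
import tarfile


def make_tarball(files: dict[str, bytes], mtime: int) -> bytes:
    """Build an uncompressed tar archive in memory.

    `mtime` is a stand-in (assumption) for any metadata a host might change
    when it regenerates an archive for the same tag.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


def archive_digest(archive: bytes) -> str:
    """SHA-256 of the raw archive bytes, the kind of value a build system pins."""
    return hashlib.sha256(archive).hexdigest()


def content_digest(archive: bytes) -> str:
    """SHA-256 over file names and file contents only, ignoring tar metadata."""
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            if member.isfile():
                data = tar.extractfile(member).read()
                h.update(member.name.encode())
                h.update(b"\0")
                h.update(len(data).to_bytes(8, "big"))
                h.update(data)
    return h.hexdigest()


files = {"README.md": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
before = make_tarball(files, mtime=0)
after = make_tarball(files, mtime=1)

print(archive_digest(before) == archive_digest(after))  # archive bytes differ
print(content_digest(before) == content_digest(after))  # file contents match
```

A build system that pins `archive_digest` has no way to tell a harmless regeneration from tampering, which is why it rejects the download.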

    • mardifoufs 3 years ago

      Isn't that the same "bug" that happened to GitHub a few weeks ago when they updated their Git version too? It wasn't a bug per se, but they still had to revert because the new hashes were causing massive build problems. Maybe it's a different root cause though.

    • Rapzid 3 years ago

      Holy cow. The cardinal DevOps sin of changing the contents of a versioned file.

  • midasuni 3 years ago

    My on-prem GitLab has been fine with its pipelines, and given it’s impossible to have an on-prem uptime higher than the cloud, what you say can’t be true.

  • kingds 3 years ago

    what's a pull request?

    • tmpz22 3 years ago

      A pull request is a process which can merge new code into existing code.

      "Software Engineer John was tasked to add a new logo to the website, when he was done he submitted a pull request of his feature branch into his organization's github repository for the website so that his team members could approve the changes before automation (like Github Actions) deployed live as a new version of the website."

      • nightfly 3 years ago

        GitLab has "merge requests"; the comment you're replying to was trying to point out an error in an annoying way.

        • kingds 3 years ago

          i tried to add a lil laughing emoji to indicate that my comment was in jest but apparently those get stripped out of comments? my bad.

        • OJFord 3 years ago

          Not even an error frankly, MR, PR, change request, patch set, personally don't care what you call it, it's the same thing.

    • momentoftop 3 years ago

      It's a request to the owner of some reference in a Git repository to pull in some changes from some reference in some (possibly other) Git repository. You can do this via email, but centralised Git hosts like Github have their own interface to this basic workflow.

    • smcleod 3 years ago

      I agree the naming is misleading - it's not actually a request to pull anything - it's a request to merge someone's branch into another. This is known as a merge request on several other platforms.

rvz 3 years ago

Once again, just two days ago [0], the whole of GitHub went down, after the RSA key leakage and the certificate key expiry on its user facing site.

It is also apparent that GitHub Actions has struggled to operate normally at least once a month for years.

There is no question that GitHub has been more unreliable than if you were to use a self-hosted GitLab or Gitea instance yourself as I said before [1].

[0] https://news.ycombinator.com/item?id=35325850

[1] https://news.ycombinator.com/item?id=22867803

chatmasta 3 years ago

At least we don't need to scroll too far back in our comment history to copy/paste our arguments from the thread two days ago.

jacobsenscott 3 years ago

Nobody's stock goes down when actions go down. Nobody's stock goes up when actions are working. But everyone's stock goes up when you have mass layoffs. Working as designed.

  • riffic 3 years ago

    not sure what the SLA is on Actions but outages are a regular occurrence with these kinds of systems and are incredibly expensive to move to the next 9 of availability.

    it's certainly a risk you'll need to evaluate when planning your desired build process.
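
As a rough illustration of what "the next 9" means, the sketch below converts a number of nines into the downtime it allows per year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for the example. None of these figures describe the actual SLA for Actions.

```python
def allowed_downtime_minutes(nines: int, period_days: float = 365.0) -> float:
    """Downtime budget in minutes for an availability of `nines` nines.

    For example, nines=3 means 99.9% availability, i.e. 0.1% unavailability.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    unavailability = 10.0 ** (-nines)
    return period_days * 24 * 60 * unavailability


for n in range(2, 6):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

Each extra nine cuts the downtime budget by a factor of ten, which is one way to see why every step up costs more than the last.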

nimbius 3 years ago

how is it in just five years microsoft has managed to pedal this once vibrant and bustling community of developers and creatives into a roaring dumpster fire of sketchy GPL-breaking copilot AI and endless seemingly random outages.

https://www.githubstatus.com/history

github has had 55 outages in 3 months. that's nearly an outage every two days.

the last six months of 2022 had 74 outages. In many shops that's tangibly worse than what their local greybeard Linux admin maintains.

arguments against spinning up my own gitlab/gitea/jenkins/whatever in podman under systemd are starting to ring pretty hollow lately.

  • _gabe_ 3 years ago

    > how is it in just five years microsoft has managed to pedal this once vibrant and bustling community of developers and creatives into a roaring dumpster fire of sketchy GPL-breaking copilot AI and endless seemingly random outages.

    I mean Github Actions was released 5 years ago[0]. I imagine the infrastructure for actions is more susceptible to outages than the fairly simple features Github offered previously. It makes sense that the number of outages would increase with the additional complexity in the infrastructure.

    [0]: https://resources.github.com/devops/tools/automation/actions...

  • xxpor 3 years ago

    comments like these are what end up incentivizing companies to hide outages and report all green all the time

deltaci 3 years ago

this is already the third time github actions is down this week, and it's only wednesday morning

  • mdaniel 3 years ago

    maybe they have a "no deploy on Friday" rule :-D

    • capableweb 3 years ago

      I sure hope not. GitHub is supposed to be mature infrastructure at this point, where most if not all changes going into production should be very well tested, and nothing that multiple people haven't verified as being correct should end up being deployed and released.

      Besides, Microsoft surely has 24/7 watch of their infrastructure, even on weekends, it's a huge company.

      • outworlder 3 years ago

        > Besides, Microsoft surely has 24/7 watch of their infrastructure, even on weekends

        "watching" with a dedicated team vs "waking up everyone in engg because things are on fire" are two very different things.

        Besides, size doesn't work that way. The larger the organization and the more complex the product is, the higher the chance some unexpected interaction will occur. There are processes and automation that can mitigate this, but one can never be completely certain.

        Not even the aviation industry has mastered that.

      • andrewxdiamond 3 years ago

        Bugs are a function of change, not a function of maturity.

        Just because they can have people come in on weekends to fix things doesn’t mean they like doing that.

        I know many “mature” software platforms that do not deploy on Fridays or off hours at all

        • bastardoperator 3 years ago

          A true global company doesn't have off hours.

          • andrewxdiamond 3 years ago

            As someone who runs one of the most global APIs in the world, I promise you, I do in fact sleep

            • zamnos 3 years ago

              What's your pager rotation like? I want to say you have follows-the-sun, and so your on-call shifts are 12-hours long and you swap with a team on the other side of the world from you so you can get said sleep, but I don't want to just assume that.

              • andrewxdiamond 3 years ago

                Dayshifts are 9-5, night shifts are 5-9. Same team rotates through both. I have done plenty of overnight oncalls.

                Some teams do follow the sun type rotations, but my team is all in Seattle.

            • bastardoperator 3 years ago

              That's why a global team is important, when you sleep they work, when they sleep you work. Work is constant when you're servicing the globe.

              • andrewxdiamond 3 years ago

                And work is constantly slow as well, since it’s impossible to get people in the room at the same time

                • bastardoperator 3 years ago

                  Why does everyone need to be in the room? I have a groomed backlog and can talk to people async as needed. We also record meetings if you missed them and depending on the context of the meeting or importance, we'll hold timezone friendly meetings for everyone as required.

                  • andrewxdiamond 3 years ago

                    ¯\_(ツ)_/¯

                    We have a different work philosophy. I do work with teams in India and England, and it’s painful to accomplish anything cross team

            • capableweb 3 years ago

              Everyone employed by your company, who work in the same industry stops and starts working at the same time, all around the world?

              • andrewxdiamond 3 years ago

                No, but my team owns our own APIs. We are all located in the same area and we go oncall for our service.

          • zamnos 3 years ago

            Less "on" hours then? Even Google has diurnal patterns when there's a lower amount of traffic simply due to the fact that humans are unevenly distributed across the Earth's surface. And Google does code freezes for the holidays where they don't deploy at all.

web3-is-a-scam 3 years ago

Github is so crappy now, it feels like something is always wrong with it. Thanks Microsoft.

sithlord 3 years ago

Wonder if these were managed by a defunct team from India?

riffic 3 years ago

what is the "29.3" in the title supposed to represent? is that supposed to indicate a date of March 29th? I do not see a reference to this on the incident page itself.

  • nurgasemetey 3 years ago

    It is because titles will be duplicated on search if 29.3 is not added

    • riffic 3 years ago

      is it a date? is it a version?

      I'm just jumping up and down about how awful this representation is. ISO 8601 (YYYY-MM-DD) is a settled debate at this point.
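
For illustration, here is a minimal sketch that turns a day.month label such as the "29.3" in the title into an ISO 8601 date. The helper name `day_month_to_iso` is hypothetical, and reading the label as day.month is only the guess made upthread. The label carries no year, so the caller has to supply one, and the 2023 below is an assumption for the example.

```python
from datetime import date


def day_month_to_iso(label: str, year: int) -> str:
    """Convert a 'D.M' label (day first, then month) to YYYY-MM-DD."""
    day_text, month_text = label.split(".")
    # date() raises ValueError for impossible dates such as day 31 in month 2.
    return date(year, int(month_text), int(day_text)).isoformat()


print(day_month_to_iso("29.3", 2023))  # year is an assumption; the title has none
```

Read on its own, "29.3" could equally be a version number or a decimal, which is the ambiguity the YYYY-MM-DD form avoids.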

gxt 3 years ago

At what point are organisations going to ask themselves whether this is intentional or not? Your velocity is disrupted by a likely competitor. I'd move out.

  • aaomidi 3 years ago

    I don’t have an opinion on whether it is or not, but layoffs do have a significant impact on that, and it’s generally something that’s pretty much impossible to study.

  • jacooper 3 years ago

    Or simply just self host actions?

usrme 3 years ago

Azure is also having somewhat widespread issues at the moment, so I'd venture to guess that these two are also linked.
