GitHub Actions: Ephemeral self-hosted runners and new webhooks for auto-scaling

github.blog

192 points by gazab 4 years ago · 73 comments

sascha_sl 4 years ago

This feature was delayed every month after May.

And yet it is still half baked. We prepared for this with internally shared docs and the branch built in private for a while, but still had to roll back yesterday because the scheduler reverted to putting jobs wherever it pleased (including on ephemeral runners that already have a job) and randomly cancels large sets of jobs too.

I have been of the opinion that investing in GH Actions at this stage is purely sunk cost (at my org), and I'm not moving until the team behind this thing ships something that doesn't break half the time. These have been seriously frustrating months, because no amount of working around this messy code[1] made of 5 layers of MS-style .NET (seriously, deleting a directory goes 5 layers deep in the call stack) will ever produce a stable product. They don't even know their own code base that well: when they first attempted ephemeral runners with `--once`, it turned out the thing they produced could never work, because the server-side scheduler loves pipelining jobs to machines and fails miserably when these disappear (the job times out after 20 minutes of waiting).

[1]: https://github.com/actions/runner

  • silasb 4 years ago

    Has your team considered looking into buildkite? We love the flexibility it gives us. Being able to dynamically build pipelines is a very nice feature that not many others have (at least that I could tell when I was researching).

    • sascha_sl 4 years ago

      We would like to run other things (probably concourse or argo), but this decision was made way further up to justify picking GitHub as provider. There might also be a Microsoft volume discount involved.

      If we hard reject actions, we’ll probably end up with the prior status quo: Jenkins.

  • mrclean8586 4 years ago

    Product manager on the GitHub Actions team reporting in, we're sorry to hear about this issue with the rollout of ephemeral runners. Our engineering team is aware of this issue and is heavily prioritizing the investigation and fix.

    We'd love to look into your specific case if you want to shoot me an email: thejoebourneidentity@github.com

    • sascha_sl 4 years ago

      One, our githubcustomers/ contact is already forwarding anything / setting up a call with the team as needed, and two, that is not a twitter profile I'd ever send a DM to in a professional context, considering you retweet a lot of people diametrically opposed to my existence.

      • mrclean8586 4 years ago

        I will check in with our githubcustomers group to help accelerate. I can also directly inform our engineering team if you're open to sending me your information at

        thejoebourneidentity@github.com

        Either way, we're looking into this issue and I'll post an update here once we've learned more.

        • p8952 4 years ago

          Asking end users of your product to report issues via DM'ing your personal Twitter account, an account which is full of retweets of homophobic garbage, is really REALLY bad.

          Quietly editing your comment after being called out to hide it is even worse.

          • encryptluks2 4 years ago

            Agreed. Thanks for pointing this out. I'm genuinely curious now what their Twitter profile is, but my guess is they'll delete the tweets or remove the Twitter account.

            Microsoft notoriously hires a lot of people from the Federal Sector who unfortunately appear to be mostly right-wing religious zealots.

    • newman314 4 years ago

      One issue that I've been dealing with over the last 48hrs is that pushing Docker images to GHCR has been randomly failing with 403 errors.

      AFAIK, there has been no communication/acknowledgement of this as an issue. It makes it hard to decide to pick GHCR as a registry of choice.

      • thomasmcfarlane 4 years ago

        We recently faced this; if you are using the docker/login action I'd give that a check as it turned out it was logging us out by default at the end of each job; resulting in some race conditions when running multiple runners on the same machine (sharing the same docker daemon).

        Simple fix was to add `logout: false` to the action options.

e_proxus 4 years ago

I really wish the runner agent was written in something more portable than .NET. That choice feels like something purely political because they’re owned by Microsoft. I doubt an independent organization would have chosen it over other excellent choices such as Go, Rust etc.

Currently hosting the runner on e.g. FreeBSD or custom embedded systems is not supported (or even possible).

  • sascha_sl 4 years ago

    It's not because they're owned by Microsoft, at least not in the way you think.

    It's because GitHub Actions is rebranded Azure Pipelines.

    That a team at GitHub has been given a pile of Microsoft authored code is honestly much more concerning. They don't seem to understand it in its entirety either.

  • jbergstroem 4 years ago

    There is an ongoing project built on top of act (the "local" github runner) that accomplishes this: https://github.com/ChristopherHX/github-act-runner

  • IshKebab 4 years ago

    It runs on Windows, Linux and Mac surely? How much more portable do you need?

    • jclulow 4 years ago

      There are other operating systems and CPU architectures. It's a boon for open source projects to be able to have CI on all the BSDs, and illumos, and Plan 9, and even weirder things.

rubicks 4 years ago

Too little and too late. Meanwhile, I'm over here with gitlab self-hosted runners that "dispatch" ephemeral runners. I can tweak scaling limits and the whole contraption runs seamlessly on the AWS ec2 instances of my choosing.

My company just completed the migration from github to gitlab and, while it's not perfect, there's a lot to like on gitlab.

  • danpalmer 4 years ago

    I think it's all just a matter of team preferences and for many this is less "too little too late" and more "yet another great release" when compared to the other tools they're using.

    I personally find Actions to be a far better product than GitLab CI and we're moving all our CI from a mix of Circle/Jenkins to Actions.

    • 147 4 years ago

      What do you like about Actions more than GitLab CI? Github Actions just feels much less mature and I keep running into issues.

      • danpalmer 4 years ago

        It's funny because that's pretty much my experience with GitLab CI. Actions is certainly a younger product, I do feel that, but in terms of the design, how the pieces fit together, and what it feels like to develop for it, it all feels much more mature than GitLab CI to me. GitLab always felt like a Travis clone that was hacked to look more like CircleCI.

19h 4 years ago

The absolutely most annoying issue with GitHub Runners is the fact that they run 1 job .. at a time ... per server.

You can only imagine our follow-up meetings about the fact that we had a fleet of 15 c5a.2xlarge instances and still half of the developers were waiting up to 20 minutes for an instance to go online.

The worst part? The jobs don't clean up -- probably to allow for caching. We ran into disk space issues regularly enough for it to force us to make the spot instances commit harakiri after 2 days.
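For anyone wanting to automate that kind of recycling, here is a minimal Python sketch of the decision only. The 10% free-space threshold and the function names are assumptions made up for illustration; the 2-day limit mirrors the number above. Actually terminating the instance is left out, since that depends on your cloud tooling.

```python
import shutil
import time

TWO_DAYS = 2 * 24 * 3600  # matches the 2-day limit mentioned above


def should_retire(free_bytes, total_bytes, started_at, now,
                  min_free_fraction=0.10, max_age_seconds=TWO_DAYS):
    """Return True if a runner host should be recycled.

    min_free_fraction is an assumed threshold, not something from the runner itself.
    """
    too_full = total_bytes > 0 and free_bytes / total_bytes < min_free_fraction
    too_old = now - started_at >= max_age_seconds
    return too_full or too_old


def check_host(path, started_at):
    """Check the disk that holds `path` (e.g. the runner work directory)."""
    usage = shutil.disk_usage(path)
    return should_retire(usage.free, usage.total, started_at, time.time())
```

Something like `check_host("/", boot_time)` could run from cron between jobs, with the host shutting itself down when it returns True.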

GitHub Actions are a cool concept and we'll probably stick with them. But their quality is just bad. There's that .NET runner and it feels like it's so massively different from anything GitHub-like you could imagine .. almost as if it's a whitelabel program they licensed or like it's the result of 4 weeks of contract work. Simply bad.

  • nicoburns 4 years ago

    Can't you run more than one runner per server? My understanding was you start one up, give it a directory to work in, and it'll register itself with the central server and start processing jobs. I thought you could just run more instances if you wanted parallelism.

  • judge2020 4 years ago

    > The jobs don't clean up -- probably to allow for caching.

    Ya, this helps with a specific build cache scenario I use. A workaround if you want it to clean up is to put `rm -rf "${{ github.workspace }}"` at the end of your workflow.

wcdolphin 4 years ago

If anyone has experience using self hosted GH Actions at scale, I’d love to buy you a virtual coffee and hear about pros/cons for a parallelized CI flow currently running in Circle. Main motivation for switching would be simplification of tooling and increasing performance with better cache reuse and running within AWS for faster network access to ECR.

  • 147 4 years ago

    Reach out to me and I’ll be happy to talk to you about my experiences with GitHub actions.

  • molszanski 4 years ago

    At https://packhelp.com we use Github Actions for ~50 devs. I can share my experience, reach me.

    Also, we have some useful Ansible scripts that I might share if there is interest

  • twistedpair 4 years ago

    We moved all our builds (e.g. 100+) to GH Actions. We've been using GH Actions since it was a daily tar ball drop in a private GH Slack channel in Q3 2019.

    Happy to answer any questions.

    The biggest challenge has been the many GH Actions service outages/impacts. We're working on moving to self hosted runners to mitigate this.

koalalorenzo 4 years ago

I wish there was an official Helm Chart for k8s, like GitLab CI/CD Runner has, and not the kind that sits there and does not scale, but the kind that spins up workers on demand without taking too many resources while idle.

I wish GitHub copied that feature from GitLab too!

  • growse 4 years ago

    It's not official, but there are K8s / github actions runner deployments: https://github.com/actions-runner-controller/actions-runner-...

    I've been playing about with this and it seems to work quite well. Startup latency is quite high, and it's one pod-per-job (I think), but seems pretty flexible.

    • twistedpair 4 years ago

      I've been eyeing this for a while. My biggest hangup is that CI/CD is a major attack (e.g. supply chain) vector. If you use CI/CD for deploys, then a lot of highly privileged creds are in play.

      I'd really prefer if GH made and managed the K8s operator (e.g. the most popular infra provisioning tool) themselves.

thinkafterbef 4 years ago

The feature pull request has been there for over a year[1], it’s nice that it’s released!

Incoming shameless plug: if you don’t want to handle hosting runners, but still want to reap the benefits of having proper hardware (close to the metal), check out BuildJet for GitHub Actions[2] - 2x the speed for half the price. Easy to install and easy to revert.

[1] https://github.com/actions/runner/pull/660

[2] https://buildjet.com/for-github-actions

  • hardwaresofton 4 years ago

    And a more shameless second plug, I run SurplusCI which does the same thing for GitHub and GitLab with a few other platforms on the horizon.

    I can say we're less than half the price, because we focus solely on dedicated hardware and dedicated compute. We're working on pay-for-what-you-use as we speak, and this issue finally getting resolved has generated work for me this weekend.

    [0]: https://surplusci.com

  • growse 4 years ago

    Buildjet seems to be KVM-based, so does the job still run in a VM?

    Does it support nested KVM, e.g. for running Android espresso / emulator tests?

    • thinkafterbef 4 years ago

      Yes, the job runs in a KVM VM. Nested KVM is supported on the hypervisor, but KVM is not enabled by default in the guest OS, because we run a guest kernel for faster boot times. We will offer an option to enable the kvm kernel module in the future.

  • veidr 4 years ago

    Wow that looks like exactly what I need. We recently moved to GHA and while it is nice in many ways, my main complaint is that unlike our previous (AWS CodeBuild/CodePipeline) setup, we can't just pay more to get more powerful instances to run CI.

    Looking into setting up self-hosted runners has been on my todo list since the first day of using GHA; will definitely check out your service soon.

noptd 4 years ago

Ephemeral runner support has been highly anticipated for our organization - I'm excited to see it go live!

However, GitHub Enterprise admins may want to take caution - some users have reported that the changes are not currently compatible https://github.com/actions/runner/pull/660

  • smcleod 4 years ago

    Github Enterprise is a license / plan on github.com - I suspect you're talking about people running the Github Enterprise self-hosted VM?

  • albertom94 4 years ago

    FWIW my team jumped the gun and encountered that issue on our GHE instance as well (v3.1).

vyrotek 4 years ago

We're pretty happy with Azure DevOps on our team.

But, these competing offerings between Azure and GitHub have been really confusing to follow. Especially since folks are pointing out that GitHub Actions is partly Azure DevOps under the hood. It just seems like a complicated branding play because some people will refuse to use an Azure service but will gladly use a GitHub service still owned by Microsoft?

  • WorldMaker 4 years ago

    Azure DevOps Pipelines is "stable"/"mature" and not seeing anywhere near as much active investment: most of the team supposedly moved directly over to Github Actions and that seems to be where all the new investment work is going.

    Azure Codespaces was rebranded at the 11th hour before launch to Github Codespaces and moved almost entirely to the Github org and Azure DevOps was never given access unlike original announced plans under the Azure brand.

    Rumors have been swirling for a while now (including when bharry, the VP whose kingdom was Azure DevOps, retired three years ago) that Azure DevOps is on the slow decline to some sort of chopping block and Microsoft will replace it entirely with Github eventually. There are rumors that even "deeply private" teams you wouldn't expect to move from Azure DevOps to Github internally at Microsoft have already migrated. (Certainly a lot of well known Windows Developers have much more active "Activity Indicators" on Github these days and it isn't necessarily entirely accountable by all the known public repos like Calculator, Terminal, etc and public facing samples projects nor that all of their documentation repos have obviously moved to Github.)

    It would be wonderful to get an actual definitive and official statement from Microsoft, even if "eventually" when a migration will happen is still "years away" (which is presumably why they are afraid to give a statement yet, if it's still too far down the roadmap). That would make it easier today for some of us to start making cases to our teams that migrating voluntarily today to Github would be good for us. (Make the debate more than just "I want Codespaces" or "I want Github's dependency scanners" but also "Microsoft suggests it".)

  • slaughtr 4 years ago

    You don't need to create an Azure account to use Github Actions. It's not really refusing to use the service as much as using the streamlined one right in front of you.

  • twistedpair 4 years ago

    What about the OSX runners? Those run in MacStadium, not Azure.

xvilka 4 years ago

The biggest problem with GitHub Actions is that you can't restart just one job[1]; it always restarts all jobs in the workflow. And this bug has not been fixed for quite a while. Travis CI and Appveyor both allow that, of course.

[1] https://github.com/actions/runner/issues/432

  • twistedpair 4 years ago

    And, if you restart a job, no new entry is made for the job history, so it just overwrites the job history on rerun.

    This is likely because they modeled the jobs as one-per-job-def-and-commit, so they don't have a UX to show two.

    This is a security blind spot, since you can do something naughty in a job, then rerun it and the logs are no longer accessible.

hardwaresofton 4 years ago

It's a bit of an old drum to beat on but just want to note that GitLab has supported this (and provides docs for running on EC2, Fargate, k8s and other platforms like LXD[0][1][2][3]) for a very long time, and the CI system there is quite robust.

I've seen my fair share of CI systems (AppVeyor, CircleCI, GitLab, GitHub, TC, Jenkins, etc) and I'd argue that the GitLab CI is the best of all the ones I've seen:

- great syntax (it's YAML like most others but somewhat easy to organize well with great documentation)

- Fantastic documentation

- Unparalleled flexibility

- Unsurprising operation (things generally work as you'd expect)

- The ability to clear your build runner cache (Just ran into the inability to do this with CircleCI again today)

That said, competition is a good thing, so in general I'm glad to see this finally supported by GHA and will dig into it over the weekend. GHA is making a lot of really good sustainable moves in the space and keeping the field open (their marketplace is the best) so I'm all for it.

I run SurplusCI[4] which does what you'd think (runs these runners in VMs), so getting these on-demand runners working is a bit top-of-mind; right now I only offer dedicated runners, which are cheaper but of course aren't as cheap as on-demand (depending on usage).

Speaking of competition, just learned of a competitor here on HN in BuildJet[5], so if you don't want to manage your own runners check them out as well, unlike SurplusCI they actually offer to-the-minute on-demand runners, and the onboarding process looks way easier.

[EDIT] - Just to say, the list above is absolutely NOT the full list of platforms GitLab Runner supports -- it's pretty insane how many directions the community and GL have gone in. The Docker Machine integration (they maintain a fork) actually means you could run your single-use-machines on Scaleway or Hetzner easily as well, no need to muss or fuss with ASGs or k8s.

[0]: https://docs.gitlab.com/runner/configuration/runner_autoscal...

[1]: https://docs.gitlab.com/runner/configuration/runner_autoscal...

[2]: https://docs.gitlab.com/runner/executors/kubernetes.html

[3]: https://docs.gitlab.com/runner/executors/custom_examples/lxd...

[4]: https://surplusci.com

[5]: https://buildjet.com/for-github-actions

nicois 4 years ago

So this is a big step forward in terms of avoiding the race condition where CI runners would accept new jobs during scale-in operations. But how do you ensure you only spawn new ephemeral runners as jobs become available? The webhook provides part of the answer, but do we need to use something like redis to ensure exactly one runner per queued job is started?
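To make the question concrete, here is a minimal Python sketch of the deduplication step. It assumes the `workflow_job` webhook payload carries an `action` field (`queued`, `in_progress`, `completed`) and a `workflow_job` object with `id` and `labels`. The `RunnerScaler` class and the `launch` callable are hypothetical names, and the in-memory set only stands in for a shared store such as redis when several webhook receivers run in parallel.

```python
import threading


class RunnerScaler:
    """Starts at most one runner per queued job id, even if a delivery is repeated."""

    def __init__(self, launch):
        self._launch = launch        # hypothetical callable that boots one ephemeral runner
        self._seen = set()           # job ids that already triggered a launch
        self._lock = threading.Lock()

    def handle(self, payload):
        """Process one webhook payload; return True if a runner was launched."""
        job = payload.get("workflow_job", {})
        job_id = job.get("id")
        if job_id is None or payload.get("action") != "queued":
            return False             # only "queued" events trigger a launch
        with self._lock:
            if job_id in self._seen:
                return False         # duplicate or redelivered event
            self._seen.add(job_id)
        self._launch(job_id, job.get("labels", []))
        return True
```

This only keeps the number of launches equal to the number of queued jobs. A launched runner picks up whichever matching job the scheduler hands it, not necessarily the job whose event triggered it, and the set grows without bound unless entries are expired.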

  • sascha_sl 4 years ago

    It'd be nice if we had an API for all jobs that still need a runner, but I don't think that'll happen.

anonymousDan 4 years ago

Can someone tell me if GHA also supports non-ephemeral self hosted runners, and if so whether they work reliably? Any good resources for getting up and running with it quickly?

NiekvdMaas 4 years ago

This is great news. The only part missing is official docker support for the runner (I'm using an unofficial solution right now) and/or Alpine support.

mcintyre1994 4 years ago

The autoscaling piece is cool! One of the things that impressed me most about Gitlab CI was how easily we could get runners autoscaling in our own AWS environment. We'd run tiny instances as the actual runner, and they'd spin up bulky instances for different jobs with none of those running when nobody was working. It sounds like this might give a building block to build that in Github Actions.

elamje 4 years ago

I wonder if/when GitHub is going to start offering a Heroku-like service or full IaaS. It seems like an incredible opportunity to slap GitHub's branding on top of a subset of Azure's infrastructure and try to beat Heroku or AWS.

smcleod 4 years ago

The (previous) lack of ephemeral runners was one of my few gripes with GitHub Actions, great to see it's been released!

  • hirako2000 4 years ago

    My main gripe with Github is that it's pushing more tie-in features to own the development experience even further. Github started as a community development hub, and is now trying to swallow us all, owning each bit of the development process to then own the market.

    I don't have any affection for aws or gcp either, their attempt to dominate as de facto infrastructure and software provider is scary.

    We don't need github actions. Spinning up machines that run cook books that can do anything, even at scale is ultimately more flexible and platform agnostic. If that's time consuming to make it work at scale, providers dedicated to that are out there.

    • sascha_sl 4 years ago

      It's an option, nothing more.

      My main gripe with it is that it has the same effect as MS Teams: Execs see that a new product enters the market, with a vendor they already have agreements with, and it's either bundled in for free or relatively cheap. Being the right solution for the job has already lost at that point.

      • spockz 4 years ago

        This phenomenon is also referred to as “best of suite” over “best of breed”.
