Introduction to Golang Preemption Mechanisms

unskilled.blog

109 points by lcof a year ago · 29 comments

__turbobrew__ a year ago

Are there any proposals to make the Golang runtime cgroup aware? Last time I checked, the Go runtime will spawn an OS thread for each CPU it can see, even if it is running in a cgroup which only allows 1 CPU of usage. On servers with 100+ cores I have seen scheduling overhead take up over 10% of the program's runtime.

The fix is to inspect cgroupfs to see how many CPU shares you can utilize and then set GOMAXPROCS to match. I think other runtimes like Java and .NET do this automatically.

It is the same thing with GOMEMLIMIT, I don’t see why the runtime does not inspect cgroupfs and set GOMEMLIMIT to 90% of the cgroup memory limit.
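
A minimal sketch of that idea, assuming cgroup v2 with the unified hierarchy mounted at /sys/fs/cgroup and ignoring nested limits (roughly what go.uber.org/automaxprocs does for you):

    package main

    import (
        "fmt"
        "os"
        "runtime"
        "strconv"
        "strings"
    )

    // cpusFromCgroupV2 derives a GOMAXPROCS value from the cgroup v2
    // cpu.max file, whose content is "<quota> <period>" in microseconds,
    // or "max <period>" when no limit is set.
    func cpusFromCgroupV2() (int, error) {
        data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
        if err != nil {
            return 0, err
        }
        fields := strings.Fields(string(data))
        if len(fields) != 2 || fields[0] == "max" {
            return runtime.NumCPU(), nil // no quota: use all visible CPUs
        }
        quota, err := strconv.Atoi(fields[0])
        if err != nil {
            return 0, err
        }
        period, err := strconv.Atoi(fields[1])
        if err != nil {
            return 0, err
        }
        procs := quota / period // floor "how much" into "how many"
        if procs < 1 {
            procs = 1
        }
        return procs, nil
    }

    func main() {
        if procs, err := cpusFromCgroupV2(); err == nil {
            runtime.GOMAXPROCS(procs)
        }
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }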

  • lcofOP a year ago

    On Linux, Go uses sched_getaffinity to find out how many CPU cores it is allowed to run on:

    https://cs.opensource.google/go/go/+/master:src/runtime/os_l...
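
    For illustration, the same query can be made from user space via golang.org/x/sys/unix (pid 0 means the calling thread); this mirrors, rather than reuses, the runtime's private wrapper:

        package main

        import (
            "fmt"

            "golang.org/x/sys/unix"
        )

        func main() {
            // Ask the kernel which CPUs this thread may run on; the Go
            // runtime performs the equivalent syscall at startup to pick
            // its default GOMAXPROCS on Linux.
            var set unix.CPUSet
            if err := unix.SchedGetaffinity(0, &set); err != nil {
                panic(err)
            }
            fmt.Println("runnable CPUs:", set.Count())
        }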

    • eadmund a year ago

      > > On servers with 100+ cores I have seen scheduling overhead take up over 10% of the program's runtime.

      > On Linux, Go uses sched_getaffinity …

      I wonder if his experience pre-dates Go's usage of sched_getaffinity.

      edit: since cgroups are a Linux-only feature and he references them, he must be running Linux.

    • Thaxll a year ago

      This is not cgroup aware.

      • lcofOP a year ago

        If you want to limit the number of Ps, you use a cpuset, which sched_getaffinity will take into account. cgroups otherwise only let you limit CPU usage, not lower the number of CPU cores the code can run on. This is “how many” versus “how much”, and GOMAXPROCS only relates to the “how many” part.

        I may have misunderstood the rationale here, but I think the discussion about cgroup support is not about limiting the number of Ps.

        • kbolino a year ago

          What people want is that, if cgroup limits prevent a container from using more than M/N of the machine's CPU time (with N the number of cores), then GOMAXPROCS should default to M. Ditto for other managed language runtimes and their equivalent parameters.

          However, as far as I can tell, there's no clear way to figure out what M is, in the general case.

          • lcofOP a year ago

            Again, I might be wrong as I have not used this directly in a couple of years, but saying “the limit is a 50% share of 10 cores” is not equivalent to “the limit is 5 cores”. This is still “how much” versus “how many”, and one cannot be translated into the other without sacrificing flexibility.

            • kbolino a year ago

              GOMAXPROCS caps the number of OS threads simultaneously executing Go code. The distinction between 50% of time on 10 cores and 100% of time on 5 cores doesn't really matter here: the recommendation is to set GOMAXPROCS=5 in both cases.

        • trws a year ago

          I think your comment was once completely correct, but there is now also a “cpuset” cgroup controller in addition to the classic cpu controller. The cpuset controller gives something equivalent to sched_setaffinity but stronger, since the client processes can't unset parts of the mask or override it, IIRC.

  • jrockway a year ago

    I am guessing the API isn't stable enough to let the runtime set GOMAXPROCS itself. I use https://pkg.go.dev/go.uber.org/automaxprocs and have had to update it periodically because Red Hat and Debian have different defaults. (Should one even run k8s on Red Hat? I say no, but Red Hat says yes. That's how I know about this.)

    This is, I think, a cgroups v1 vs. cgroups v2 issue, and everyone should have cgroups v2 by now, but ... it would feel weird for the Go runtime to pick one. To me, anyway.
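
    For anyone who hasn't used it: the library's entire interface is a blank import, whose init side effect sets GOMAXPROCS from the detected cgroup CPU quota at startup:

        package main

        import (
            "fmt"
            "runtime"

            // Imported for its side effect only: its init adjusts
            // GOMAXPROCS to the container's CPU quota (cgroups v1 and v2).
            _ "go.uber.org/automaxprocs"
        )

        func main() {
            fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
        }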

    • __turbobrew__ a year ago

      Which API is not stable? Cgroupfs?

      I would think that cgroupfs is considered an API to userspace and therefore shouldn't break in the future? Hence the creation of cgroups v2?

      I have written code which handles both cgroups v1 and cgroups v2; it isn't terribly hard. Golang could also only support setting these parameters automatically when running under cgroups v2, if that made things easier.

      For a language that prides itself on sane defaults, I think they have missed the mark here. I could probably add support to the Golang runtime in a few hundred lines of code, and probably save millions of dollars and megawatt-hours of energy, because the Go runtime would no longer spawn 50 threads to run a program which is constrained to 1 core.

      • kbolino a year ago

        The OpenJDK folks have quite a long and storied history of trying to get this right, and they still generally recommend that if you want the JVM to see the right number of CPUs, you should set the relevant parameter yourself (-XX:ActiveProcessorCount). This is basically the same advice as the Go folks telling you to set GOMAXPROCS yourself.

        The problem is not just cgroups v1 vs. cgroups v2 or the stability of cgroupfs, but also CPU "shares" vs. "limits", the different tunables for different Linux schedulers, the effective limits under hierarchical cgroups, and so on.

      • ljm a year ago

        I’m not 100% sold on the idea that Go’s defaults are sane.

        They’re highly opinionated and not really that intuitive.

  • EdSchouten a year ago

    If you’re on Kubernetes, you can work around this by enabling the static CPU manager policy:

    https://kubernetes.io/docs/tasks/administer-cluster/cpu-mana...

    • Zimnx a year ago

      No, the 'static' CPU manager policy provides the ability to allocate CPUs exclusively to a container's cgroup. But since the Go runtime doesn't read cgroup information anyway, it still sees all available CPUs.

      • EdSchouten a year ago

        The static CPU manager also affects the cores that sched_getaffinity returns. And that’s what Go uses to obtain the core count.

        • __turbobrew__ a year ago

          That is only true if the pod is running in the Guaranteed QoS class (requests == limits). For pods where requests != limits, a common set of CPUs is used for all burstable pods; otherwise, bursting past requests would not work.

          This still allows the worst case, where a node with 100 CPUs running burstable pods will still see huge overheads in the Golang scheduler.

          To my knowledge (I have done a lot of research into not only runc but also gVisor), there is currently no way to have the Go runtime and cgroups interact in a sane way by default.

          If the Golang runtime were cgroup aware, I do believe sane defaults would be possible, especially since the JVM and CLR have done so.

      • __turbobrew__ a year ago

        Correct

metadat a year ago

> think about it - what if I suddenly stopped you while taking a dump? It would have been easier to have stopped you before, or after, but not in the middle of it. Not sure about the analogy, but you got it.

Gold.

ollien a year ago

Great post! One question that lingered for me: what are asynchronous safe-points? The post goes into some detail about their synchronous counterparts.

  • MathMonkeyMan a year ago

    I don't know, but I remember hearing in a talk that the compiler had to be modified to insert them into the generated code.

    Here's `isAsyncSafePoint`: https://github.com/golang/go/blob/d36353499f673c89a267a489be...

    edit: The comments at the top of that file say:

        // 3. Asynchronous safe-points occur at any instruction in user code
        //    where the goroutine can be safely paused and a conservative
        //    stack and register scan can find stack roots. The runtime can
        //    stop a goroutine at an async safe-point using a signal.

gregors a year ago

If you'd like to see a really well-done deep dive by the new Golang tech lead (Austin Clements), check this out:

GopherCon 2020: Austin Clements - Pardon the Interruption: Loop Preemption in Go 1.14

https://www.youtube.com/watch?v=1I1WmeSjRSw

zbentley a year ago

Interesting that it’s temporal (according to the article, you have around 10 milliseconds before the signal-based preempter kicks in). How bad is performance if the load on the host is so high that double-preempting is common, I wonder? Or am I missing something, and is that question not meaningful?

  • lcofOP a year ago

    No, it’s an interesting question. This is not really about load, but about control flow: if a goroutine is just spinning without ever going through a function prologue, it won’t even be aware of the synchronous preemption request. Asynchronous (signal-based) preemption is mainly (I say “mainly” because I am not sure I can say “only”) for this kind of situation.

    I don’t have the link handy, but Twitch had this kind of issue with base64 decoding in some of their servers. The GC would try to stop the world, but there would always be one or a few goroutines decoding base64 in a tight loop whenever STW was attempted, delaying it again and again.

    Asynchronous preemption is a solution to this kind of issue. Load is not the problem here, as long as you go through the runtime often enough.
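
    A toy reproduction of that failure mode (a hypothetical demo, not the Twitch code): on Go 1.14+ the GC completes quickly thanks to signal-based preemption, while running the same program with GODEBUG=asyncpreemptoff=1 lets the tight loop stall the stop-the-world:

        package main

        import (
            "fmt"
            "runtime"
            "time"
        )

        func main() {
            runtime.GOMAXPROCS(2) // one P for main, one for the spinner

            // A tight loop with no function calls: no prologue ever runs,
            // so a synchronous preemption request parked in the stack
            // guard is never observed.
            go func() {
                for {
                }
            }()

            time.Sleep(10 * time.Millisecond) // let the spinner get going
            start := time.Now()
            runtime.GC() // stop-the-world must preempt the spinner somehow
            fmt.Println("stop-the-world completed in", time.Since(start))
        }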

  • jerf a year ago

    You'll see that this is actually a general concurrency pattern, and I mean far beyond Go. It is certainly ideal to trigger off of some signal (in a very, very generic sense of the term, not an OS signal) in order to trigger some other process (again, in a very generic sense). But in general, if you're writing concurrent code, you should always have some sort of time-based fallback for any wait you are doing, because you will hit it out in the field. It's all kinds of no fun to have processes that will wait forever. (Unless you're really sure you need that.)

hiyer a year ago

This is a well-written article, but one thing that wasn't clear to me was how the runtime determines that a goroutine is at a safe point. Can someone shed some light on that?

  • MathMonkeyMan a year ago

    The runtime never determines whether the goroutine is at a safe point. Instead, it "poisons" the stack guard, so that the next time the goroutine reaches a function prologue (which is a safe point), the goroutine examines the stack guard and discovers that it has been asked to yield.

    Then there's the async case for tight loops, which I remember reading about back in 2020 (it uses Unix signals), though I don't yet fully grok the specifics.
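
    A toy model of the poisoning trick (simplified, with invented types; in the real runtime the check is emitted by the compiler into function prologues and the guard lives in the runtime's g struct):

        package main

        import "fmt"

        // stackPreempt mirrors the runtime's sentinel: a value larger than
        // any real stack address, so every prologue check fails once it is
        // installed.
        const stackPreempt = ^uintptr(0)

        // g is a stand-in for the runtime's goroutine descriptor.
        type g struct {
            stackguard0 uintptr // compared against SP in function prologues
            stackLo     uintptr // the stack's real lower bound
        }

        // prologue models the stack-bound check the compiler emits at the
        // entry of every non-leaf function.
        func prologue(gp *g, sp uintptr) {
            if sp <= gp.stackguard0 { // stack overflow... or poisoned guard
                morestack(gp)
            }
        }

        // morestack models the runtime's slow path, which distinguishes
        // genuine stack growth from a preemption request.
        func morestack(gp *g) {
            if gp.stackguard0 == stackPreempt {
                fmt.Println("preemption request detected; yielding")
                gp.stackguard0 = gp.stackLo // restore the real guard
                return
            }
            fmt.Println("genuine stack growth")
        }

        func main() {
            gp := &g{stackguard0: 0x1000, stackLo: 0x1000}
            prologue(gp, 0x8000)          // normal call: fast path taken
            gp.stackguard0 = stackPreempt // scheduler poisons the guard
            prologue(gp, 0x8000)          // next prologue sees the request
        }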
