The System Monitor

In the previous articles we explored the scheduler — how goroutines get multiplexed onto OS threads through the GMP model — and the garbage collector — how Go tracks and reclaims memory using a concurrent, tri-color mark-and-sweep approach. Both of these systems are impressive, but they have real blind spots. The scheduler can’t reclaim a P that’s stuck in a syscall, because no Go code is running on that thread — there’s nobody to notice and hand the P off. The GC won’t fire if no allocations are happening, because it’s allocation pressure that triggers a collection cycle. These aren’t bugs, they’re inherent limits of systems that only run when Go code runs.

That’s where the system monitor comes in, affectionately known as sysmon. It’s a plain OS thread running outside the scheduler — no P, no goroutine, just a loop that polls periodically. It retakes Ps stuck in syscalls, preempts goroutines that have been running too long, nudges network I/O forward when all Ps are busy, and forces garbage collection when the program goes idle. Think of it as the runtime’s watchdog: the thread that keeps everything running smoothly precisely because it’s not part of the game — and that’s what we’re exploring today.

What Does sysmon Do?

At its core, sysmon is an infinite loop — it wakes up, checks on things, fixes what needs fixing, and goes back to sleep. So what does it check on each iteration? Quite a lot, actually. It retakes Ps that have been stuck in system calls for too long, preempts goroutines that have been hogging a CPU, polls the network when all Ps are too busy to do it themselves, triggers garbage collection if nobody has done it in a while, wakes the scavenger to return unused memory to the OS, and even updates GOMAXPROCS dynamically. And it doesn’t run all these checks at a fixed rate — it uses adaptive sleep, checking frequently when the system is active but backing off when idle, and can even enter deep sleep for up to a minute when there’s truly nothing going on. We’ll explore all of this throughout the article.

Now that we have a high-level picture of what sysmon does, let’s start with the most fundamental question: how can sysmon monitor the scheduler if it’s part of the scheduler?

Running Without a P: The Special Thread

As we saw in the scheduler article, running Go code normally requires three things: an M (an OS thread), a P (a scheduling context — there are GOMAXPROCS of these), and a G (a goroutine). But sysmon breaks this rule. It’s an M without a P.

You can see this in the runtime initialization in src/runtime/proc.go:175, where sysmon is created with newm(sysmon, nil, -1). That nil second argument is the P parameter — passing nil means this thread gets no P at all. (The -1 just means “auto-assign an M ID” — see allocm.)

Why does this matter? Think about what would happen if sysmon needed a P. It would be competing with your goroutines for scheduling time. If all Ps were busy running long computations, sysmon wouldn’t get to run — and it wouldn’t be able to preempt those computations. That’s a circular dependency. By running without a P, sysmon is completely independent. It can monitor all Ps objectively, step in to retake ones that are stuck, and preempt goroutines that are running too long. It’s the referee that’s not playing the game.

This independence comes with a real constraint, though. Without a P, sysmon has no mcache, which means it can’t allocate from the Go heap at all. It also can’t use write barriers, so it can’t participate in GC coordination. Everything it does must happen on the system stack. But that’s a perfectly fine trade-off — sysmon’s job is monitoring, not computation.

So sysmon runs independently, without a P. But how does it actually know what’s going on, and how often does it check? Let’s start with that second question — because the answer is: it depends.

The Adaptive Sleep Strategy

Here’s where sysmon gets really clever. It needs to check things frequently enough to be responsive, but not so frequently that it wastes CPU. The solution is adaptive sleep — an exponential backoff that adjusts based on how much work sysmon is finding (src/runtime/proc.go:6498-6506).

The idea is simple: sysmon keeps an idle counter that tracks how many consecutive iterations found nothing to do. When the system is active (idle = 0), sysmon sleeps for just 20 microseconds between checks — that’s 50,000 checks per second, but each check is very cheap. It stays at this rate for the first 50 idle iterations (about 1ms), giving the system a chance to settle before backing off.

After that, it starts doubling the sleep time on each iteration — 40μs, 80μs, 160μs, and so on — until it caps at 10 milliseconds. At that point sysmon is only checking 100 times per second, with negligible CPU overhead.

if idle == 0 {
    delay = 20  // Start at 20 microseconds
} else if idle > 50 {
    delay *= 2  // Double after 1ms of idle
}
if delay > 10*1000 {
    delay = 10 * 1000  // Cap at 10ms
}

The moment sysmon finds work again — say it retakes a P or preempts a goroutine — the idle counter resets to zero and the delay drops right back to 20μs. The key insight here is that if we just intervened, we’ll likely need to do it again soon, so it pays to stay alert.

There’s also a special case when the system is completely idle — sysmon can go beyond the 10ms cap and enter deep sleep for up to a minute. We’ll get to that later, but first let’s look at what sysmon actually does when it’s awake — its core responsibilities.

Core Responsibility 1: Retaking Ps and Preempting Goroutines

This is sysmon’s most important job. The retake() function walks all Ps looking for two conditions: goroutines running too long, and Ps stuck in system calls.

But how does sysmon know if a P is making progress or stuck? Each P has a sysmontick structure that acts like a heartbeat:

type sysmontick struct {
    schedtick   uint32  // Incremented on every schedule()
    syscalltick uint32  // Incremented on every syscall entry
    schedwhen   int64   // When schedtick last changed
    syscallwhen int64   // When syscalltick last changed
}

The idea is straightforward. Every time a P goes through the full schedule() loop to pick a new goroutine, it bumps schedtick (goroutines handed off via runnext share the same tick, since they’re part of the same scheduling slice). Every time a goroutine enters a system call, it bumps syscalltick. These two counters serve different purposes — schedtick is used to detect goroutines that have been running too long, while syscalltick is used to detect Ps stuck in system calls. sysmon periodically snapshots both and compares them on the next pass.

Imagine P0 is running goroutine A, then switches to B, then to C. Each switch increments schedtick: 5, 6, 7. Meanwhile, goroutine B made a file read during its turn, so syscalltick went from 3 to 4. sysmon checks in, saves both values. Next check, schedtick is 9 and syscalltick is 5 — P0 is making progress, all good. But now goroutine D starts a tight computation loop — no function calls, no system calls, just pure number crunching. sysmon checks in, sees schedtick = 9, saves it. 10ms later, schedtick is still 9 — P0 hasn’t switched goroutines. That’s the signal that a goroutine is hogging the CPU. So what does sysmon do about it?

Preempting Long-Running Goroutines

So going back to our example: sysmon checks in on P0, compares the current schedtick against its saved snapshot, and sees it hasn’t changed. It then checks when it last changed — if that was more than 10 milliseconds ago, it’s time to preempt. You can see this logic in the retake function, specifically at line 6664.

Why 10ms? It’s a balance. Go below 1ms and you’d waste too much time on context switching. Go above 100ms and some goroutines would starve while others hog the CPU. 10ms is a nice middle ground — it roughly matches a typical OS scheduler quantum.

Once sysmon decides to preempt, it doesn’t just yank the goroutine off the CPU. Instead, it uses two strategies at once through the preemptone function. The first is cooperative preemption: sysmon sets a flag (stackguard0 = stackPreempt) on the goroutine. The next time that goroutine makes a function call, the stack-growth check in the function prologue notices the flag and the goroutine voluntarily yields. This works great for most code — but what about a tight for loop that never calls any functions?

That’s where the second strategy comes in: asynchronous preemption. sysmon sends an actual OS signal (SIGURG on Unix) to the thread running that goroutine. The signal handler interrupts execution and forces a scheduling point, even in the middle of a compute-only loop. Between the two strategies, preemption works for all code patterns.

Preemption handles goroutines that are running too long. But there’s another way a P can get stuck — what if the goroutine isn’t running at all, but waiting on a system call?

Retaking Ps from System Calls

Going back to our example, remember that goroutine B did a file read? When a goroutine enters a system call, the M (thread) blocks in the kernel, but the P is still associated with it — effectively held hostage. If that syscall takes a while, that P can’t run any of the other goroutines in its queue.

The detection mechanism here is different from preemption. sysmon first tries to call setBlockOnExitSyscall(), which atomically checks whether the P is actually in a syscall right now — this also handles the race condition where the goroutine might be exiting the syscall at that exact moment. If the P is indeed in a syscall, sysmon then looks at syscalltick: if it hasn’t changed since the last check, the P is still in the same syscall (rather than having completed one and entered a new one). Combined with a 10 millisecond timeout, that’s the signal to consider retaking.

How urgently sysmon retakes depends on the situation. If there are goroutines waiting in that P’s run queue, sysmon retakes right away — those goroutines need a P to run on. Same thing if the system is fully loaded with no idle Ps or spinning Ms — every P counts when there’s no spare capacity. It’s only when the run queue is empty and there’s spare capacity that sysmon gives the syscall a 10ms grace period before stepping in. Even then, it will eventually retake — a P stuck in a syscall prevents sysmon from entering deep sleep, so it can’t just leave it there forever.

The retake itself is handled by handoffp(), which takes the P away from the blocked M and hands it off to another thread that can actually use it. The goroutine that’s still in the syscall keeps running on its M — it just no longer has a P. When it eventually returns from the syscall, it’ll need to acquire a P again before it can continue running Go code.

Retaking Ps and preempting goroutines keeps the scheduler fair and responsive. But there’s another resource that can get neglected when the system is busy: the network.

Core Responsibility 2: Network Polling

Go’s runtime includes a network poller — a component that uses OS-level mechanisms like epoll (Linux) or kqueue (macOS) to efficiently monitor network connections. We’ll explore how it works in detail in the next article, but for now what matters is that it needs to be checked periodically. Normally, the network poller gets checked as part of the regular scheduling cycle — when a P is looking for work, it’ll poll for ready network connections along the way. But what if all Ps are busy running CPU-intensive goroutines? Nobody is looking for work, so nobody checks the network. Your HTTP server could have connections ready to accept and data waiting to be read, but no P would notice.

That’s where sysmon comes in. Every 10ms, it does a non-blocking poll of the network by calling netpoll(0). This returns immediately with a list of any goroutines that are now ready because their network operations completed — connections accepted, data received, writes finished, and so on. sysmon then injects those goroutines back into the scheduler’s run queues so they can be picked up by the next available P.

Without this network polling, a Go program under heavy CPU load could see its network I/O stall indefinitely — connections would sit there ready but unserved until a P happened to free up.

Network polling keeps I/O moving. But remember the garbage collector from the previous article ? It normally triggers based on heap growth — so what happens when a program barely allocates?

Core Responsibility 3: Forcing Periodic GC

As we saw in the garbage collector article, GC normally triggers based on heap growth — when the program has allocated enough new memory, the runtime kicks off a collection cycle. But what about a program that barely allocates? Think of a long-running server that loaded its configuration into memory at startup and then mostly just shuffles data through buffers. There’s no heap pressure, so the GC never fires on its own.

That’s a problem. Without periodic collection, garbage won’t be reclaimed (let alone returned to the OS), finalizers won’t run, and sync.Pool contents will stick around indefinitely. So sysmon enforces a simple rule: if 2 minutes have passed since the last GC cycle, it’s time to collect regardless of heap growth.

The mechanism is elegant. At startup, the runtime creates a dedicated goroutine called forcegchelper that immediately parks itself — it just sits there, doing nothing, waiting to be woken up. When sysmon notices that 2 minutes have elapsed, it injects this goroutine back into the scheduler’s run queue. The goroutine wakes up, triggers a time-based GC cycle by calling gcStart, and once that’s done, parks itself again — ready for the next time sysmon needs it.

Forcing GC ensures garbage gets collected. But collected memory isn’t necessarily returned to the OS — it might just sit there, unused but still held by the process. That’s where the scavenger comes in.

Core Responsibility 4: Waking the Scavenger

When the garbage collector frees memory, that memory doesn’t automatically go back to the operating system. The Go runtime holds onto it in case it needs it again soon — which is usually a good optimization, since reallocating from the OS is expensive. But if your program just finished a big spike of work and its heap shrank significantly, you probably don’t want it sitting on a pile of unused memory that other processes could use.

That’s the scavenger’s job. It’s a background goroutine that takes unused heap pages and tells the OS it can have them back — on Linux this is done via madvise(MADV_DONTNEED), which says “I still have this memory mapped, but I don’t need the contents, so feel free to reclaim the physical pages.” The virtual address space stays reserved (so Go can reuse it cheaply later), but the actual RAM is freed for other processes. You can see the effect by watching your program’s RSS (Resident Set Size) drop after a traffic spike.

sysmon’s role here is simple: the memory allocator sets a flag when it thinks the scavenger should run, and sysmon checks that flag on each iteration and wakes the scavenger goroutine if needed.

The responsibilities we’ve covered so far are sysmon’s heavy hitters. But it also handles a couple of lighter-weight tasks that are worth mentioning.

Core Responsibilities 5 & 6: GOMAXPROCS and Schedule Tracing

Two lighter-weight responsibilities:

The first is dynamic GOMAXPROCS. At most once per second, sysmon checks whether GOMAXPROCS should change — for example, if the container’s CPU quota was adjusted while the program was running. When it detects a change, it calls sysmonUpdateGOMAXPROCS() to add or remove Ps on the fly. This is especially useful in cloud environments where CPU limits can shift without restarting the process. It’s worth noting that this behavior is gated by the updatemaxprocs GODEBUG setting: it’s on by default as of Go 1.25, but modules declaring an older Go version get the old behavior and must opt in with GODEBUG=updatemaxprocs=1.

The second is schedule tracing, a debugging tool you can enable with GODEBUG=schedtrace=1000. When active, sysmon periodically prints a snapshot of the scheduler’s state to stderr — something like:

SCHED 0ms: gomaxprocs=4 idleprocs=0 threads=5 spinningthreads=1
           idlethreads=0 runqueue=0 [0 0 0 0]

This tells you at a glance how many Ps are idle, how many threads are spinning looking for work, and how many goroutines are sitting in each P’s run queue. It’s invaluable when you’re debugging scheduling issues or trying to understand why your program isn’t using all its cores.

Those are sysmon’s core responsibilities. Now let’s come back to something we teased earlier: what happens when there’s truly nothing going on?

Deep Sleep: Going Dormant

When the entire system is idle, adaptive sleep’s 10ms cap is still more work than necessary. So sysmon can go one step further and enter deep sleep — potentially for up to a minute.

The conditions for this are conservative: sysmon only enters deep sleep when either the GC has stopped the world (nothing is happening anyway) or every single P is idle (there’s no active work to monitor). In both cases, there’s genuinely nothing to watch over, so sysmon can afford to take a long nap. You can see the exact check in the sysmon function.

How long does it sleep? It picks the shorter of two durations: one minute (half the force GC period, so it doesn’t miss a forced GC cycle) or the time until the next pending timer fires. This way sysmon never oversleeps past something that actually needs attention.

To coordinate deep sleep with the rest of the runtime, the scheduler struct (schedt in src/runtime/runtime2.go) has a few fields dedicated to sysmon: a boolean sysmonwait that signals whether sysmon is sleeping, a sysmonnote that other parts of the runtime can use to wake it up, and a sysmonlock mutex to serialize sysmon tasks. When sysmon goes to sleep, it sets sysmonwait to true and blocks on the note. When the runtime needs sysmon back — say, after finishing a GC cycle or when a new thread is created — it checks sysmonwait, clears the flag, and wakes sysmon through the note.

But if sysmon is asleep, how does it know when to wake up? That’s handled by a wakeup mechanism spread across the runtime.

The Wakeup Mechanism

We mentioned that sysmon sets sysmonwait to true before going to sleep and blocks on sysmonnote. The other side of that coin is scattered across the runtime: several key transition points check whether sysmon is sleeping and wake it up if so.

For example, when the garbage collector finishes a cycle and starts the world again, Ps are about to become active — so the runtime wakes sysmon to monitor them. sysmon is also woken when a goroutine enters a system call, which is arguably the most important case — sysmon may need to retake the P if that syscall blocks for too long. And it’s woken when a goroutine exits a syscall too, whether it successfully grabbed an idle P (a P just became active) or it didn’t get one and was put on the global run queue (a goroutine needs attention).

The pattern in all these cases is the same: check if sysmonwait is true, clear the flag, and signal the note to wake sysmon up. You can find these wakeup points spread across src/runtime/proc.go — look for notewakeup(&sched.sysmonnote) at lines 1780 (startTheWorld), 4750 (entersyscall), 5050 (exitsyscallfast), and 5099 (exitsyscall0).

The key insight is that the runtime only wakes sysmon when transitioning from idle to active. If the system is already busy and sysmon is already running its adaptive sleep loop, there’s no need to signal it — it’s already checking frequently enough on its own.

We’ve covered a lot of individual pieces — adaptive sleep, deep sleep, preemption, syscall retaking, network polling, forced GC, scavenging, and wakeups. Let’s see how they all work together in a real scenario.

Putting It All Together: A Complete Example

Let’s trace what happens when your web server gets a burst of traffic after being idle, and see how sysmon ties everything together.

The server has been quiet for a while. All Ps are idle, and sysmon has entered deep sleep — sysmonwait is true, and it’s blocking on its note, using practically zero CPU.

Then a request arrives. The network poller picks up the new connection and the scheduler wakes a P to handle it. As that P transitions from idle to active, the runtime notices sysmon is asleep and wakes it through sysmonnote. sysmon snaps back to attention — its idle counter resets to zero and it starts checking every 20μs again.

The handler goroutine starts processing the request and makes a database query, which is a system call. The M blocks in the kernel, and the P is stuck waiting with it. While this P is tied up, the other Ps have to pick up the slack — handling incoming requests and running any goroutines that need work done. About 10ms later, sysmon checks in, sees that syscalltick hasn’t changed — so it retakes the P and hands it off to another thread. Now that P is back in action and can share the load again, while the database query continues on its blocked thread.

At the same time, another handler on a different P is doing some heavy JSON processing — a tight computation loop that doesn’t make any function calls. It’s been running for 15ms on the same P. sysmon notices that schedtick hasn’t budged in over 10ms, so it preempts: first by setting the cooperative flag, then by sending a SIGURG to the thread for good measure. The goroutine yields, and the scheduler picks up the next one in line.

The traffic keeps flowing for a couple of minutes. During this time, the program is allocating and discarding request buffers, but the load isn’t heavy enough to trigger heap-based GC. No problem — sysmon notices that 2 minutes have passed since the last collection, so it wakes the forcegchelper goroutine. A GC cycle runs, frees up the accumulated garbage, and after it finishes the scavenger kicks in too — sysmon wakes it because the allocator flagged that the heap has shrunk. The scavenger tells the OS it can reclaim those now-unused physical pages, and the server’s RSS drops back down.

Eventually the traffic subsides. Requests are handled, Ps start going idle one by one. sysmon’s retake calls stop finding work, so the idle counter climbs and the sleep interval stretches — 20μs, then 40μs, 80μs, all the way up to 10ms. Once all Ps are idle again, sysmon enters deep sleep, sets sysmonwait to true, and goes dormant. CPU usage drops to nearly zero, and the cycle is complete.

And all along, if you had GODEBUG=schedtrace=1000 set, sysmon would have been printing scheduler snapshots every second — letting you watch the whole thing unfold in real time. Or if this server was running in a Kubernetes pod and someone scaled the CPU limit mid-traffic, sysmon would have picked that up too and adjusted GOMAXPROCS accordingly.

Throughout this entire sequence, sysmon ensured fairness by preempting the long-running JSON handler, kept things efficient by retaking the P stuck in a database syscall, made sure memory was collected and returned to the OS, and adapted its own overhead to match the system’s activity level — from 50,000 checks per second during the burst down to deep sleep when things calmed down.

Summary

The system monitor is a single independent thread that watches over the entire runtime — preempting long-running goroutines, retaking Ps stuck in syscalls, polling the network, forcing GC, waking the scavenger, and adjusting GOMAXPROCS. It does all of this with an adaptive sleep strategy that keeps it responsive when the system is busy and nearly invisible when it’s idle. The cost is just one OS thread and a few kilobytes of stack — a small price for keeping everything running fairly and efficiently.

If you want to see the implementation yourself, look at the sysmon function. In the next article, we’ll explore the network poller — the system that sysmon polls on behalf of busy Ps, and a key piece of how Go handles I/O so efficiently.