So far in this series we’ve looked at the parts of the Go runtime that orchestrate execution — the memory allocator, the scheduler, the garbage collector, sysmon, the netpoller. Today we’re switching gears and looking at three of the most ordinary things in Go: slices, maps, and channels. They are the bread and butter of every Go program. You probably write all three of them several times before lunch.
But “ordinary” is doing a lot of work in that sentence. None of these are language primitives in the way an int is — they all have real, non-trivial data structures behind them, allocated on the heap, managed by the runtime, and tuned aggressively for performance. The syntax (s[i], m[k], <-ch) hides all of that, and that’s the point. But once you peel the syntax back, what’s actually sitting in memory? That’s the question we’re going to answer.
We won’t talk about how to use slices, maps, and channels — there are a million tutorials for that. We’re going to look at the structures themselves, the way the runtime lays them out and operates on them.
We’ll go in increasing order of complexity, so let’s start with the simplest of the three.
When I started writing Go, one thing surprised me almost immediately. The idiomatic way to add an element to a slice is slice2 = append(slice1, element) — and on the surface it looks like you’re producing a new slice while leaving the old one untouched. My first reaction was: how on earth is this efficient? Surely allocating and copying a fresh array every time you append a single element is going to wreck performance and burn through memory? And yet Go programs do this constantly without any visible problem.
The answer is that there’s a smart design hiding underneath, and it’s not really specific to Go. The same approach shows up in Vec in Rust, list in Python, ArrayList in Java, std::vector in C++ — they all share the same idea. So even though we’re going to look at the Go-specific runtime details, the shape of the thing is something you’ve already encountered if you’ve used any of those.
The first piece of the trick is that a slice doesn’t actually store any of its data inline. What you pass around when you pass a slice is a tiny header — three words — that points to a separate backing array living somewhere on the heap (or, sometimes, the stack).
The header lives in the runtime as the slice struct, but conceptually it’s just three fields sitting next to each other in memory:
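```go
// Essentially what src/runtime/slice.go defines (comments added here):
type slice struct {
	array unsafe.Pointer // where the elements actually live
	len   int            // how many elements are in use
	cap   int            // how many elements fit before a new array is needed
}
```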

That’s the entire slice. On a 64-bit machine it’s exactly 24 bytes regardless of how many elements you have. A slice of one billion int64s and a slice of zero elements are the same size as a value — the difference is only in what array points to and how big the allocation behind it is.
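You can check this with unsafe.Sizeof, which measures the header rather than the data; on a 64-bit machine both lines print 24:
```go
fmt.Println(unsafe.Sizeof([]int64{}))              // 24: an empty slice
fmt.Println(unsafe.Sizeof(make([]int64, 1000000))) // 24: a million elements, same header
```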
To make that concrete, imagine you have a slice like this []int{5, 7, 4, 2, 9}. The header lives wherever your variable lives (often on the stack), and array points at a backing array sitting on the heap. The runtime may have given you a slot with a bit of extra room — say 8 elements’ worth — so the unused tail is zeroed:
![Slice example for []int{5, 7, 4, 2, 9} with header pointing to a backing array on the heap](https://internals-for-interns.com/images/slices-diagram-2.webp)
This three-word design is what makes slices so cheap to pass around. Passing a slice to a function copies the header, not the data. Two slice variables can point to the same backing array, and a sub-slice (s[2:5]) is just another header that shares the underlying memory of s — same array pointer (offset by 2 elements), len of 3, and cap reflecting how much room is left from offset 2 to the end of the original capacity.
So far so simple. The fun starts when you ask the slice to hold more than it currently can.
Growing a Slice
The interesting part of the slice machinery is what happens when append runs out of room. If len < cap, append just writes the new element into the existing backing array and bumps len. Cheap. But when len == cap, the runtime needs to allocate a bigger array, copy the old data over, and return a new header pointing to it. This is the path that lives in growslice in src/runtime/slice.go.
To see this in motion, let’s trace what happens as we grow a slice from empty, one element at a time. We start with var s []int. The header exists, but there’s no backing array at all — array is nil, len and cap are both 0, and the heap is untouched:

Now we call s = append(s, 5). cap is 0, so the runtime has to allocate. It picks a small backing array (one slot is enough), copies nothing (there was nothing to copy), writes 5, and returns a new header:

Next, s = append(s, 7). Now len == cap == 1, so the runtime has to grow. It allocates a new array with double the capacity (2), copies the existing 5 over into it, writes 7 after it, and returns a brand new slice header — fresh array pointer, len of 2, cap of 2. Because we assigned the result back to s, the old header (and its one-element array) is no longer referenced from anywhere, so it just sits in the heap waiting for the GC to collect it:

One more: s = append(s, 4). Same story — len == cap == 2, so the runtime allocates again, doubling the capacity to 4 (the rest stays zeroed), copies the existing two elements over, writes 4, and returns yet another new header with its own array pointer, len of 3, and cap of 4. Two orphaned blocks now linger in the heap:

At first glance this behavior looks wasteful — asking for more space for every single element. But because the array doubles every time it grows, it quickly needs to grow less and less often.
The exact rule that growslice uses to pick the new capacity is simple: while the slice is small (under 256 elements) it just doubles, and once it crosses that threshold it gradually transitions toward growing by 1.25×. Doubling forever would waste a lot of memory for big slices, and jumping straight to 1.25× would cause too many copies for small ones, so the runtime smoothly interpolates between the two as the slice grows.
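You can watch this rule in action with a few lines: append in a loop and print cap whenever it changes. The exact capacities depend on the element type and the allocator’s size classes, so treat the numbers as illustrative, but the doubling-then-slowing pattern is easy to see:
```go
var s []int
prev := cap(s)
for i := 0; i < 5000; i++ {
	s = append(s, i)
	if cap(s) != prev {
		fmt.Printf("len %4d: cap grew %4d -> %4d\n", len(s), prev, cap(s))
		prev = cap(s)
	}
}
```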
The same shared-backing-array design that makes slices cheap to pass around also has a couple of sharp edges worth knowing about.
Pitfalls of the Shared Backing Array
That sharing is the source of a couple of surprises that bite almost everyone at least once.
The first one is that two slices from the same source share their backing array:
a := []int{1, 2, 3, 4, 5}
b := a[1:4] // b is [2, 3, 4]
b[0] = 99
fmt.Println(a) // [1 99 3 4 5]
Sub-slicing doesn’t copy anything — b is just a new header with an array pointer somewhere inside a’s array, plus its own len and cap. Writing through one slice writes into the same memory the other slice is reading.
The second one is sneakier and shows up when you append into a sub-slice:
a := []int{1, 2, 3, 4, 5}
b := a[:2] // b is [1, 2], len=2, cap=5
b = append(b, 99)
fmt.Println(b) // [1 2 99]
fmt.Println(a) // [1 2 99 4 5]
b has len=2 but cap=5, because the backing array is shared with a and there’s still room behind position 2. append sees the spare capacity and writes the new element directly into the shared array — no reallocation. From b’s point of view nothing weird happened, but a[2] got silently overwritten.
Should you lean on this behavior? Don’t. The two slices stay connected only until one of them triggers a grow — the moment append has to allocate a new backing array, the slice that grew gets its own private memory and the other one is left untouched on the original. Whether you’re sharing or not depends entirely on whether the next append happens to fit in the existing capacity, and the calling code usually can’t tell which case it just hit. Treat this aliasing as a sharp edge to avoid, not a feature to rely on.
If you want to hand out a sub-slice without exposing the parent’s spare capacity, use the three-index slice expression: b := a[start:end:end] caps b’s capacity at end - start, so the very next append is forced to allocate a fresh backing array instead of overwriting whatever lived behind position end in a. It’s a useful tool when you’re returning a slice into a buffer you don’t want callers to clobber. (Added after publication — thanks to Aliaksandr Valialkin for the nudge.)
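Applied to the snippet from the previous pitfall:
```go
a := []int{1, 2, 3, 4, 5}
b := a[0:2:2]     // [1, 2] with len=2, cap=2: no spare capacity borrowed from a
b = append(b, 99) // cap is full, so append allocates a fresh backing array
fmt.Println(b)    // [1 2 99]
fmt.Println(a)    // [1 2 3 4 5], untouched this time
```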
It’s tempting to read these pitfalls as bugs in the design, but they’re really just consequences of “slices are headers over a shared array”. Once you internalize that, the surprises stop being surprising.
That’s it for slices. Now let’s look at something quite a bit more involved.
Maps
For most of Go’s history, maps used a classic chained hash table. It worked well enough, but it had some long-standing performance issues. Starting with Go 1.24, the map implementation was rewritten to use Swiss Tables, a design originally developed for Google’s Abseil C++ library. Modern Go programs use this implementation by default, and it’s the one we’re going to look at.
The code lives under src/internal/runtime/maps/, with the main types in map.go, table.go, and group.go. If you go reading it, that’s where to look.
Before we dive into the bytes, it helps to know what we’re zooming into.
The Big Picture
A modern Go map is built out of three nested levels:
- A Map — the top-level header, what the variable actually refers to.
- A directory of one or more tables.
- Each table is an array of groups, and each group holds 8 slots.

Reading the diagram left to right: the Map header is what your variable points at, and it holds a pointer to the directory. The directory is a small array of slots, each one pointing at a table — and notice that several directory slots can point to the same table (the left two slots both go to the first table, the right two both go to the second). Each table is an array of groups, and if you zoom into any one group you’ll find 8 key/value slots packed together.
Why three levels and not just one big array of groups? Because of how growth works, which we’ll get to. There’s also a special case for very small maps: as long as the map has only ever held up to 8 entries, the runtime skips the directory and the table entirely — dirPtr points straight at a single group, dirLen is 0, and the whole structure collapses to just “header → one group”. Once the map outgrows that group it gets a real directory and table; for big maps, the directory grows over time so individual tables can be replaced independently.
Let’s walk through these levels from the top down, starting with the map header itself.
Similar to the slice we saw before — a tiny header pointing at a backing array — what your m variable actually refers to is just a small struct with a handful of bookkeeping fields and a pointer to the rest of the structure. The Map struct looks roughly like this:
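```go
// A simplified sketch: field names match the runtime's map.go, but the
// order and exact types are approximate, and several fields are omitted.
type Map struct {
	used        uint64         // live entries; this is what len(m) returns
	seed        uintptr        // per-map hash seed, randomized at creation
	dirPtr      unsafe.Pointer // the directory (or a single group for tiny maps)
	dirLen      int            // number of directory slots
	globalDepth uint8          // log2 of the directory size
	globalShift uint8          // how far to shift a hash right to get a directory index
}
```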

used is just the count of live entries — that’s what len(m) returns. dirPtr is the pointer to the directory (the next layer down), and dirLen, globalDepth, and globalShift are all derived numbers used to translate a hash into a directory index quickly: dirLen is the directory size, globalDepth is its log₂, and globalShift is the number of low bits to drop from the hash so that hash >> globalShift directly gives you the slot in the directory (pointerSize − globalDepth, which is 64 − depth on 64-bit platforms and 32 − depth on 32-bit/Wasm).
The seed is per-map and randomized, so the order you see when you range over a map differs even between maps with identical contents — and so adversaries can’t precompute keys that all collide into the same group.
The real Map struct in the runtime carries a few more fields we’re glossing over: writing (a flag the runtime flips during writes so it can panic on concurrent-write races), tombstonePossible (a fast-path hint that lets the runtime skip tombstone handling when no slot has ever been deleted), and clearSeq (a counter bumped on clear(m) so an in-flight range can detect that the map was wiped under it). They’re worth knowing about if you read the source, but they don’t change the shape of the structure.
The header’s most important pointer is dirPtr, so the next thing to look at is what sits on the other end of it.
The Directory
dirPtr points at the directory, which is just an array of pointers to tables. Its size is always a power of two, which means the runtime can use the leading bits of a key’s hash directly as the index — no modulo, no extra math, just a couple of cheap instructions to land on the right table.
One detail worth keeping in mind: multiple directory slots can point to the same table. This isn’t the starting state — when a map first graduates from the single-group fast path it gets a directory with one slot pointing at one table. The arrangement where several slots share a table appears later: when a table somewhere in the map fills up and needs to be split, the directory may have to double to fit the split. Doubling the directory means every existing entry is duplicated, so any other table that wasn’t involved in the split is suddenly reachable through two slots. The next time one of those tables splits, those two slots split apart again. So “many slots → one table” is what you get for tables that haven’t kept up with the directory’s growth, not the starting layout.
For now, just think of the directory as a thin lookup layer between the header and the tables: a hash comes in, the top bits pick a slot, the slot tells you which table to look in.

In this example the directory has 8 slots (2³), so the top 3 bits of the hash pick the slot directly. The hash 101… lands in slot 101, which points at table C. Notice how each table is reachable through two directory slots — that’s the “multiple slots can share a table” detail. As the map grows and tables split, those pairings break apart and each slot eventually points at its own table.
Now that we know how the directory hands us a table, let’s open one up and see what’s inside.
Tables
A table is an array of groups, and like the directory it’s always sized to a power of two — 8 groups, 16, 32, and so on, capped at 128 groups (so up to 1024 key/value slots in a single table). Once the directory has handed us a table, picking a group inside it is the same idea as picking a slot in the directory: we use a chunk of the key’s hash as the index, with no modulo or extra math.
The piece of the hash used here is called H1 — the upper 57 bits of the 64-bit hash. The lower 7 bits are H2, a per-slot preview we’ll meet when we get to groups; H1 is everything above that. To find the starting group, the runtime takes the bottom of H1 — just enough bits to cover the number of groups in the table — and that’s the index. Quick to compute, and it spreads keys evenly across the table.
So at this point the full path looks like: hash the key → top bits pick a directory slot (which gives us a table) → low bits of H1 pick a starting group inside that table → H2 is what we’ll use to locate the value inside the group when we get there.
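To make that bit-slicing concrete, here is a small sketch of how one 64-bit hash feeds the three levels. The function is hypothetical (the runtime spreads this logic across several places), but the splits match the description above:
```go
// split is an illustrative helper, not runtime code.
func split(hash uint64, globalShift uint8, groupsInTable uint64) (dirIdx, groupIdx uint64, h2 uint8) {
	dirIdx = hash >> globalShift        // top bits: directory slot, i.e. which table
	h1 := hash >> 7                     // H1: everything above the low 7 bits
	groupIdx = h1 & (groupsInTable - 1) // low bits of H1: starting group (group count is a power of two)
	h2 = uint8(hash & 0x7f)             // H2: the 7-bit preview stored in the control byte
	return
}
```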
When a table fills up past its load factor target — Swiss tables can run quite hot, around 7/8 full — it needs to grow. There are two cases:
- Below the cap. If the table has fewer than 128 groups, it just doubles: a new table is allocated with twice as many groups, every key in the old table is rehashed and placed into the new one, and the directory entries that used to point at the old table are updated to point at the new one. Other tables in the directory aren’t touched.
- At the cap. Once a table is at the maximum of 128 groups, it can’t simply double anymore. Instead, the runtime allocates two new tables of 128 groups each, splits the old table’s keys between them (rehashing each one to decide which side it lands on), and the directory slots that used to point at the old table are split: half now point at the first new table, half at the second. If only one directory slot was pointing at the old table — meaning there’s no room to give half to each new table — the directory itself doubles first (allocate a new directory twice as big, copy each existing pointer into both of the corresponding two new slots), and then the table split happens. Same idea as the table case: power-of-two growth that only touches what it needs to.
In both cases, the old table is fully rehashed and its contents are moved over — but only that table. The rest of the map is untouched.
The big win is that growth is local. The old approach to map growth required keeping both the old and new tables around and moving entries gradually on every operation, because rehashing the whole map at once would cause a big stall. With Swiss tables and a directory, only the table that overflowed gets rehashed — that rehash still happens all-at-once, but its cost is bounded by the size of a single table (at most 1024 entries), not the size of the entire map. The other tables in the directory keep working exactly as they were. For large maps this is dramatically smoother than the old “rehash the whole world” model.

In the before picture, the directory has four slots and table A (with all its groups, abbreviated as g … g) is reached through two of them (00 and 01); table B is reached through the other two (10 and 11). When A overflows, the runtime allocates two new tables A₀ and A₁, rehashes every key from A and places it into one of them, and rewrites the two directory slots that pointed at A so they now point at A₀ and A₁ respectively. The original table A is left without any incoming pointers — shown greyed out as “orphaned” in the after picture — and the GC will eventually reclaim it. Table B and its directory slots are not touched at all — that’s “growth is local” in action.
With the header, directory, and tables in place, the only level left is the smallest one: the group itself.
Groups: 8 Slots and a Control Word
A group holds up to 8 key/value pairs, plus an 8-byte control word with one byte per slot that tells you the slot’s state at a glance:
- empty (0x80) — nothing has ever lived here.
- deleted (0xFE) — something used to live here but it’s gone (we’ll see why this isn’t the same as empty).
- occupied — top bit 0, lower 7 bits hold a preview of the key’s hash, called H2.
A snapshot of a group with all three states:

The H2 preview is the whole point: comparing a full key is expensive, comparing one byte is essentially free. To find a key, the runtime compares its H2 against the 8 control bytes and only does a real key comparison on the slots that matched — usually one or zero. As a bonus, the 8 control bytes fit in a single CPU register, so checking all of them is just a handful of bitwise operations.
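Here is roughly what that check looks like in portable Go, a sketch of the standard bit trick rather than the runtime’s exact code. It marks every control byte equal to h2; rare false positives are possible, and the full key comparison that follows filters those out anyway:
```go
// matchH2 returns a word whose high bit is set in every byte of ctrl that
// (very likely) equals h2. Illustrative helper, not the runtime's code.
func matchH2(ctrl uint64, h2 uint8) uint64 {
	const lsb = 0x0101010101010101 // the low bit of every byte
	const msb = 0x8080808080808080 // the high bit of every byte
	v := ctrl ^ (lsb * uint64(h2)) // bytes equal to h2 become 0x00
	return (v - lsb) &^ v & msb    // classic "find the zero bytes" trick
}
```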
So why do we need a “deleted” marker instead of just marking the slot empty? Because when an insert finds the chosen group full, it keeps probing and places the key in some later group — meaning a key isn’t always in its first-choice group. Lookups follow the same chain and need a stop condition: a group with any empty slot means the key is definitely not in the map (otherwise the insert would have used that empty spot first).
If we marked deletions as empty, that stop condition would lie — a search passing through could bail out early even when the key it wants lives further down the chain. The “deleted” state preserves the chain (lookups keep probing), and inserts are still allowed to reuse those slots.
With every layer of the structure described, let’s see how the three operations you actually care about — read, insert, and update — play out on top of it.
Reading, Inserting, and Updating in Action
Reading m[k]. Hash the key, use the top bits to pick a directory slot (and therefore a table), use another chunk of bits to pick a starting group inside the table, and scan that group’s control word for an H2 match. If you find one and the key really matches, that’s your value. If the group has an empty slot and you didn’t find a match, the key isn’t there — done. Otherwise, jump to the next group in the probe sequence and try again.
Inserting m[k] = v for a new key. This is essentially the same walk as a read, with one extra job: as you go, remember the first deleted slot you saw (empty slots wouldn’t appear mid-chain — they’d terminate the probe). When you reach a group with an empty slot, you know k doesn’t exist, and now you write the new entry: into the remembered deleted slot if you saw one (which reclaims a tombstone), otherwise into the empty slot itself. Then set its control byte to the key’s H2, and bump used in the header.
Updating m[k] = v for an existing key. This one is the read path, almost unchanged. You walk the probe sequence the same way; the moment you find the matching key, you overwrite the value in place. The control byte doesn’t change (the key is the same, so its H2 is the same), and used doesn’t change either — it’s the cheapest of the three operations, which is why writing the same key over and over is essentially free.
That’s the whole structure. Three levels, all working together: the map header points to a directory, the directory points to tables, and tables hold groups of 8 slots with a control word that lets the runtime check 8 candidates in a few instructions.
If you want to go deeper into how Swiss tables landed in Go, Bryan Boreham gave a great talk on this at GopherCon UK 2025 — “Swiss maps in Go” — that walks through the design choices and the performance numbers in much more detail than we’ve done here.
Now, the third one.
Channels
A channel is a queue with synchronization built in. Underneath the syntactic sugar (ch <- x, <-ch, close(ch)), it’s just a struct on the heap with a buffer, two queues of waiting goroutines, and a mutex. The struct is called hchan and lives in src/runtime/chan.go. Let’s look at it.
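```go
// A simplified sketch: field names follow src/runtime/chan.go, but some fields
// are omitted and the internal types (_type, waitq, mutex) are only indicated.
type hchan struct {
	qcount   uint           // number of elements currently in the buffer
	dataqsiz uint           // buffer capacity (the n in make(chan T, n))
	buf      unsafe.Pointer // the circular buffer of dataqsiz elements
	elemsize uint16         // size of one element, cached from elemtype
	closed   uint32         // non-zero once close(ch) has run
	elemtype *_type         // type descriptor for T
	sendx    uint           // index the next send writes to
	recvx    uint           // index the next receive reads from
	recvq    waitq          // parked receivers (list of sudogs)
	sendq    waitq          // parked senders (list of sudogs)
	lock     mutex          // protects everything above
}
```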

When you call make(chan T, n), the runtime allocates one of these. If n > 0 and T doesn’t contain any pointers, the header and the buffer are placed in a single contiguous allocation: hchan first, then the n element slots immediately after, with buf pointing just past the header. If T does contain pointers, the buffer is allocated separately as a typed []T-style allocation so the GC can scan it correctly — that’s two allocations instead of one. For an unbuffered channel (n == 0), there’s no buffer at all — every send has to find a paired receive (and vice versa) directly.
That’s a lot of fields at once, so let’s break the struct into three groups and look at them in order:
- Channel-wide info — the fields that describe what kind of channel this is and protect it: elemsize, elemtype, closed, and lock.
- The circular buffer — the buf array, the indexes that walk around it (sendx, recvx), and the size/count fields (dataqsiz, qcount).
- The goroutine queues — recvq and sendq, two lists of goroutines parked on this channel waiting for someone on the other side.
Once those three pieces are clear, we’ll put them together and walk through what actually happens on a send, a receive, and a close.
Channel-wide Info
A handful of the fields are just bookkeeping that applies to the whole channel, regardless of whether you’re sending, receiving, or just inspecting it.
elemtype is a pointer to the runtime type descriptor for T — it’s how the runtime knows the element’s size, alignment, and (importantly for the GC) whether it contains pointers. elemsize is the size in bytes, cached here so the hot path doesn’t have to chase the type pointer to figure out how many bytes to copy on every send and receive.
closed is a flag (0 while the channel is open, non-zero once it’s been closed) checked at the top of every operation. We’ll see in a moment why a closed channel behaves so differently for senders versus receivers.
lock is a mutex that protects everything else in the struct. Channels look lock-free from the outside, but the runtime acquires this lock for the duration of every send, receive, or close. The work done while holding it is tiny — a few field updates and at most one memory copy — so the lock is uncontended in the vast majority of cases. Still, every channel operation is, internally, a small critical section.
With the bookkeeping out of the way, let’s look at the part that does actual data movement.
The Circular Buffer
For buffered channels, buf is a circular array of dataqsiz slots. sendx is where the next send writes, recvx is where the next receive reads. Both wrap modulo dataqsiz. qcount is just the count of items currently in the buffer — when qcount == 0 the buffer is empty, when qcount == dataqsiz it’s full.

In this snapshot the buffer has room for 8 elements (dataqsiz = 8) and currently holds 4 of them (qcount = 4). recvx points at slot 2 — the oldest item in the buffer, which the next receive will pull out. sendx points at slot 6 — the next free spot, which the next send will write into. Filled slots sit between recvx (inclusive) and sendx (exclusive). Both indexes advance one slot at a time and wrap around to 0 when they run off the end, which is the only thing that makes this “circular” in any meaningful sense.
That’s all the buffering machinery is. Send writes at sendx, advances sendx, increments qcount. Receive reads at recvx, advances recvx, decrements qcount. Easy.
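If it helps to see that arithmetic spelled out, here is a toy single-goroutine ring buffer using the same field names. It’s a sketch for illustration only: the real code copies raw bytes, holds the channel lock, and parks goroutines instead of returning false:
```go
type ring struct {
	buf      []int // stand-in for hchan.buf
	sendx    uint  // next slot a send writes to
	recvx    uint  // next slot a receive reads from
	qcount   uint  // items currently buffered
	dataqsiz uint  // total capacity
}

func newRing(n uint) *ring { return &ring{buf: make([]int, n), dataqsiz: n} }

func (r *ring) send(v int) bool {
	if r.qcount == r.dataqsiz {
		return false // full: the runtime would park the sender here
	}
	r.buf[r.sendx] = v
	r.sendx = (r.sendx + 1) % r.dataqsiz // advance and wrap around the end
	r.qcount++
	return true
}

func (r *ring) recv() (int, bool) {
	if r.qcount == 0 {
		return 0, false // empty: the runtime would park the receiver here
	}
	v := r.buf[r.recvx]
	r.recvx = (r.recvx + 1) % r.dataqsiz
	r.qcount--
	return v, true
}
```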
The buffer handles the easy cases. The hard cases — sending to a full buffer, receiving from an empty one — are where the third piece of the struct comes in.
sudog and the Wait Queues
The interesting part is what happens when you can’t proceed — when you send to a full buffer, or receive from an empty one, or do anything on an unbuffered channel without a partner.
The runtime parks the goroutine. To park it on the channel, it allocates a small struct called a sudog that represents the waiting goroutine on this channel, along with a pointer (elem) to the value being sent or to the variable that should receive.
The sudog is then linked into the channel’s sendq (if it’s a sender) or recvq (if it’s a receiver). These two waitq fields are just doubly-linked lists of sudogs. The goroutine itself is parked through the scheduler with gopark, marked as not runnable, and the OS thread it was on goes off to do something else.
When a paired operation eventually arrives — a receive on a channel with senders waiting, or a send on a channel with receivers waiting — the runtime pulls the head sudog off the queue, copies the value directly between the two goroutines, and wakes up the parked goroutine via goready. The scheduler picks it up and it resumes after the channel operation as if nothing had happened.
This is one of the channel’s quietly clever optimizations: when there’s a goroutine waiting on the other side, the value doesn’t go through the buffer. It’s copied directly from the sender’s elem pointer to the receiver’s elem pointer, and the buffer is bypassed entirely. For unbuffered channels this is the only way it can work — there’s no buffer at all. For buffered channels it’s an optimization that avoids one round trip through the queue when a receiver is already blocked waiting.
That’s all the moving pieces. Now let’s actually run a small program through them and watch the channel’s state change at each step.
Sending and Receiving in Action
Let’s run a small example through the structure to see how the pieces play together. Picture a channel created with ch := make(chan int, 1) — a buffered channel with one slot — and six goroutines (A through F) that each do a single operation on it. We’ll mix sends and receives in an order that forces the channel through every interesting state, and watch what happens step by step.
Step 1: A sends 10. The buffer is empty (qcount = 0), no receivers are parked, so A just writes 10 into slot 0, advances sendx, bumps qcount to 1, and goes on with its life.

Step 2: B sends 20. qcount == dataqsiz, so the buffer is full and there’s no receiver to hand 20 to directly. B is blocked: the runtime allocates a sudog, sets its elem to point at B’s value 20, links it onto sendq, parks B with gopark, and the OS thread B was running on goes off to do other work.

Step 3: C receives. Because there’s a parked sender waiting, the runtime takes a small shortcut and does it in one go: it copies the value at recvx (which is 10) into C’s destination, then immediately copies B’s 20 from the head sudog on sendq into that same slot, advances recvx (and sets sendx = recvx to keep them aligned around the now-still-full buffer), and wakes B via goready. The buffer was full and stays full — qcount doesn’t change. The neat trick is that one in-place rotation gets C’s value out, gets B’s value into the right place to preserve FIFO order, and unblocks B, all without ever resizing the queue.

The buffer keeps the FIFO order working: C got 10, the next receive will get 20, and B’s send appeared to “happen” in the order it was issued.
Step 4: goroutine D receives. This time it’s the plain buffered path — there’s no parked sender left, so it just reads 20, advances recvx, and decrements qcount to 0. The channel is now empty.

Step 5: E receives. qcount == 0 and there’s no parked sender, so E has nothing to read. The runtime allocates a sudog, sets elem to the address of E’s receive variable (where the value should eventually be written), links it onto recvq, and parks E. The next ch <- v somewhere in the program will pull E’s sudog off the queue, copy v straight into E’s variable, and wake E — bypassing the buffer entirely.

Step 6: F sends 30 while E is still parked. The runtime checks recvq before it touches the buffer, finds E’s sudog waiting there, and copies 30 directly into E’s receive variable — no buffer slot involved at all. Then it wakes E via goready and F continues. From E’s point of view, its receive completed and it got 30; from F’s, its send completed instantly. The buffer never saw the value.

This last step is the “value doesn’t go through the buffer” optimization in action — and it’s also exactly how unbuffered channels work all the time. An unbuffered channel (make(chan int)) has no buffer to write into, so this direct-handoff path is the only path: every send has to find a receiver on recvq, every receive has to find a sender on sendq, and the value is always copied straight from one goroutine’s variable into the other’s. The “buffer” version is just the same machinery with a fast path tacked on for when the producer happens to be ahead of the consumer.
That covers sending and receiving. There’s one more channel operation that has to interact with all of this state.
Closing
Closing a channel does three things, all under the lock:
- Marks closed as non-zero.
- Walks recvq and wakes every parked receiver, each one returning the zero value of the element type.
- Walks sendq and wakes every parked sender — but a parked sender on a closed channel is a panic, so the runtime sets a flag on each sudog and the senders panic when they wake.
After close, sends panic, receives drain the buffer first and then return zero values forever. The closed flag is checked at the start of every operation.
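The receiver’s side of this is easiest to see with the comma-ok form, which distinguishes a real zero from “closed and drained”:
```go
ch := make(chan int, 2)
ch <- 1
ch <- 2
close(ch)

fmt.Println(<-ch) // 1: buffered values still come out after close
fmt.Println(<-ch) // 2
v, ok := <-ch
fmt.Println(v, ok) // 0 false: buffer drained, ok reports the close
```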
That’s the whole tour — three structures, all the way down to the bytes. Let’s wrap up with a quick recap.
Summary
Three structures, three different shapes — all of them just memory on the heap, dressed up by the runtime to look like language primitives.
A slice is a small header (array, len, cap) pointing at a backing array. append doubles the array (until 256 elements, then transitions toward 1.25×) when it runs out of room and leaves the old one for the GC. Two slices that share a backing array see each other’s writes — until one of them grows and quietly stops sharing, which is the source of the classic “mutating a sub-slice changed my parent” surprise.
A map is a header → directory → tables → groups, with the smallest maps collapsing straight to “header → one group” as a fast path. The directory uses the top hash bits to pick a table, H1 picks a starting group inside it, and a group’s 8-byte control word lets the runtime check 8 candidate slots in a handful of bitwise operations using H2 (a 7-bit preview of the key’s hash). Growth is local: only the table that overflows gets rehashed, the rest of the map keeps working untouched.
A channel is an hchan with a circular buffer and two queues (sendq, recvq) of sudogs representing parked goroutines. Sends and receives walk the buffer indices when there’s room or items waiting, and park onto a queue when there isn’t. When a sender finds a parked receiver (or vice versa) the value is copied directly between the two goroutines, skipping the buffer — that direct handoff is also the only path an unbuffered channel ever takes.
If you want to read the source, src/runtime/slice.go, src/internal/runtime/maps/, and src/runtime/chan.go are surprisingly approachable — short files, well-commented, and once you know what each piece is for they read pretty much like the description above.
In the next article, we’ll zoom into the select statement: how the compiler turns those case clauses into a runtime call, how selectgo picks a winner without deadlocking when several goroutines are selecting on overlapping channels, and how fairness is enforced when more than one case is ready.