The speed of channels isn’t really the point, but I couldn’t help but wonder how fast they are. So I wrote a benchmark!
My benchmark sends a single byte at a time to a goroutine that just reads and discards it. Here’s the code.
func BenchmarkChannelOneByte(b *testing.B) {
    ch := make(chan byte, 4096)
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        for range ch {
        }
    }()
    b.SetBytes(1)
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- byte(i)
    }
    close(ch)
    wg.Wait()
}
And the results:
BenchmarkChannelOneByte-8 20000000 99.2 ns/op 10.08 MB/s 0 B/op 0 allocs/op
So, 10 MB/s, or 80 Mb/s. Not as fast as really good WiFi.
Let’s put this in context. What if we just use copy() to move data? Here’s another benchmark.
func BenchmarkCopy(b *testing.B) {
    from := make([]byte, b.N)
    to := make([]byte, b.N)
    b.ReportAllocs()
    b.ResetTimer()
    b.SetBytes(1)
    // A single copy of b.N bytes; with SetBytes(1) the reported
    // figures come out per byte, matching the channel benchmark.
    copy(to, from)
}
As you’d expect, this runs much faster.
BenchmarkCopy-8 2000000000 1.56 ns/op 640.83 MB/s 0 B/op 0 allocs/op
Now that really wasn’t a fair fight. The channel moves the data one byte at a time to a separate goroutine, ensuring no byte is read before it’s sent, whereas the copy just ships the bytes around in memory, providing no synchronisation for a reader.
If you just want to move data quickly using channels, then moving it one byte at a time is not sensible. What you really do with a channel is move ownership of the data, in which case the data rate can be effectively infinite, depending on the size of the data block you transfer.
But we have worked out the basic cost of sending a thing on a channel (on a 2015 MBP): about 100 ns. We can probably send 10 million items in a second.
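To make the ownership point concrete, here’s an illustrative benchmark of my own (not one from the original set) that sends a 64 KB block per channel operation, so the ~100 ns send cost is spread across 65,536 bytes.

func BenchmarkChannelBlock(b *testing.B) {
    const blockSize = 64 * 1024
    // For measurement we reuse one block; in real code each send
    // would hand over a freshly owned buffer to the receiver.
    block := make([]byte, blockSize)
    ch := make(chan []byte, 16)
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        for range ch {
            // The receiver now owns the block and can work on it
            // without any further synchronisation.
        }
    }()
    b.SetBytes(blockSize)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- block
    }
    close(ch)
    wg.Wait()
}

At roughly 100 ns per send that works out at hundreds of gigabytes per second of nominal throughput, which is why the per-send cost, not the per-byte cost, is the number that matters.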
Can we find a fairer comparison? Well, channels are supposed to allow you to remove locks, so how do they compare against locks? We’ll just look at the simplest thing: locking and unlocking an uncontended mutex.
func BenchmarkSync(b *testing.B) {
    var s sync.Mutex
    for i := 0; i < b.N; i++ {
        s.Lock()
        s.Unlock()
    }
}
As you’d hopefully expect, locking and unlocking a single uncontested mutex is faster than sending and receiving on a channel. Here are the results:
BenchmarkSync-8 50000000 23.5 ns/op
So sending a byte through a channel costs roughly the same as grabbing and releasing four locks.
But this still doesn’t implement any kind of communication between goroutines. We can use the sync package to build something that works like a channel ourselves. It’s a bit long, so I’ve put it on github here, alongside the code for all these benchmarks and a few more. I’d like to think that this implementation is not dumb, but it doesn’t do anything particularly clever. On the other hand, it doesn’t have to deal with generic types, select, and all the other clever things that channels do.
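The real code is on github, but the shape of it is roughly this (my sketch, so the details may differ): a fixed-size ring buffer guarded by a sync.Mutex, with a pair of sync.Cond condition variables to park writers when the buffer is full and readers when it is empty.

type sharedBuffer struct {
    mu       sync.Mutex
    notFull  *sync.Cond
    notEmpty *sync.Cond
    buf      []byte
    head     int // index of the next byte to read
    count    int // number of bytes currently buffered
    closed   bool
}

func newSharedBuffer(size int) *sharedBuffer {
    sb := &sharedBuffer{buf: make([]byte, size)}
    sb.notFull = sync.NewCond(&sb.mu)
    sb.notEmpty = sync.NewCond(&sb.mu)
    return sb
}

func (sb *sharedBuffer) put(c byte) {
    sb.mu.Lock()
    for sb.count == len(sb.buf) {
        sb.notFull.Wait()
    }
    sb.buf[(sb.head+sb.count)%len(sb.buf)] = c
    sb.count++
    sb.notEmpty.Signal()
    sb.mu.Unlock()
}

// get returns the next byte, or closed == true once the buffer has
// been closed and fully drained.
func (sb *sharedBuffer) get() (byte, bool) {
    sb.mu.Lock()
    for sb.count == 0 && !sb.closed {
        sb.notEmpty.Wait()
    }
    if sb.count == 0 {
        sb.mu.Unlock()
        return 0, true
    }
    c := sb.buf[sb.head]
    sb.head = (sb.head + 1) % len(sb.buf)
    sb.count--
    sb.notFull.Signal()
    sb.mu.Unlock()
    return c, false
}

func (sb *sharedBuffer) close() {
    sb.mu.Lock()
    sb.closed = true
    sb.notEmpty.Broadcast()
    sb.mu.Unlock()
}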
My benchmark for this is almost exactly the same as the one for channels.
func BenchmarkSharedBuffer(b *testing.B) {
    sb := newSharedBuffer(4096)
    wg := sync.WaitGroup{}
    wg.Add(1)
    go func() {
        defer wg.Done()
        var closed bool
        for !closed {
            _, closed = sb.get()
        }
    }()
    b.SetBytes(1)
    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        sb.put(byte(i))
    }
    sb.close()
    wg.Wait()
}
My guess was that this would be faster than channels, as it isn’t burdened by a lot of the other clever stuff channels can do. But there was little between them, with my simple implementation a tiny bit faster at about 95 ns per byte.
BenchmarkSharedBuffer-8 20000000 94.8 ns/op 10.55 MB/s
One thing that surprised me was that to get this performance I had to stop using defer in my implementation’s get() and put() routines. The overhead was something like 150 ns per byte processed. So if you’re doing something performance-critical, watch out for defer in inner loops.
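For concreteness, the deferred form of put() would have looked something like this (a reconstruction for illustration, not the exact github history):

func (sb *sharedBuffer) put(c byte) {
    sb.mu.Lock()
    // Convenient and panic-safe, but the defer machinery added
    // something like 150 ns per call in this hot loop.
    defer sb.mu.Unlock()
    for sb.count == len(sb.buf) {
        sb.notFull.Wait()
    }
    sb.buf[(sb.head+sb.count)%len(sb.buf)] = c
    sb.count++
    sb.notEmpty.Signal()
}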
Now, channels are supposed to be good at handing out work to multiple waiting goroutines, so let’s look at that. We can quickly alter our benchmark to cover this case by starting more goroutines to read from the channel. I start as many goroutines as I have CPU cores: 8 on my 2015 MBP.
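The altered channel benchmark looks something like this (a sketch using runtime.NumCPU to size the reader pool; the shared-buffer version is changed in the same way):

func BenchmarkChannelOneByteMulti(b *testing.B) {
    ch := make(chan byte, 4096)
    wg := sync.WaitGroup{}
    // One reading goroutine per CPU core.
    for g := 0; g < runtime.NumCPU(); g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for range ch {
            }
        }()
    }
    b.SetBytes(1)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch <- byte(i)
    }
    close(ch)
    wg.Wait()
}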
BenchmarkChannelOneByteMulti-8 10000000 230 ns/op 4.34 MB/s
BenchmarkSharedBufferMulti-8 20000000 94.5 ns/op 10.58 MB/s
Surprisingly, the Go channel implementation performed worse, while the naive implementation’s performance barely changed. This is somewhat mysterious and bears further investigation.
So, we’ve learnt that channels
- are slower than locking and unlocking a mutex (about 100 ns versus 23.5 ns)
- are slower than copy() for just shifting bytes (about 100 ns per byte versus 1.5 ns)
- in a simple scenario are about as fast as a naive home-grown channel implementation using the sync package
- but surprisingly performance drops as you add more readers.
What does this tell us? I’m not exactly sure, but here are my thoughts on channels.
- Don’t use channels unless you want to move information between goroutines. Don’t put stuff in channels just because you’re excited about using them. Callbacks and slices work better in a lot of cases.
- Do use channels to move information between goroutines. If you can arrange your code so that work is moved to a goroutine that can then entirely own the data associated with a piece of work, then that’s likely to be a big win in terms of code clarity and performance.
- Don’t use channels to implement locks. That’s silly.
- When it makes sense to use channels don’t worry overmuch about their performance: they’re pretty good! Normally the work you will do per item will greatly exceed the 90–250 ns it takes to move the item through the channel, so it’s just not worth worrying about.
- Channels perform worse than my naive implementation with multiple readers, but I don’t know why! My suspicion is that they have a clever trick for the single reader case, but the overhead of supporting generic values slows them down in general compared to my simple implementation. This requires further study!
By day, Phil uses his coding super-power to fight crime at ravelin.com. If you’ve enjoyed reading this article, please press the little heart button as Phil is powered mostly by internet points. Code for the benchmarks is here.