I love Go’s concurrency story. Goroutines + channels are still one of the most productive ways to build services that feel concurrent without drowning in callbacks.
But there’s a trap I’ve watched teams fall into (and I’ve done it myself): using channels as a “locking replacement” for shared state in hot paths.
Under extreme web load—think high QPS handlers doing tiny, repeated critical sections (counters, rate-limit state, in-memory maps, per-tenant stats)—channels often lose to sync.Mutex. Not because channels are “bad”, but because they do more work than a lock/unlock in the common uncontended-to-mildly-contended cases. In a classic benchmark, a channel send/receive comes in around ~100ns while an uncontended mutex lock/unlock can be ~4x faster.
Go’s own guidance is pragmatic here: use what’s simplest/most expressive; don’t be afraid to use a mutex when it fits.
Let’s prove it with code.
The scenario: a hot counter in an HTTP handler
Imagine you want to count requests per route (or per tenant), and you’re doing it in-process.
Two common approaches:
- Channel-owned state (“share memory by communicating”): send increments to an aggregator goroutine.
- Mutex-protected shared state: lock, update, unlock.
Here’s what those look like.

Option A: channel-owned state (elegant, but it queues)
```go
package stats

import "sync/atomic"

// ChannelCounter funnels increments through a channel to a single
// aggregator goroutine that owns the counts map.
type ChannelCounter struct {
	ch     chan string
	done   chan struct{}
	closed atomic.Bool
	// state lives in the goroutine below
}

// NewChannelCounter starts the aggregator goroutine. The returned map is
// owned by that goroutine, so only read it after Close has returned.
func NewChannelCounter(buffer int) (*ChannelCounter, map[string]uint64) {
	cc := &ChannelCounter{
		ch:   make(chan string, buffer),
		done: make(chan struct{}),
	}
	counts := make(map[string]uint64)
	go func() {
		defer close(cc.done)
		for key := range cc.ch {
			counts[key]++
		}
	}()
	return cc, counts
}

func (c *ChannelCounter) Inc(key string) {
	// Under heavy load, this send can become the dominant cost:
	// - contention on the channel
	// - scheduling/parking if the buffer fills
	// - serialization through a single reader
	c.ch <- key
}

// Close stops accepting increments and waits for the aggregator to drain.
func (c *ChannelCounter) Close() {
	if c.closed.CompareAndSwap(false, true) {
		close(c.ch)
	}
	<-c.done
}
```
What happens at high QPS?
Even if your HTTP server is handling requests across many goroutines, the aggregation is serialized through the one goroutine reading the channel.
Diagram (channel ownership)
This is elegant when:
- the work per message is non-trivial
- you want ownership semantics
- you’re building a pipeline
…but for tiny critical sections (increment a counter), you’re paying channel costs to move a “do almost nothing” message around.
A good mental model: channels are a synchronization + queueing primitive. A mutex is just synchronization.
Option B: mutex around shared state (boring, fast)
```go
package stats

import "sync"

// MutexCounter protects a plain map with a mutex; the critical section is
// just the increment.
type MutexCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func NewMutexCounter() *MutexCounter {
	return &MutexCounter{counts: make(map[string]uint64)}
}

func (m *MutexCounter) Inc(key string) {
	m.mu.Lock()
	m.counts[key]++
	m.mu.Unlock()
}

// Snapshot copies the counts so callers never hold a reference into the
// protected map.
func (m *MutexCounter) Snapshot() map[string]uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]uint64, len(m.counts))
	for k, v := range m.counts {
		out[k] = v
	}
	return out
}
```
Diagram (mutex-protected shared memory)
Under contention, a mutex can still be painful—but for quick critical sections it’s often the lowest-overhead path. And unlike the single channel reader, you’re not structurally serializing work through one goroutine; you’re only serializing the exact critical section.
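To make the shape concrete, here’s a minimal sketch of wiring `MutexCounter` into a handler (it uses the same illustrative `example.com/stats` module path as the benchmarks below; error handling and route/tenant extraction are kept to the bare minimum):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"example.com/stats" // illustrative module path, matching the benchmarks below
)

func main() {
	counter := stats.NewMutexCounter()

	mux := http.NewServeMux()
	mux.HandleFunc("/v1/search", func(w http.ResponseWriter, r *http.Request) {
		counter.Inc(r.URL.Path) // the hot path: one tiny critical section per request
		fmt.Fprintln(w, "ok")
	})

	// A snapshot endpoint for debugging or metrics scraping.
	mux.HandleFunc("/debug/counts", func(w http.ResponseWriter, r *http.Request) {
		for k, v := range counter.Snapshot() {
			fmt.Fprintf(w, "%s %d\n", k, v)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```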
Benchmarks: channel send/receive vs lock/unlock
1) Micro-benchmark: “just sync”
This uses the same idea as Phil Pearl’s measurement: channel ops do more work than lock/unlock in the simplest case.
```go
package stats_test

import (
	"sync"
	"testing"
)

func BenchmarkMutexLockUnlock(b *testing.B) {
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}

func BenchmarkChanSendReceive(b *testing.B) {
	ch := make(chan struct{}, 1024)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range ch {
			// Discard: we only want the cost of moving a message through the channel.
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- struct{}{}
	}
	close(ch)
	wg.Wait()
}
```
What you’ll usually observe
- `MutexLockUnlock` is materially faster than `ChanSendReceive` for “do nothing but sync”
- the gap grows when the channel starts blocking (buffer fills, reader lags, scheduler gets involved)
This aligns with the published channel-vs-mutex baseline numbers.
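To reproduce on your own machine (exact numbers vary by CPU, Go version, and background load), the standard tooling is enough:

```
go test -bench='MutexLockUnlock|ChanSendReceive' -benchmem -count=5
```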
2) “Web load” benchmark: many goroutines, tiny handler work
Below is a benchmark that simulates what a hot HTTP path does: lots of goroutines repeatedly calling Inc.
It’s not a full net/http benchmark (that adds network + parsing noise), but it is the shape that dominates once you’re optimizing: “how fast can I update shared in-memory stats?”
```go
package stats_test

import (
	"testing"

	"example.com/stats"
)

const key = "/v1/search"

func BenchmarkInc_MutexCounter(b *testing.B) {
	c := stats.NewMutexCounter()
	// Run more goroutines than GOMAXPROCS to mimic a busy handler pool.
	b.SetParallelism(4)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc(key)
		}
	})
}

func BenchmarkInc_ChannelCounter(b *testing.B) {
	cc, _ := stats.NewChannelCounter(4096)
	defer cc.Close()
	b.SetParallelism(4)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			cc.Inc(key)
		}
	})
}
```
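Contention is the whole point here, so it’s worth sweeping parallelism too; the standard `-cpu` flag re-runs each benchmark at several GOMAXPROCS values:

```
go test -bench='Inc_' -benchmem -cpu=1,4,8
```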
Why this tends to favor MutexCounter at high load
- Channel path = enqueue + potential block + scheduler coordination
- Mutex path = lock + increment + unlock
When the per-request “real work” is small, the overhead becomes the story.
Also, the channel design introduces a hidden structural limit:
- one goroutine is responsible for applying increments
- if producers outrun it, the channel buffer fills
- once it fills, producers block → tail latency spikes
That’s exactly what “extreme load” looks like: bursts + sustained pressure.
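If you do keep the channel design, the usual patch is a non-blocking send that sheds load instead of stalling the handler. Here’s a sketch of a hypothetical `TryInc` added to the `ChannelCounter` above (it is not part of the original API):

```go
// In package stats, alongside the ChannelCounter above.

// TryInc is a hypothetical non-blocking variant of Inc: if the aggregator
// has fallen behind and the buffer is full, drop the increment instead of
// blocking the request goroutine.
func (c *ChannelCounter) TryInc(key string) bool {
	select {
	case c.ch <- key:
		return true
	default:
		return false // buffer full: shed load, accept lossy counts
	}
}
```

That trades accuracy for latency, which is usually the moment to ask whether a mutex was the simpler answer all along.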
“But channels are idiomatic Go…”
They absolutely are—when you’re actually communicating.
Go’s official wiki basically says: don’t cargo-cult channels; use the simplest tool.
If the problem is:
- “I have shared state and I need to protect it”
then `sync.Mutex` is often the cleanest expression of reality.

If the problem is:
- “I want to transfer ownership of work/state and build a pipeline”
then channels can be perfect.

And if the problem is:
- “I need insane increment throughput”
then `sync/atomic` or sharded counters may be the real answer (but that’s a different post).
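For flavor only (the details really are another post), a single fixed hot counter with `sync/atomic` skips the lock entirely; this is a hypothetical sketch, not something from the earlier examples:

```go
package stats

import "sync/atomic"

// Hypothetical: one fixed hot counter, no map, no lock.
var searchHits atomic.Uint64

// RecordSearchHit is safe to call from any number of goroutines.
func RecordSearchHit() {
	searchHits.Add(1)
}
```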
Practical takeaways for high-QPS services
- Don’t use channels to implement locks (it’s usually slower and can bottleneck).
- For hot-path shared state (stats, caches, maps), start with `sync.Mutex`.
- If you choose channels, do it for a reason: ownership, pipeline, backpressure semantics.
- If you see tail latency spikes under load and you’re using a single “state goroutine”, check whether the channel is filling and making handler goroutines block.
If you want to go further, natural follow-ups are:

- a sharded-mutex version (`[N]struct{ mu; map }`) to show how it scales under contention
- a full `net/http` + `httptest` + load-generator harness (so you can run `wrk`/`hey` and see latency curves)

