I love Go’s concurrency story. Goroutines + channels are still one of the most productive ways to build services that feel concurrent without drowning in callbacks.
But there’s a trap I’ve watched teams fall into (and I’ve done it myself): using channels as a “locking replacement” for shared state in hot paths.
Under extreme web load—think high QPS handlers doing tiny, repeated critical sections (counters, rate-limit state, in-memory maps, per-tenant stats)—channels often lose to sync.Mutex. Not because channels are “bad”, but because they do more work than a lock/unlock in the common uncontended-to-mildly-contended cases. In a classic benchmark, a channel send/receive comes in around ~100ns while an uncontended mutex lock/unlock can be ~4x faster.
Go’s own guidance is pragmatic here: use what’s simplest/most expressive; don’t be afraid to use a mutex when it fits.
Let’s prove it with code.
The scenario: a hot counter in an HTTP handler
Imagine you want to count requests per route (or per tenant), and you’re doing it in-process.
Two common approaches:
- Channel-owned state (“share memory by communicating”): send increments to an aggregator goroutine.
- Mutex-protected shared state: lock, update, unlock.
Here’s what those look like.

Option A: channel-owned state (elegant, but it queues)
```go
package stats

import "sync/atomic"

// ChannelCounter funnels increments through a channel to a single
// aggregator goroutine that owns the counts map.
type ChannelCounter struct {
	ch     chan string
	done   chan struct{}
	closed atomic.Bool
	// state lives in the goroutine below
}

// NewChannelCounter starts the aggregator goroutine. The returned map is
// owned by that goroutine, so only read it after Close has returned.
func NewChannelCounter(buffer int) (*ChannelCounter, map[string]uint64) {
	cc := &ChannelCounter{
		ch:   make(chan string, buffer),
		done: make(chan struct{}),
	}
	counts := make(map[string]uint64)
	go func() {
		defer close(cc.done)
		for key := range cc.ch {
			counts[key]++
		}
	}()
	return cc, counts
}

func (c *ChannelCounter) Inc(key string) {
	// Under heavy load, this send can become the dominant cost:
	// - contention on the channel
	// - scheduling/parking if the buffer fills
	// - serialization through a single reader
	c.ch <- key
}

// Close stops accepting increments and waits for the aggregator to drain.
func (c *ChannelCounter) Close() {
	if c.closed.CompareAndSwap(false, true) {
		close(c.ch)
	}
	<-c.done
}
```
What happens at high QPS?
Even if your HTTP server is handling requests across many goroutines, the aggregation is serialized through the one goroutine reading the channel.
Diagram (channel ownership)
This is elegant when:
- the work per message is non-trivial
- you want ownership semantics
- you’re building a pipeline
…but for tiny critical sections (increment a counter), you’re paying channel costs to move a “do almost nothing” message around.
A good mental model: channels are a synchronization + queueing primitive. A mutex is just synchronization.
Option B: mutex around shared state (boring, fast)
```go
package stats

import "sync"

// MutexCounter protects a plain map with a mutex; the critical section is
// just the increment.
type MutexCounter struct {
	mu     sync.Mutex
	counts map[string]uint64
}

func NewMutexCounter() *MutexCounter {
	return &MutexCounter{counts: make(map[string]uint64)}
}

func (m *MutexCounter) Inc(key string) {
	m.mu.Lock()
	m.counts[key]++
	m.mu.Unlock()
}

// Snapshot copies the counts so callers never hold a reference into the
// protected map.
func (m *MutexCounter) Snapshot() map[string]uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]uint64, len(m.counts))
	for k, v := range m.counts {
		out[k] = v
	}
	return out
}
```
Diagram (mutex-protected shared memory)
Under contention, a mutex can still be painful—but for quick critical sections it’s often the lowest-overhead path. And unlike the single channel reader, you’re not structurally serializing work through one goroutine; you’re only serializing the exact critical section.
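To make the shape concrete, here’s a minimal sketch of wiring `MutexCounter` into a handler (it uses the same illustrative `example.com/stats` module path as the benchmarks below; error handling and route/tenant extraction are kept to the bare minimum):

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"example.com/stats" // illustrative module path, matching the benchmarks below
)

func main() {
	counter := stats.NewMutexCounter()

	mux := http.NewServeMux()
	mux.HandleFunc("/v1/search", func(w http.ResponseWriter, r *http.Request) {
		counter.Inc(r.URL.Path) // the hot path: one tiny critical section per request
		fmt.Fprintln(w, "ok")
	})

	// A snapshot endpoint for debugging or metrics scraping.
	mux.HandleFunc("/debug/counts", func(w http.ResponseWriter, r *http.Request) {
		for k, v := range counter.Snapshot() {
			fmt.Fprintf(w, "%s %d\n", k, v)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", mux))
}
```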
Benchmarks: channel send/receive vs lock/unlock
1) Micro-benchmark: “just sync”
This uses the same idea as Phil Pearl’s measurement: channel ops do more work than lock/unlock in the simplest case.
```go
package stats_test

import (
	"sync"
	"testing"
)

func BenchmarkMutexLockUnlock(b *testing.B) {
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}

func BenchmarkChanSendReceive(b *testing.B) {
	ch := make(chan struct{}, 1024)
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for range ch {
			// Discard: we only want the cost of moving a message through the channel.
		}
	}()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch <- struct{}{}
	}
	close(ch)
	wg.Wait()
}
```
What you’ll usually observe
- `MutexLockUnlock` is materially faster than `ChanSendReceive` for “do nothing but sync”
- the gap grows when the channel starts blocking (buffer fills, reader lags, scheduler gets involved)
This aligns with the published channel-vs-mutex baseline numbers.
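To reproduce on your own machine (exact numbers vary by CPU, Go version, and background load), the standard tooling is enough:

```
go test -bench='MutexLockUnlock|ChanSendReceive' -benchmem -count=5
```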
2) “Web load” benchmark: many goroutines, tiny handler work
Below is a benchmark that simulates what a hot HTTP path does: lots of goroutines repeatedly calling Inc.
It’s not a full net/http benchmark (that adds network + parsing noise), but it is the shape that dominates once you’re optimizing: “how fast can I update shared in-memory stats?”
```go
package stats_test

import (
	"testing"

	"example.com/stats"
)

const key = "/v1/search"

func BenchmarkInc_MutexCounter(b *testing.B) {
	c := stats.NewMutexCounter()
	// Run more goroutines than GOMAXPROCS to mimic a busy handler pool.
	b.SetParallelism(4)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			c.Inc(key)
		}
	})
}

func BenchmarkInc_ChannelCounter(b *testing.B) {
	cc, _ := stats.NewChannelCounter(4096)
	defer cc.Close()
	b.SetParallelism(4)
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			cc.Inc(key)
		}
	})
}
```
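Contention is the whole point here, so it’s worth sweeping parallelism too; the standard `-cpu` flag re-runs each benchmark at several GOMAXPROCS values:

```
go test -bench='Inc_' -benchmem -cpu=1,4,8
```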
Why this tends to favor MutexCounter at high load
- Channel path = enqueue + potential block + scheduler coordination
- Mutex path = lock + increment + unlock
When the per-request “real work” is small, the overhead becomes the story.
Also, the channel design introduces a hidden structural limit:
- one goroutine is responsible for applying increments
- if producers outrun it, the channel buffer fills
- once it fills, producers block → tail latency spikes
That’s exactly what “extreme load” looks like: bursts + sustained pressure.
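If you do keep the channel design, the usual patch is a non-blocking send that sheds load instead of stalling the handler. Here’s a sketch of a hypothetical `TryInc` added to the `ChannelCounter` above (it is not part of the original API):

```go
// In package stats, alongside the ChannelCounter above.

// TryInc is a hypothetical non-blocking variant of Inc: if the aggregator
// has fallen behind and the buffer is full, drop the increment instead of
// blocking the request goroutine.
func (c *ChannelCounter) TryInc(key string) bool {
	select {
	case c.ch <- key:
		return true
	default:
		return false // buffer full: shed load, accept lossy counts
	}
}
```

That trades accuracy for latency, which is usually the moment to ask whether a mutex was the simpler answer all along.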
“But channels are idiomatic Go…”
They absolutely are—when you’re actually communicating.
Go’s official wiki basically says: don’t cargo-cult channels; use the simplest tool.
If the problem is:
- “I have shared state and I need to protect it”
then `sync.Mutex` is often the cleanest expression of reality.

If the problem is:
- “I want to transfer ownership of work/state and build a pipeline”
then channels can be perfect.

And if the problem is:
- “I need insane increment throughput”
then `sync/atomic` or sharded counters may be the real answer (but that’s a different post).
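For flavor only (the details really are another post), a single fixed hot counter with `sync/atomic` skips the lock entirely; this is a hypothetical sketch, not something from the earlier examples:

```go
package stats

import "sync/atomic"

// Hypothetical: one fixed hot counter, no map, no lock.
var searchHits atomic.Uint64

// RecordSearchHit is safe to call from any number of goroutines.
func RecordSearchHit() {
	searchHits.Add(1)
}
```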
Practical takeaways for high-QPS services
- Don’t use channels to implement locks (it’s usually slower and can bottleneck).
- For hot-path shared state (stats, caches, maps), start with `sync.Mutex`.
- If you choose channels, do it for a reason: ownership, pipeline, backpressure semantics.
- If you see tail latency spikes under load and you’re using a single “state goroutine”, check whether the channel is filling and making handler goroutines block.
If you want to go further, natural follow-ups are:

- a sharded-mutex version (`[N]struct{ mu; map }`) to show how it scales under contention
- a full `net/http` + `httptest` + load-generator harness (so you can run `wrk`/`hey` and see latency curves)

