e7d59fd01d
The H1 fix bounds each request (1 MiB control / 16 MiB blob) and the per-IP rate
limiter throttles a single source, but neither bounds the AGGREGATE memory across
concurrent requests. The re-audit (report 0006, N2) drove RSS to ~1.42 GB with 40
concurrent 16 MiB uploads, and noted that a multi-IP (botnet) flood scales without
a ceiling because the rate limit is per-IP.
Fix: a global, non-blocking, byte-counting limiter (pkg/membership/inflight.go).
ServeHTTP reserves a POST's worst-case buffered size (its route ceiling) from the
limiter before reading the body, and releases it when the request finishes. When
the global cap (maxInflightBytes = 128 MiB) is reached, further POSTs are shed
with 503 (backpressure) rather than parking goroutines, so total bytes buffered
in flight stays bounded regardless of connection count or source-IP spread. GETs
carry no body and do not consume the budget.
The limiter is implemented inside unibus (not delegated to the fn-registry, where
a generic concurrency primitive would normally live) because functions/core pulls
transitive deps requiring CGO (mattn/go-sqlite3) and external modules that are
incompatible with unibus's CGO_ENABLED=0 build, and because this work is scoped
to the unibus sub-repo. The type/method comments document this.
Verification:
- pkg/membership/inflight_test.go: TestInflightLimiter{Basics,Disabled,Concurrent}
cover golden/edge/error/disabled/over-release and a -race concurrency invariant
(inFlight returns to 0, never exceeds cap).
- pkg/membership/dos_concurrency_test.go: TestReaudit_DoSConcurrency fires 40
concurrent 16 MiB uploads from distinct IPs (the multi-IP shape) against a 48 MiB
test cap -> 200=3 503=37, RSS delta ~93 MiB (bound 256 MiB), inFlight()==0, and a
fresh upload still 200. With the limiter disabled the test fails (200=40 503=0),
confirming it is a real regression guard.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green;
CGO_ENABLED=1 go test -race ./pkg/membership/ green.
Residual (documented): under enforce the body is buffered twice (auth verify +
handler), so real RSS is ~2x the reserved bytes; closing that fully means
streaming blobs to disk (overlaps H9 / issue 0002).
Refs: report 0006 N2, issue 0005c.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
86 lines
3.3 KiB
Go
86 lines
3.3 KiB
Go
package membership
|
|
|
|
import "sync/atomic"
|
|
|
|
// inflightLimiter is a non-blocking, byte-counting concurrency limiter: a global
|
|
// cap on how many bytes of request body the server will buffer simultaneously.
|
|
//
|
|
// The per-request body ceilings (maxControlBodyBytes / maxBlobBytes) bound a
|
|
// single request, and the per-IP rate limiter throttles a single source, but
|
|
// neither bounds the AGGREGATE memory across many concurrent uploads: the
|
|
// re-audit (report 0006, N2) showed 40 concurrent 16 MiB blob uploads driving
|
|
// RSS to ~1.42 GB, and a distributed (multi-IP) flood scales without a ceiling
|
|
// because the rate limiter is per-IP. This limiter is the missing aggregate
|
|
// bound: ServeHTTP reserves a request's worst-case buffered size before reading
|
|
// the body and releases it when the request finishes, so the total bytes in
|
|
// flight can never exceed max regardless of how many connections or source IPs
|
|
// arrive at once.
|
|
//
|
|
// It is intentionally NON-blocking: when a reservation does not fit, the caller
|
|
// sheds the request with backpressure (503) rather than parking a goroutine,
|
|
// which would let an attacker exhaust goroutines/connections instead of RAM. The
|
|
// counter is maintained with sync/atomic (a CAS loop), so it is safe for
|
|
// concurrent use without a mutex.
|
|
//
|
|
// Implementation note: this lives inside unibus rather than the fn-registry
|
|
// (where a generic concurrency primitive would normally belong) because the
|
|
// registry's functions/core package pulls in transitive dependencies that
|
|
// require CGO (mattn/go-sqlite3) and external modules, which are incompatible
|
|
// with unibus's CGO_ENABLED=0 build, and because this work is scoped to the
|
|
// unibus sub-repo.
|
|
type inflightLimiter struct {
|
|
max int64 // immutable after construction; <= 0 disables the limiter
|
|
used int64 // bytes currently reserved; accessed ONLY via sync/atomic
|
|
}
|
|
|
|
// newInflightLimiter builds a limiter with a cap of maxBytes bytes in flight.
|
|
// maxBytes <= 0 disables the cap (tryAcquire always grants), which is the
|
|
// loopback/dev posture where an aggregate memory ceiling is not wanted.
|
|
func newInflightLimiter(maxBytes int64) *inflightLimiter {
|
|
return &inflightLimiter{max: maxBytes}
|
|
}
|
|
|
|
// tryAcquire reserves n bytes without blocking. It returns true and reserves the
|
|
// bytes when they fit within the cap (used+n <= max), or false (reserving
|
|
// nothing) when they do not. n <= 0 is granted without reserving, and a disabled
|
|
// limiter (max <= 0) always grants. Safe for concurrent use.
|
|
func (l *inflightLimiter) tryAcquire(n int64) bool {
|
|
if l.max <= 0 || n <= 0 {
|
|
return true
|
|
}
|
|
for {
|
|
cur := atomic.LoadInt64(&l.used)
|
|
if cur+n > l.max {
|
|
return false
|
|
}
|
|
if atomic.CompareAndSwapInt64(&l.used, cur, cur+n) {
|
|
return true
|
|
}
|
|
}
|
|
}
|
|
|
|
// release returns n previously reserved bytes. It must be paired with a
|
|
// tryAcquire that granted. A disabled limiter or n <= 0 is a no-op. The counter
|
|
// never drops below zero (a defensive clamp against an accidental double release).
|
|
func (l *inflightLimiter) release(n int64) {
|
|
if l.max <= 0 || n <= 0 {
|
|
return
|
|
}
|
|
for {
|
|
cur := atomic.LoadInt64(&l.used)
|
|
nv := cur - n
|
|
if nv < 0 {
|
|
nv = 0
|
|
}
|
|
if atomic.CompareAndSwapInt64(&l.used, cur, nv) {
|
|
return
|
|
}
|
|
}
|
|
}
|
|
|
|
// inFlight returns the bytes currently reserved. It is observability for tests
|
|
// and metrics.
|
|
func (l *inflightLimiter) inFlight() int64 {
|
|
return atomic.LoadInt64(&l.used)
|
|
}
|