A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.
- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
unless --bus-auth enforce, regardless of bind (a clustered node is a production
node; there is no safe dev cluster without auth). This binary therefore cannot
BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non
secret operational metadata, probe stays unauthenticated) so a monitor or peer
can detect a cluster member not running enforce+ACL+TLS — covering a peer that
runs a tampered/old binary outside this node's control.
Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The H1 fix bounds each request (1 MiB control / 16 MiB blob) and the per-IP rate
limiter throttles a single source, but neither bounds the AGGREGATE memory across
concurrent requests. The re-audit (report 0006, N2) drove RSS to ~1.42 GB with 40
concurrent 16 MiB uploads, and noted that a multi-IP (botnet) flood scales without
a ceiling because the rate limit is per-IP.
Fix: a global, non-blocking, byte-counting limiter (pkg/membership/inflight.go).
ServeHTTP reserves a POST's worst-case buffered size (its route ceiling) from the
limiter before reading the body, and releases it when the request finishes. When
the global cap (maxInflightBytes = 128 MiB) is reached, further POSTs are shed
with 503 (backpressure) rather than parking goroutines, so total bytes buffered
in flight stays bounded regardless of connection count or source-IP spread. GETs
carry no body and do not consume the budget.
The limiter is implemented inside unibus (not delegated to the fn-registry, where
a generic concurrency primitive would normally live) because functions/core pulls
transitive deps requiring CGO (mattn/go-sqlite3) and external modules that are
incompatible with unibus's CGO_ENABLED=0 build, and because this work is scoped
to the unibus sub-repo. The type/method comments document this.
Verification:
- pkg/membership/inflight_test.go: TestInflightLimiter{Basics,Disabled,Concurrent}
cover golden/edge/error/disabled/over-release and a -race concurrency invariant
(inFlight returns to 0, never exceeds cap).
- pkg/membership/dos_concurrency_test.go: TestReaudit_DoSConcurrency fires 40
concurrent 16 MiB uploads from distinct IPs (the multi-IP shape) against a 48 MiB
test cap -> 200=3 503=37, RSS delta ~93 MiB (bound 256 MiB), inFlight()==0, and a
fresh upload still 200. With the limiter disabled the test fails (200=40 503=0),
confirming it is a real regression guard.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green;
CGO_ENABLED=1 go test -race ./pkg/membership/ green.
Residual (documented): under enforce the body is buffered twice (auth verify +
handler), so real RSS is ~2x the reserved bytes; closing that fully means
streaming blobs to disk (overlaps H9 / issue 0002).
Refs: report 0006 N2, issue 0005c.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-process nonce cache breaks anti-replay under multi-node failover
(audit 0004): a request captured on one node can be replayed to a
DIFFERENT node whose local cache never saw the nonce, and is accepted.
This makes the nonce state shared so a replay is rejected cluster-wide.
pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed
memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a
shared bucket: first sight wins (accept), any later sight on any node
rejects (replay). A backend error fails CLOSED (reject), so a KV outage
never silently disables anti-replay. The bucket carries a TTL =
nonceTTL (2*clockSkew) so a key expires exactly when its replay window
closes; raw base64 nonces are mapped to KV-safe keys via sha256-hex.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node;
every node in a cluster calls it. NewServer still defaults to the
in-memory cache (master behavior unchanged).
Test (DoD error path — the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share
one KV bucket; a request accepted (200) on node A, replayed with the
same ts+nonce to node B, is rejected (401) — and replaying to A again
is rejected too.
Branch-by-abstraction for the blob store (issue 0003d): media ciphertext
can live in a replicated JetStream Object Store instead of local disk, so
a blob uploaded to one node survives a node loss and is reachable from
any node.
pkg/blobstore:
- Store is now an interface (Put/Get/Has). The filesystem backend is
renamed diskStore and stays the default: New(dir) returns it.
- objectStore (new) implements Store over a NATS Object Store bucket with
a configurable replication factor (R1..R5), matching the KV store's
R1->R3 rollout. Content-addressing (sha256-hex) is identical, so the
wire contract is unchanged.
pkg/membership:
- Server.blobs and NewServer take the blobstore.Store interface instead
of the concrete type; no behavior change with the disk default.
Tests (DoD: golden + edge + contract):
- TestObjectStoreRoundTrip: put/get/has + content-addressed dedup.
- TestObjectStoreMissing: unknown hash is absent and unreadable.
- TestObjectStoreAddressMatchesDisk: the Object Store and disk backends
address identical bytes to the IDENTICAL hash (portable blob refs).
Like the KV store (0003b), wiring membershipd to select the Object Store
is deferred to the decentralized boot path (flag off); disk stays default.
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.
pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
Close). The existing SQLite implementation is renamed sqliteStore and
stays the default: Open(path) still returns it. openSQLite keeps the
concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
backends return it (the SQLite store maps sql.ErrNoRows to it). The
control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
room_keys, users. Replication factor is configurable (R1..R5) for the
R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
never admits), per the audit's requirement for the decentralized store.
dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
jetstreamStore behind it. The membershipd boot wiring that selects the
KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
authenticator<->store bootstrap is part of the session/deploy redesign);
OFF keeps the single-node SQLite control plane unchanged.
Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
round-trip through every room/member/key method, including latest-epoch
resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
IsAuthorized and HasAdmin both DENY within the bounded timeout.
The full existing suite stays green: sqliteStore is unchanged behavior.
Three medium audit findings.
H6 (owner spoof): handleCreateRoom now binds the body's declared owner to the
authenticated signer — both the endpoint id and the signing key must be the
signer's — so a registered peer cannot create rooms in another identity's name.
Enforced only when an authenticated signer is present.
H7 (nonce-cache poison pre-auth): IsAuthorized now runs BEFORE the replay cache
is touched, so an unregistered identity (Ed25519 keys are free) can no longer
seed nonces into it. The cache is rewritten with O(expired) pruning (insertion
order equals expiry order under a constant TTL) instead of the previous O(n)
full-map scan under the mutex, plus a size cap with oldest-eviction. This is the
prerequisite the 0003 replicated nonce store builds on.
H12 (error leak): internal store/blob errors are logged and replaced with a
generic client message via writeServerErr, so SQL fragments and filesystem paths
no longer reach the caller. Crafted 4xx messages (owner-sig, validation) are kept.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit H4 (Alto). The embedded NATS has a single account with no per-subject
permissions, so any registered peer can subscribe to any subject — a cleartext
(ModeNATS) room's payload is readable by anyone who knows the subject.
A complete per-subject ACL derived from membership does not fit here: NATS
evaluates a connection's permissions once at connect time and never re-evaluates
them, but unibus clients connect-then-create/join-then-publish on one connection
(TestSecureBusEndToEnd). Static permissions would forbid the owner from
publishing to a room it just created; the dynamic reconnection model belongs to
the 0003 decentralization redesign. See dev/0004d-dataplane-acl.md.
Minimum defense implemented: Server.RequireEncryptedRooms (set by membershipd on
any non-loopback bind) refuses to create cleartext rooms, so every room on a
public deployment is end-to-end encrypted. Message CONTENT stays confidential
even with no subject isolation; residual traffic-metadata exposure is documented
and tracked for 0003.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit H3 (Alto). 'Authorized' meant 'registered in the allowlist', not 'member
of the room', so any registered peer could read another room's subject, its
full member list (every member's sign_pub + kex_pub), any endpoint's room
directory, and even another member's sealed key.
The middleware now carries the authenticated signer's endpoint id into the
handler via request context. Room handlers enforce membership:
- GET /rooms/{id} and /rooms/{id}/members require the signer to be a member;
- GET /rooms/{id}/key serves the sealed key only to its own endpoint
(endpoint == signer) and only to a member;
- GET /members/{endpoint}/rooms is restricted to the signer's own endpoint.
Authorization is skipped only when no authenticated signer is present (AuthOff
dev, or a soft-mode pass-through), preserving legacy/dev behavior. Internal
errors no longer echo store messages to the client on these paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-auth DoS hardening (audit H1, Critical). The control-plane middleware
read the request body with io.ReadAll before authenticating and with no size
cap, so an unauthenticated peer could force the server to buffer an arbitrary
body in RAM (the auditor sent 400 MB and watched RSS climb to ~898 MB).
- ServeHTTP now caps the buffered body before reading: a per-route ceiling
(1 MiB JSON, 16 MiB /blobs) rejects an over-declared Content-Length outright
and wraps the body in http.MaxBytesReader so a lying/chunked sender trips at
the ceiling instead of unbounded.
- handlePutBlob maps the MaxBytesReader cutoff to 413 in every auth mode.
- Per-IP token-bucket rate limiter (golang.org/x/time/rate, already in the
module graph) sheds floods before auth or body reads. Loopback dev stacks are
unaffected (burst >> any single client's rate). Kept in-package as transport
glue, not promoted to the registry, mirroring the nonceCache decision in 0003.
- membershipd sets http.Server.MaxHeaderBytes and ReadHeaderTimeout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the bus-auth rollout (off|soft|enforce) to the control plane. The
middleware verifies an Ed25519 request signature over CanonicalRequest
(method, request-URI, ts, nonce, sha256(body)), checks the timestamp is
within +/-30s, rejects replayed nonces via an in-memory TTL cache (60s), and
requires the signer to be an active user in the allowlist. soft logs
rejections but lets requests through so clients can migrate without an
outage; off is the legacy no-op default. /healthz is exempt so health probes
work before any identity exists. CanonicalRequest is exported as the single
source of truth shared with the client.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A peer invited to an encrypted room needs to find it: the control plane is
pull-based (no server push of invitations), so add a discovery endpoint that
lists every room an endpoint belongs to, with the room's metadata and the
endpoint's role.
- store.ListRoomsForEndpoint: JOIN members+rooms, ordered by room id, empty
slice (not error) for an endpoint in no rooms.
- membershipd: GET /members/{endpoint}/rooms returns {room_id, subject, epoch,
policy, role}[].
- client.ListMyRooms + RoomRef: a bot polls this to discover and then Join +
Subscribe rooms it was invited to.
Tests: store-level (owner in N rooms, member in one, unknown endpoint → []) and
client-level e2e through the embedded harness (B discovers a room A invited it
to, without prior knowledge of the room id; owner sees role=owner).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- membership server returns 403 + human-readable message on missing sealed key (was leaking 'sql: no rows in result set')
- client doJSON unwraps the server's {"error"} field instead of pasting the raw HTTP envelope