feat(0003e/2): replicated anti-replay nonce store on JetStream KV
The per-process nonce cache breaks anti-replay under multi-node failover
(audit 0004): a request captured on one node can be replayed to a DIFFERENT
node whose local cache never saw the nonce, and is accepted. This makes the
nonce state shared so a replay is rejected cluster-wide.

pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed
  memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a shared
  bucket: first sight wins (accept), any later sight on any node rejects
  (replay). A backend error fails CLOSED (reject), so a KV outage never
  silently disables anti-replay. The bucket carries a TTL = nonceTTL
  (2*clockSkew) so a key expires exactly when its replay window closes; raw
  base64 nonces are mapped to KV-safe keys via sha256-hex.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node; every
  node in a cluster calls it. NewServer still defaults to the in-memory
  cache (master behavior unchanged).

Test (DoD error path, i.e. the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share one
  KV bucket; a request accepted (200) on node A, replayed with the same
  ts+nonce to node B, is rejected (401), and replaying to A again is
  rejected too.
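The claim-by-create behavior described in the message can be sketched roughly as follows. This is an illustration under stated assumptions, not the commit's code: `createOnlyKV`, `memKV`, `errKeyExists`, `nonceKey`, and `claim` are hypothetical names, and the in-process fake stands in for the shared JetStream KV bucket (it has no TTL and no replication). It shows the three properties the message names: sha256-hex key mapping, first sight wins, and fail-closed on backend errors.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// errKeyExists is a hypothetical sentinel for "key already present".
var errKeyExists = errors.New("key exists")

// createOnlyKV is a hypothetical stand-in for the shared bucket: Create must
// succeed only when the key is absent, atomically across all callers.
type createOnlyKV interface {
	Create(key string, value []byte) error
}

// memKV is an in-process fake used only to exercise the sketch.
type memKV struct {
	mu   sync.Mutex
	keys map[string]struct{}
	down bool // when true, simulates a backend outage
}

func (m *memKV) Create(key string, _ []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.down {
		return errors.New("backend unavailable")
	}
	if _, seen := m.keys[key]; seen {
		return errKeyExists
	}
	m.keys[key] = struct{}{}
	return nil
}

// nonceKey maps a raw nonce (base64 may contain '+', '/', '=') to a
// fixed-length lowercase hex string that is safe to use as a KV key.
func nonceKey(nonce string) string {
	sum := sha256.Sum256([]byte(nonce))
	return hex.EncodeToString(sum[:])
}

// claim reports whether the nonce is fresh. Any error rejects the request:
// errKeyExists means replay, and every other error fails closed.
func claim(kv createOnlyKV, nonce string) bool {
	return kv.Create(nonceKey(nonce), nil) == nil
}

func main() {
	shared := &memKV{keys: map[string]struct{}{}}
	nonce := "q83vEjRWeJA+/w=="

	fmt.Println(claim(shared, nonce)) // node A, first sight: true
	fmt.Println(claim(shared, nonce)) // node B, same nonce: false

	shared.down = true
	fmt.Println(claim(shared, "fresh-nonce")) // outage, fail closed: false
}
```

Because both "nodes" call `claim` against the same store, the second call is rejected regardless of which node handles it, which is the cross-node case the test in this commit exercises.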
@@ -19,6 +19,7 @@ import (
 
 	"github.com/enmanuel/unibus/pkg/blobstore"
 	"github.com/enmanuel/unibus/pkg/frame"
+	"github.com/nats-io/nats.go/jetstream"
 )
 
 // Body-size ceilings for the control plane. They bound how much an unauthenticated
@@ -59,7 +60,7 @@ type Server struct {
 	blobs    blobstore.Store
 	mux      *http.ServeMux
 	authMode AuthMode
-	nonces   *nonceCache
+	nonces   nonceStore
 	limiter  *ipRateLimiter
 
 	// RequireEncryptedRooms, when true, refuses to create cleartext (ModeNATS)
@@ -84,13 +85,29 @@ func NewServer(store Store, blobs blobstore.Store, authMode AuthMode) *Server {
 		blobs:    blobs,
 		mux:      http.NewServeMux(),
 		authMode: authMode,
-		nonces:   newNonceCache(nonceTTL, maxNonceCacheEntries),
+		nonces:   newMemNonceCache(nonceTTL, maxNonceCacheEntries),
 		limiter:  newIPRateLimiter(defaultRatePerSec, defaultRateBurst, rateBucketTTL),
 	}
 	s.routes()
 	return s
 }
 
+// UseReplicatedNonces switches the server's anti-replay store from the
+// per-process in-memory cache to a JetStream KV bucket shared across the cluster
+// (issue 0003e). It MUST be called on every node of a multi-node deployment:
+// otherwise a request captured on one node can be replayed to another whose
+// local cache never saw the nonce. replicas is the bucket's replication factor
+// (R1..R3). The TTL matches the in-memory cache (nonceTTL = 2*clockSkew), so a
+// replay can never outlive its memory.
+func (s *Server) UseReplicatedNonces(js jetstream.JetStream, replicas int) error {
+	ns, err := newKVNonceStore(js, nonceTTL, replicas, 0)
+	if err != nil {
+		return err
+	}
+	s.nonces = ns
+	return nil
+}
+
 // ServeHTTP satisfies http.Handler. It runs the control-plane auth middleware
 // (signature verification + anti-replay + allowlist) ahead of the router
 // according to authMode, then dispatches to the matched handler.