feat(0003e/2): replicated anti-replay nonce store on JetStream KV
The per-process nonce cache breaks anti-replay under multi-node failover (audit 0004). A request captured on one node can be replayed to a DIFFERENT node whose local cache never saw the nonce, and that node accepts it. This change makes the nonce state shared, so a replay is rejected cluster-wide.

pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a shared bucket. The first sighting wins (accept), and any later sighting on any node is rejected (replay). A backend error fails CLOSED (reject), so a KV outage never silently disables anti-replay. The bucket carries a TTL equal to nonceTTL (2*clockSkew), so a key expires exactly when its replay window closes. Raw base64 nonces are mapped to KV-safe keys via their sha256 hex digest.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node, and every node in a cluster must call it. NewServer still defaults to the in-memory cache (master behavior unchanged).

Test (DoD error path: the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share one KV bucket. A request accepted (200) on node A and replayed with the same ts+nonce to node B is rejected (401); replaying it to A again is rejected too.
+19 -8
@@ -95,16 +95,27 @@ func CanonicalRequest(method, path, ts, nonce string, body []byte) []byte {
 	return []byte(method + "\n" + path + "\n" + ts + "\n" + nonce + "\n" + hex.EncodeToString(sum[:]))
 }
 
-// nonceCache remembers recently-seen nonces to reject replays. It is an
-// in-memory store guarded by a mutex — sufficient for a single membershipd
-// process (the spec's chosen tradeoff over a server-issued nonce round-trip). A
-// distributed deployment would need a shared store (tracked for issue 0003).
+// nonceStore is the anti-replay backend: rememberOrReject records a nonce and
+// reports whether it was unseen (true -> accept) or already seen (false ->
+// reject the replay). It is an interface (issue 0003e) so the single-node
+// in-memory cache can be swapped for a replicated KV store: a per-process cache
+// is BROKEN under multi-node failover (a request captured and replayed to a
+// DIFFERENT node whose cache never saw the nonce would be accepted), so a
+// cluster MUST share the nonce state. Every implementation fails CLOSED — a
+// backend it cannot reach rejects rather than admits.
+type nonceStore interface {
+	rememberOrReject(nonce string, now time.Time) bool
+}
+
+// memNonceCache remembers recently-seen nonces to reject replays. It is an
+// in-memory store guarded by a mutex — sufficient for a SINGLE membershipd
+// process. A clustered deployment uses kvNonceStore instead (issue 0003e).
 //
 // Pruning is O(expired), not O(n): because the TTL is constant, insertion order
 // equals expiry order, so the oldest entries (front of `order`) are exactly the
 // ones that expire first (audit H7 — the previous full-map scan under the mutex
 // was a CPU-amplification vector). A size cap bounds memory.
-type nonceCache struct {
+type memNonceCache struct {
 	mu sync.Mutex
 	seen map[string]time.Time // nonce -> expiry
 	order []string // nonces in insertion order == expiry order
@@ -112,13 +123,13 @@ type nonceCache struct {
 	cap int
 }
 
-func newNonceCache(ttl time.Duration, capacity int) *nonceCache {
-	return &nonceCache{seen: make(map[string]time.Time), ttl: ttl, cap: capacity}
+func newMemNonceCache(ttl time.Duration, capacity int) *memNonceCache {
+	return &memNonceCache{seen: make(map[string]time.Time), ttl: ttl, cap: capacity}
 }
 
 // rememberOrReject records nonce and returns true if it was unseen, or false if
 // it is a replay (still live in the cache).
-func (n *nonceCache) rememberOrReject(nonce string, now time.Time) bool {
+func (n *memNonceCache) rememberOrReject(nonce string, now time.Time) bool {
 	n.mu.Lock()
 	defer n.mu.Unlock()
 