37c778ca9a
The per-process nonce cache breaks anti-replay under multi-node failover
(audit 0004): a request captured on one node can be replayed to a DIFFERENT
node whose local cache never saw the nonce, and is accepted. This makes the
nonce state shared so a replay is rejected cluster-wide.

pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed
  memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a shared
  bucket: first sight wins (accept), any later sight on any node rejects
  (replay). A backend error fails CLOSED (reject), so a KV outage never
  silently disables anti-replay. The bucket carries a TTL = nonceTTL
  (2*clockSkew) so a key expires exactly when its replay window closes; raw
  base64 nonces are mapped to KV-safe keys via sha256-hex.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node; every
  node in a cluster calls it. NewServer still defaults to the in-memory cache
  (master behavior unchanged).

Test (DoD error path: the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share one
  KV bucket; a request accepted (200) on node A, replayed with the same
  ts+nonce to node B, is rejected (401), and replaying to A again is rejected
  too.
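The claim-once rule described above can be illustrated without a real KV backend. What follows is a minimal sketch under stated assumptions, not the code from this commit: `creator`, `fakeBucket`, `claimStore`, and `nonceKey` are hypothetical names, and `fakeBucket` is an in-process stand-in for the shared bucket's atomic Create. The bucket TTL is not modeled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// creator is a hypothetical stand-in for the shared bucket: Create must
// succeed only for the first writer of a key.
type creator interface {
	Create(key string) error
}

var (
	errExists = errors.New("key already exists")
	errDown   = errors.New("backend unavailable")
)

// fakeBucket imitates a shared KV bucket inside one process (assumption:
// the real backend offers an equivalent atomic create-if-absent).
type fakeBucket struct {
	mu   sync.Mutex
	keys map[string]struct{}
	down bool // when true, every Create fails, simulating an outage
}

func (b *fakeBucket) Create(key string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.down {
		return errDown
	}
	if _, ok := b.keys[key]; ok {
		return errExists
	}
	b.keys[key] = struct{}{}
	return nil
}

// nonceKey maps a raw nonce (which may contain '+', '/', '=') to a key made
// of hex characters only.
func nonceKey(nonce string) string {
	sum := sha256.Sum256([]byte(nonce))
	return hex.EncodeToString(sum[:])
}

// claimStore accepts a nonce only if this call wins the Create.
type claimStore struct{ kv creator }

// accept returns true on first sight. Any error, whether a duplicate or a
// backend failure, rejects the request (fail closed).
func (s claimStore) accept(nonce string) bool {
	return s.kv.Create(nonceKey(nonce)) == nil
}

func main() {
	bucket := &fakeBucket{keys: map[string]struct{}{}}
	nodeA := claimStore{kv: bucket}
	nodeB := claimStore{kv: bucket}

	fmt.Println(nodeA.accept("bm9uY2U+Lw==")) // true: first sight
	fmt.Println(nodeB.accept("bm9uY2U+Lw==")) // false: replay on another node
	bucket.down = true
	fmt.Println(nodeA.accept("fresh")) // false: outage fails closed
}
```

Treating every Create error as a rejection keeps the accept path to a single comparison. The cost is that legitimate requests are also refused during an outage, which is the trade-off the commit message describes.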
52 lines
1.5 KiB
Go
package membership

import (
	"strconv"
	"testing"
	"time"
)

// TestNonceCacheRememberPrune covers the replay/expiry behavior directly on the
// cache: a fresh nonce is accepted (golden), an immediate repeat is rejected
// (error), and after the TTL the same nonce is accepted again because its entry
// was pruned (edge).
func TestNonceCacheRememberPrune(t *testing.T) {
	nc := newMemNonceCache(50*time.Millisecond, 1000)
	base := time.Now()

	if !nc.rememberOrReject("a", base) {
		t.Fatalf("first sighting should be accepted")
	}
	if nc.rememberOrReject("a", base) {
		t.Fatalf("an immediate replay should be rejected")
	}
	if !nc.rememberOrReject("a", base.Add(60*time.Millisecond)) {
		t.Fatalf("after the TTL the nonce should be accepted again (pruned)")
	}
}

// TestNonceCacheCapBounded covers the memory bound (audit H7): with a long TTL so
// nothing expires, inserting far more nonces than the cap must still keep the
// cache at or under the cap (oldest evicted), and the order queue must not drift
// from the map.
func TestNonceCacheCapBounded(t *testing.T) {
	const capacity = 100
	nc := newMemNonceCache(time.Hour, capacity)
	base := time.Now()
	for i := 0; i < 500; i++ {
		nc.rememberOrReject("n"+strconv.Itoa(i), base)
	}

	nc.mu.Lock()
	size := len(nc.seen)
	orderLen := len(nc.order)
	nc.mu.Unlock()

	if size > capacity {
		t.Fatalf("cache exceeded its cap: %d > %d", size, capacity)
	}
	if orderLen != size {
		t.Fatalf("order queue drifted from the map: order=%d seen=%d", orderLen, size)
	}
}