fix(0006a): wire replicated nonce store on clustered nodes (audit 0008 N3)

membershipd never called Server.UseReplicatedNonces, so every node kept a
per-process anti-replay cache, and a signed request accepted on node A could be
replayed to node B (both returned 200). This wires the shared JetStream KV nonce
bucket on any clustered node, closing the cross-node replay hole.
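
The replay hole can be illustrated with a toy model. This is only a sketch: nonceSet and accept are hypothetical names, a plain map stands in for both the per-process cache and the JetStream KV bucket, and the status codes mimic what a client would observe.

```go
package main

import "fmt"

// nonceSet is a hypothetical stand-in for an anti-replay cache. A plain map is
// enough for the illustration; it is not safe for concurrent use.
type nonceSet map[string]bool

// accept returns 200 the first time a nonce is seen by the given cache and 401
// on any later sighting, mimicking the status codes a client would observe.
func accept(seen nonceSet, nonce string) int {
	if seen[nonce] {
		return 401
	}
	seen[nonce] = true
	return 200
}

func main() {
	// Before the fix: each node has its own cache, so the replay succeeds.
	nodeA, nodeB := nonceSet{}, nonceSet{}
	fmt.Println(accept(nodeA, "n-1"), accept(nodeB, "n-1")) // 200 200

	// After the fix: both nodes consult one shared store, so the replay fails.
	shared := nonceSet{}
	fmt.Println(accept(shared, "n-1"), accept(shared, "n-1")) // 200 401
}
```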

Bootstrap: under enforce the service needs JetStream on its own embedded server,
but the data plane only accepts allowlisted clients. Resolved with an ephemeral
internal service identity the authenticator recognizes and grants full
permissions (NewNkeyAuthenticatorACLInternal), connected over the in-process
transport (no TLS/CA needed for the self-connection).

Hard rule: --cluster-name != "" means the replicated nonce bucket is mandatory;
if it cannot be created the node refuses to start (wireReplicatedNonces returns a
fatal error) rather than run insecurely. Standalone nodes keep the in-memory
cache unchanged (branch-by-abstraction: no JetStream dependency added).
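
A minimal sketch of the hard rule, under stated assumptions: pickNonceStore, nonceStore, memoryNonces and errNoJetStream are hypothetical names for illustration, not the real wireReplicatedNonces signature. It shows only the decision: standalone falls back to memory, clustered fails closed.

```go
package main

import (
	"errors"
	"fmt"
)

// nonceStore is a hypothetical anti-replay interface used only in this sketch.
type nonceStore interface {
	// Seen records the nonce and reports whether it was already recorded.
	Seen(nonce string) bool
}

// memoryNonces is a per-process cache (not safe for concurrent use).
type memoryNonces struct{ seen map[string]bool }

func newMemoryNonces() *memoryNonces { return &memoryNonces{seen: map[string]bool{}} }

func (m *memoryNonces) Seen(nonce string) bool {
	dup := m.seen[nonce]
	m.seen[nonce] = true
	return dup
}

// errNoJetStream simulates a node whose replicated bucket cannot be created.
var errNoJetStream = errors.New("jetstream not available")

// pickNonceStore applies the rule: an empty cluster name keeps the in-memory
// cache and never calls openReplicated; a non-empty cluster name requires the
// replicated store and returns an error (fatal for the caller) if it fails.
func pickNonceStore(clusterName string, openReplicated func() (nonceStore, error)) (nonceStore, error) {
	if clusterName == "" {
		return newMemoryNonces(), nil
	}
	store, err := openReplicated()
	if err != nil {
		return nil, fmt.Errorf("cluster %q requires the replicated nonce bucket: %w", clusterName, err)
	}
	return store, nil
}

func main() {
	unavailable := func() (nonceStore, error) { return nil, errNoJetStream }

	_, err := pickNonceStore("", unavailable)
	fmt.Println("standalone error:", err)

	_, err = pickNonceStore("c1", unavailable)
	fmt.Println("clustered error:", err)
}
```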

Changes:
- busauth: NewNkeyAuthenticatorACLInternal + fullPermissions for the internal id.
- cmd/membershipd: connectInternalJS (in-process, privileged) / connectExternalJS;
  wireReplicatedNonces helper; main wires it when clustered; --kv-replicas flag.

Tests (regression of audit 0008 N3):
- TestAttack0008_N3: 2 clustered nodes share the bucket, cross-node replay -> 401.
- TestAttack0008_N3_StandaloneKeepsLocalCache: standalone needs no JetStream,
  same-node replay still 401.
- TestAttack0008_N3_ClusteredRequiresJetStream: clustered + no JetStream -> fatal.
- TestInternalConnPrivilegedUnderEnforce / ...OutsiderRejected: the privileged
  self-connection works under enforce and no other identity can claim it.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:02:19 +02:00
parent 5df99fa4c4
commit 8b6a01d280
6 changed files with 570 additions and 1 deletions
@@ -82,6 +82,15 @@ type PermissionsFunc func(signPubHex string) (*server.Permissions, error)
type nkeyAuthenticatorACL struct {
	isAuthorized func(signPubHex string) bool
	perms        PermissionsFunc
	// internalPubHex is the lowercase-hex Ed25519 public key of membershipd's own
	// ephemeral internal service identity. A connection that proves that key is
	// granted full permissions WITHOUT consulting the allowlist, so the service can
	// bootstrap and manage JetStream (the replicated nonce bucket and, when
	// decentralized, the control-plane KV buckets) against its own embedded server
	// even while the data plane confines every client to its rooms. Empty disables
	// the internal-identity path entirely (behavior identical to a plain ACL
	// authenticator).
	internalPubHex string
}

// NewNkeyAuthenticatorACL builds an authenticator that authorizes by the bus
@@ -94,6 +103,29 @@ func NewNkeyAuthenticatorACL(isAuthorized func(signPubHex string) bool, perms Pe
	return &nkeyAuthenticatorACL{isAuthorized: isAuthorized, perms: perms}
}

// NewNkeyAuthenticatorACLInternal is NewNkeyAuthenticatorACL that also recognizes
// membershipd's internal service identity (internalPubHex, the lowercase hex of
// its ephemeral Ed25519 public key): a connection proving that key is granted
// full permissions without an allowlist lookup, so the service can create and
// manage JetStream against its own embedded server under enforce (issue 0006a/c:
// the replicated nonce bucket and the control-plane KV). Every other identity
// goes through the allowlist + per-subject ACL unchanged. An empty internalPubHex
// is identical to NewNkeyAuthenticatorACL, so this is a superset and safe to use
// everywhere the plain constructor was used.
func NewNkeyAuthenticatorACLInternal(isAuthorized func(signPubHex string) bool, perms PermissionsFunc, internalPubHex string) server.Authentication {
	return &nkeyAuthenticatorACL{isAuthorized: isAuthorized, perms: perms, internalPubHex: internalPubHex}
}

// fullPermissions grants publish and subscribe on every subject (">"). It is the
// permission set for membershipd's own internal service connection, which must
// manage the JetStream control plane (nonce bucket + KV buckets) over NATS. It is
// NEVER granted to a bus user, only to the process's own ephemeral internal
// identity, recognized by exact public-key match in Check.
func fullPermissions() *server.Permissions {
	sp := &server.SubjectPermission{Allow: []string{">"}}
	return &server.Permissions{Publish: sp, Subscribe: sp}
}

// Check verifies the nkey, authorizes against the allowlist, then derives and
// registers the connection's subject permissions. A permissions-derivation
// error denies the connection (fail closed) rather than granting open access.
@@ -102,6 +134,14 @@ func (a *nkeyAuthenticatorACL) Check(c server.ClientAuthentication) bool {
	if !ok {
		return false
	}
	// membershipd's own internal service identity bypasses the allowlist and is
	// granted full permissions so the service can bootstrap JetStream under
	// enforce. The key is matched exactly against the cryptographically verified
	// connecting key, so no other identity can claim these permissions.
	if a.internalPubHex != "" && signPubHex == a.internalPubHex {
		c.RegisterUser(&server.User{Permissions: fullPermissions()})
		return true
	}
	if !a.isAuthorized(signPubHex) {
		return false
	}
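
The order of checks in the hunk above can be modeled without the NATS server types. The following is a sketch only: acl, decision and decide are hypothetical names, and the verified flag stands in for the nkey signature check that precedes this code in Check.

```go
package main

import "fmt"

// decision is a hypothetical summary of what Check would grant.
type decision int

const (
	deny        decision = iota // connection rejected
	allowScoped                 // allowlisted client, per-subject ACL applies
	allowFull                   // internal service identity, full permissions
)

// acl mirrors only the two fields that drive the decision in this sketch.
type acl struct {
	internalPubHex string
	isAuthorized   func(signPubHex string) bool
}

// decide follows the same order as the diff: the key must be verified first,
// then an exact match on a non-empty internal key grants full access, and
// everything else falls through to the allowlist.
func (a acl) decide(verified bool, signPubHex string) decision {
	if !verified {
		return deny
	}
	if a.internalPubHex != "" && signPubHex == a.internalPubHex {
		return allowFull
	}
	if !a.isAuthorized(signPubHex) {
		return deny
	}
	return allowScoped
}

func main() {
	a := acl{
		internalPubHex: "aa11",
		isAuthorized:   func(k string) bool { return k == "bb22" },
	}
	fmt.Println(a.decide(true, "aa11") == allowFull)   // internal identity
	fmt.Println(a.decide(true, "bb22") == allowScoped) // allowlisted client
	fmt.Println(a.decide(true, "cc33") == deny)        // unknown key
	fmt.Println(a.decide(false, "aa11") == deny)       // unverified claim
}
```

The empty-string guard matters: without it, a configuration with no internal identity would grant full permissions to any connection whose key string happened to be empty.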