Low-severity cluster hardening from audit 0008:
- Route secret out of argv (N1-low): --cluster-pass and a nats://user:pass@host in
--routes are visible in ps/journald. Adds --cluster-pass-file and the
UNIBUS_CLUSTER_PASS env var (precedence file > env > flag); the resolved secret
guards the route layer and is injected into bare --routes entries
(injectRouteCreds), so peers can be listed as nats://host:6250 with no secret in
argv. The legacy --cluster-pass stays for dev/compat.
- migrate-to-kv confidentiality (N6): refuse a remote --nats-url without --ca (the
allowlist would otherwise travel in cleartext); loopback targets are exempt (isLoopbackURL).
- Docs (N1 route CA, N3 DoS): deploy/README gains a Clustering section. It says to
use a SEPARATE cluster CA for routes (not the client CA), keep the secret out of
argv, and run migrate-to-kv only over loopback or TLS; it also notes that R1
makes auth a single point of failure (not HA), whereas an R3 quorum is real HA.
The generated cert material lives in deploy/cluster/ (0006g).
Tests:
- TestResolveClusterPass (file > env > flag precedence; missing file errors),
- TestInjectRouteCreds (injects only into userinfo-less routes; preserves overrides),
- TestIsLoopbackURL (loopback vs remote vs malformed).
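A sketch of the loopback check behind the migrate-to-kv rule, assuming that "loopback" means the name localhost or any loopback IP (127.0.0.0/8, ::1). checkMigrateTarget is a hypothetical wrapper for illustration only; the real isLoopbackURL may treat hostnames differently.

```go
package main

import (
	"errors"
	"net"
	"net/url"
)

// isLoopbackURL reports whether rawURL targets localhost or a loopback IP.
// Malformed or host-less URLs count as remote so the caller fails closed.
func isLoopbackURL(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and IPv6 brackets
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkMigrateTarget (hypothetical) applies the migrate-to-kv rule:
// a remote --nats-url needs --ca; a loopback target is exempt.
func checkMigrateTarget(natsURL, caFile string) error {
	if isLoopbackURL(natsURL) || caFile != "" {
		return nil
	}
	return errors.New("migrate-to-kv: remote --nats-url requires --ca (the allowlist would otherwise travel in cleartext)")
}
```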
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.
- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
unless --bus-auth is enforce, regardless of bind (a clustered node is a production
node; there is no safe dev cluster without auth). This binary therefore cannot
BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non-secret
operational metadata; the probe stays unauthenticated) so a monitor or peer can
detect a cluster member that is not running enforce+ACL+TLS. This also covers a peer
that runs a tampered or old binary outside this node's control.
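The auth-mode gate in validateClusterConfig can be sketched as follows. This sketch assumes that the mode values are the strings off/soft/enforce and that an empty cluster name means standalone. Only the new gate is shown. The route-secret and route-TLS checks that the real function also performs are omitted.

```go
package main

import "fmt"

// authMode mirrors the --bus-auth values (assumed spellings).
type authMode string

const (
	authOff     authMode = "off"
	authSoft    authMode = "soft"
	authEnforce authMode = "enforce"
)

// validateClusterConfig (auth-mode gate only): a node that joins a cluster
// must run enforce, whatever it binds to. An empty clusterName means
// standalone, and the cluster rules do not apply.
func validateClusterConfig(mode authMode, clusterName string) error {
	if clusterName == "" {
		return nil
	}
	if mode != authEnforce {
		return fmt.Errorf("refusing to join cluster %q with --bus-auth %s: a clustered node requires --bus-auth enforce", clusterName, mode)
	}
	return nil
}
```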
Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The H2 guard refused "public bind without enforce" and "TLS flags without
enforce", but it still ALLOWED a public bind with enforce and no --tls-cert: the
control plane then served metadata (subjects, pubkeys, sealed keys, the social
graph) over plaintext HTTP publicly, so audit H5 reappeared as the N4 gap (TLS
was a capability, not a requirement; report 0006).
Fix: validateBootConfig now also refuses a non-loopback --bind unless both
--tls-cert and --tls-key are set. Public deployments must serve HTTPS; loopback
dev is unaffected (no TLS still allowed there).
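A sketch of the resulting boot policy follows. It combines the original H2 shapes with the new TLS requirement and uses the argument order of validateBootConfig("0.0.0.0", enforce, "", "") from the test. The string mode and the isLoopbackBind helper are assumptions, and the real guard may classify binds differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

const modeEnforce = "enforce" // assumed spelling of --bus-auth enforce

// isLoopbackBind accepts "host" or "host:port"; anything that is not
// localhost or a loopback IP (including 0.0.0.0 and LAN IPs) is public.
func isLoopbackBind(bind string) bool {
	host := bind
	if h, _, err := net.SplitHostPort(bind); err == nil {
		host = h
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// validateBootConfig returns an error for every refused startup shape;
// the caller is expected to log.Fatal on it.
func validateBootConfig(bind, mode, tlsCert, tlsKey string) error {
	public := !isLoopbackBind(bind)
	if mode != modeEnforce {
		if public {
			return fmt.Errorf("--bind %s is not loopback: requires --bus-auth enforce", bind)
		}
		if tlsCert != "" || tlsKey != "" {
			return errors.New("--tls-cert/--tls-key without --bus-auth enforce: refusing a secure-looking but unauthenticated server")
		}
		return nil
	}
	// enforce: a public bind must also serve HTTPS (N4).
	if public && (tlsCert == "" || tlsKey == "") {
		return fmt.Errorf("--bind %s is not loopback: requires both --tls-cert and --tls-key", bind)
	}
	return nil
}
```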
Verification (cmd/membershipd/config_test.go):
- TestGap_PublicEnforceNoTLS: validateBootConfig("0.0.0.0", enforce, "", "")
now returns an error mentioning --tls-cert (golden public+enforce+TLS allowed;
edge loopback-without-TLS still allowed).
- TestBootConfigPolicy table updated: public+enforce+notls / +certonly / +keyonly
and lan-ip+enforce+notls are now refused; public+enforce+tls and
loopback+enforce+tls allowed.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green.
Refs: report 0006 N4, issue 0005d.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add high-availability cluster support to the embedded NATS server
(issue 0003a, first phase of decentralization).
pkg/embeddednats:
- ServerConfig gains ServerName (unique per node, required by JetStream
RAFT) and an optional *ClusterConfig (cluster name, route host/port,
peer route URLs, shared-secret Username/Password, and a mutual-TLS
*tls.Config). applyClusterOpts maps it onto server.Options.Cluster +
Routes. Nil Cluster keeps the legacy standalone server.
pkg/busauth:
- RouteTLSConfig builds the route layer's mutual-TLS config: the node
presents its CA-signed certificate AND verifies the peer's certificate
against the bus CA (RequireAndVerifyClientCert), reusing the issue-0001
CA. Routes authenticate NODES and never pass through the client nkey authenticator.
cmd/membershipd:
- Cluster flags (--cluster-name/--server-name/--cluster-port/--routes/
--cluster-user/--cluster-pass/--route-tls-cert/-key/-ca) wire a node
into the cluster. validateClusterConfig refuses a public cluster
without a route secret and complete mutual route TLS, and rejects
partial route-TLS flags (all-or-nothing). splitRoutes parses the CSV.
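The CSV parsing and the all-or-nothing route-TLS rule could look roughly like this. Several details are assumptions: that splitRoutes trims whitespace and drops empty entries, that the caller already knows whether the cluster is public, and that checkPublicCluster and routeTLSFlags exist as separate helpers (both names are hypothetical).

```go
package main

import (
	"errors"
	"strings"
)

// splitRoutes parses the --routes CSV, trimming spaces and dropping empty entries.
func splitRoutes(csv string) []string {
	var out []string
	for _, r := range strings.Split(csv, ",") {
		if r = strings.TrimSpace(r); r != "" {
			out = append(out, r)
		}
	}
	return out
}

// routeTLSFlags (hypothetical) enforces the all-or-nothing rule for
// --route-tls-cert/-key/-ca and reports whether mutual route TLS is complete.
func routeTLSFlags(cert, key, ca string) (complete bool, err error) {
	n := 0
	for _, v := range []string{cert, key, ca} {
		if v != "" {
			n++
		}
	}
	switch n {
	case 0:
		return false, nil
	case 3:
		return true, nil
	default:
		return false, errors.New("--route-tls-cert/-key/-ca must be set together (all or nothing)")
	}
}

// checkPublicCluster (hypothetical): a public cluster needs both a route
// secret and complete mutual route TLS.
func checkPublicCluster(public bool, pass string, tlsComplete bool) error {
	if public && (pass == "" || !tlsComplete) {
		return errors.New("public cluster requires a route secret and complete mutual route TLS")
	}
	return nil
}
```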
Tests (DoD: golden + 2 edge + error path):
- TestClusterForwardsAcrossNodes: 2-node cluster forwards a client
subject from one node to a subscriber on the other.
- TestClusterThreeNodesForward: 3-node (HA shape) cross-node forwarding.
- TestClusterMutualTLSForwards: forwarding over mutual-TLS routes.
- TestClusterRejectsBadRouteAuth: wrong cluster password -> no route.
- TestClusterRejectsUnsignedNode: cert not signed by the bus CA -> no route.
- TestClusterConfigPolicy / TestSplitRoutes: boot-guard + CSV parsing.
Master stays green: standalone (no --cluster-name) is unchanged.
Audit H2 (High). The binary defaulted to --bus-auth off, the NATS nkey
authenticator only turned on under enforce, and TLS was an independent flag.
Booting --bind 0.0.0.0 --tls-cert … without --bus-auth enforce left both
planes open while looking secure.
validateBootConfig is a pure guard, called right after flag parsing, that
rejects two insecure shapes with log.Fatal:
- a non-loopback --bind without --bus-auth enforce, and
- --tls-cert/--tls-key without --bus-auth enforce.
An insecure public startup is now impossible (the process exits), so a
fail-open data plane never comes up for an unregistered client to reach.
TestAudit_FailOpenTLSWithoutAuth plus a full policy table cover golden
(public+enforce, dev loopback) and every refused shape.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>