Commit Graph

4 Commits

Author SHA1 Message Date
egutierrez 9b96537aa6 fix(0006d): enforce homogeneous cluster posture + publish posture on /healthz (audit 0008 N1)
A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.

- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
  unless --bus-auth enforce, regardless of bind (a clustered node is a production
  node; there is no safe dev cluster without auth). This binary therefore cannot
  BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non
  secret operational metadata, probe stays unauthenticated) so a monitor or peer
  can detect a cluster member not running enforce+ACL+TLS — covering a peer that
  runs a tampered/old binary outside this node's control.

Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
  node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
  the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:17:37 +02:00
egutierrez d01da9d396 fix(0005d): require TLS on a public bind (close N4 plaintext control plane)
The H2 guard refused "public bind without enforce" and "TLS flags without
enforce", but it still ALLOWED a public bind with enforce and no --tls-cert: the
control plane then served metadata (subjects, pubkeys, sealed keys, the social
graph) over plaintext HTTP publicly, so audit H5 reappeared as the N4 gap (TLS
was a capability, not a requirement; report 0006).

Fix: validateBootConfig now also refuses a non-loopback --bind unless both
--tls-cert and --tls-key are set. Public deployments must serve HTTPS; loopback
dev is unaffected (no TLS still allowed there).

Verification (cmd/membershipd/config_test.go):
- TestGap_PublicEnforceNoTLS: validateBootConfig("0.0.0.0", enforce, "", "")
  now returns an error mentioning --tls-cert (golden public+enforce+TLS allowed;
  edge loopback-without-TLS still allowed).
- TestBootConfigPolicy table updated: public+enforce+notls / +certonly / +keyonly
  and lan-ip+enforce+notls are now refused; public+enforce+tls and
  loopback+enforce+tls allowed.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  green.

Refs: report 0006 N4, issue 0005d.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:11:45 +02:00
agent c90f145a05 feat(0003a): NATS cluster routes with shared-secret auth + mutual route TLS
Add high-availability cluster support to the embedded NATS server
(issue 0003a, first phase of decentralization).

pkg/embeddednats:
- ServerConfig gains ServerName (unique per node, required by JetStream
  RAFT) and an optional *ClusterConfig (cluster name, route host/port,
  peer route URLs, shared-secret Username/Password, and a mutual-TLS
  *tls.Config). applyClusterOpts maps it onto server.Options.Cluster +
  Routes. Nil Cluster keeps the legacy standalone server.

pkg/busauth:
- RouteTLSConfig builds the route layer's mutual-TLS config: the node
  presents its CA-signed certificate AND verifies the peer's certificate
  against the bus CA (RequireAndVerifyClientCert), reusing the issue-0001
  CA. Routes authenticate NODES, never the client nkey authenticator.

cmd/membershipd:
- Cluster flags (--cluster-name/--server-name/--cluster-port/--routes/
  --cluster-user/--cluster-pass/--route-tls-cert/-key/-ca) wire a node
  into the cluster. validateClusterConfig refuses a public cluster
  without a route secret and complete mutual route TLS, and rejects
  partial route-TLS flags (all-or-nothing). splitRoutes parses the CSV.

Tests (DoD: golden + 2 edge + error path):
- TestClusterForwardsAcrossNodes: 2-node cluster forwards a client
  subject from one node to a subscriber on the other.
- TestClusterThreeNodesForward: 3-node (HA shape) cross-node forwarding.
- TestClusterMutualTLSForwards: forwarding over mutual-TLS routes.
- TestClusterRejectsBadRouteAuth: wrong cluster password -> no route.
- TestClusterRejectsUnsignedNode: cert not signed by the bus CA -> no route.
- TestClusterConfigPolicy / TestSplitRoutes: boot-guard + CSV parsing.

Master stays green: standalone (no --cluster-name) is unchanged.
2026-06-07 14:54:53 +02:00
egutierrez 30577145ce feat(membershipd): refuse fail-open startup configs
Audit H2 (Alto). The binary defaulted to --bus-auth off, the NATS nkey
authenticator only turned on under enforce, and TLS was an independent flag.
Booting --bind 0.0.0.0 --tls-cert … without --bus-auth enforce left both
planes open while looking secure.

validateBootConfig is a pure guard, called right after flag parsing, that
log.Fatals on two insecure shapes:
  - a non-loopback --bind without --bus-auth enforce, and
  - --tls-cert/--tls-key without --bus-auth enforce.

An insecure public startup is now impossible (the process exits), so a
fail-open data plane never comes up for an unregistered client to reach.
TestAudit_FailOpenTLSWithoutAuth plus a full policy table cover golden
(public+enforce, dev loopback) and every refused shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:17:37 +02:00