feat(0003a): NATS cluster routes with shared-secret auth + mutual route TLS

Add high-availability cluster support to the embedded NATS server
(issue 0003a, first phase of decentralization).

pkg/embeddednats:
- ServerConfig gains ServerName (unique per node, required by JetStream
  RAFT) and an optional *ClusterConfig (cluster name, route host/port,
  peer route URLs, shared-secret Username/Password, and a mutual-TLS
  *tls.Config). applyClusterOpts maps it onto server.Options.Cluster +
  Routes. Nil Cluster keeps the legacy standalone server.

pkg/busauth:
- RouteTLSConfig builds the route layer's mutual-TLS config: the node
  presents its CA-signed certificate AND verifies the peer's certificate
  against the bus CA (RequireAndVerifyClientCert), reusing the issue-0001
  CA. Routes authenticate NODES, never the client nkey authenticator.

cmd/membershipd:
- Cluster flags (--cluster-name/--server-name/--cluster-port/--routes/
  --cluster-user/--cluster-pass/--route-tls-cert/-key/-ca) wire a node
  into the cluster. validateClusterConfig refuses a public cluster
  without a route secret and complete mutual route TLS, and rejects
  partial route-TLS flags (all-or-nothing). splitRoutes parses the CSV.

Tests (DoD: golden + 2 edge + error path):
- TestClusterForwardsAcrossNodes: 2-node cluster forwards a client
  subject from one node to a subscriber on the other.
- TestClusterThreeNodesForward: 3-node (HA shape) cross-node forwarding.
- TestClusterMutualTLSForwards: forwarding over mutual-TLS routes.
- TestClusterRejectsBadRouteAuth: wrong cluster password -> no route.
- TestClusterRejectsUnsignedNode: cert not signed by the bus CA -> no route.
- TestClusterConfigPolicy / TestSplitRoutes: boot-guard + CSV parsing.

Master stays green: standalone (no --cluster-name) is unchanged.
This commit is contained in:
agent
2026-06-07 14:54:53 +02:00
parent 618f6b61da
commit c90f145a05
6 changed files with 623 additions and 5 deletions
+38
View File
@@ -35,3 +35,41 @@ func ServerTLSConfig(certPEMPath, keyPEMPath string) (*tls.Config, error) {
}
return &tls.Config{Certificates: []tls.Certificate{cert}, MinVersion: tls.VersionTLS12}, nil
}
// RouteTLSConfig builds the mutual-TLS config for the NATS CLUSTER route layer
// (issue 0003a). Unlike the client data plane, where the server presents a cert
// and only the client verifies it, routes are server-to-server: each node both
// presents its own node certificate AND verifies the connecting node's
// certificate against the bus CA. So this single config carries:
//
// - Certificates: this node's CA-signed certificate (presented in both the
// server and the client role of a route handshake),
// - RootCAs: the bus CA, to verify the certificate of a node we dial out to,
// - ClientCAs + ClientAuth=RequireAndVerifyClientCert: the bus CA, to verify
// the certificate of a node dialing in.
//
// The effect: a node that lacks a certificate signed by the bus CA cannot
// establish a route in either direction, even if it knows the cluster password.
// Reuse the same CA as the client data plane (deploy/tls) but a per-node cert
// whose SAN covers that node's route address.
func RouteTLSConfig(certPEMPath, keyPEMPath, caPEMPath string) (*tls.Config, error) {
cert, err := tls.LoadX509KeyPair(certPEMPath, keyPEMPath)
if err != nil {
return nil, fmt.Errorf("busauth: load route keypair: %w", err)
}
pem, err := os.ReadFile(caPEMPath)
if err != nil {
return nil, fmt.Errorf("busauth: read route CA %q: %w", caPEMPath, err)
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(pem) {
return nil, fmt.Errorf("busauth: route CA %q contains no valid PEM certificate", caPEMPath)
}
return &tls.Config{
Certificates: []tls.Certificate{cert},
RootCAs: pool,
ClientCAs: pool,
ClientAuth: tls.RequireAndVerifyClientCert,
MinVersion: tls.VersionTLS12,
}, nil
}