ce72131ddf
Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks. A
lone node of a 3-node cluster has no meta-group quorum and never serves
healthz until a second node joins. Document a quorum-forming start and that
a lone node never serves healthz on its own.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
node) and the cold start only converges with the three cold-start fixes;
go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
stop-seed-restart, with its security model and idempotency/HA/no-delete
semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).
Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
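
The bring-up deadlock described above comes down to majority arithmetic. The following is a minimal sketch, assuming the meta group uses a simple majority quorum, which is consistent with the observation that a lone node of three never serves healthz until a second node joins. The function names `quorum` and `can_serve` are hypothetical and not part of membershipd or deploy-cluster.sh.

```shell
#!/bin/sh
# Hypothetical helpers illustrating majority quorum; not part of the deploy tooling.

# quorum N: smallest majority of an N-node group, i.e. floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# can_serve UP TOTAL: succeed only when UP running nodes form a majority of TOTAL.
can_serve() {
    [ "$1" -ge "$(quorum "$2")" ]
}
```

Under this assumption, `can_serve 1 3` fails and `can_serve 2 3` succeeds, which is why a start order that waits on a single node's healthz before starting the next one cannot make progress.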
47 lines
1.6 KiB
Desktop File
[Unit]
# unibus membershipd: cluster node (issue 0006g).
#
# One unit, parameterized per node by /opt/unibus/cluster.env (generated by
# deploy-cluster.sh): NODE_NAME, ROUTES and the cert paths differ per node, the
# rest of the posture (enforce + per-subject ACL + TLS + --store kv) is identical
# on every node, which is the homogeneous posture a secure cluster requires
# (audit 0008 N1).
Description=unibus membershipd (cluster node)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
WorkingDirectory=/opt/unibus
EnvironmentFile=/opt/unibus/cluster.env
# The route password comes from a FILE referenced by ${CLUSTER_PASS_FILE}, never
# from argv (audit 0008 N1-low). The peer --routes carry no userinfo; membershipd
# injects the credentials from the file/user.
ExecStart=/opt/unibus/membershipd \
  --bind 0.0.0.0 \
  --bus-auth enforce \
  --http-port ${HTTP_PORT} \
  --nats-port ${NATS_CLIENT_PORT} \
  --tls-cert ${TLS_CERT} \
  --tls-key ${TLS_KEY} \
  --cluster-name ${CLUSTER_NAME} \
  --server-name ${NODE_NAME} \
  --cluster-port ${NATS_ROUTE_PORT} \
  --routes ${ROUTES} \
  --cluster-user ${CLUSTER_USER} \
  --cluster-pass-file ${CLUSTER_PASS_FILE} \
  --route-tls-cert ${ROUTE_TLS_CERT} \
  --route-tls-key ${ROUTE_TLS_KEY} \
  --route-tls-ca ${ROUTE_TLS_CA} \
  --internal-id-file ${INTERNAL_ID_FILE} \
  --store kv \
  --kv-replicas ${KV_REPLICAS}
# Restart=always (NOT on-failure): a clean SIGTERM exits success, and on-failure
# would then NOT restart, leaving the node silently dead (see function_tags.md).
Restart=always
RestartSec=2
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
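
Because every flag value in ExecStart comes from /opt/unibus/cluster.env, a variable missing from that file expands to nothing and the flag silently loses its argument. The sketch below is one way to check a generated env file before starting the unit. The variable list is copied from the unit above; the function name `check_env` is hypothetical and is not part of deploy-cluster.sh.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of deploy-cluster.sh.
# Variables that the unit's ExecStart line references via ${...}.
REQUIRED_VARS="HTTP_PORT NATS_CLIENT_PORT TLS_CERT TLS_KEY CLUSTER_NAME NODE_NAME
NATS_ROUTE_PORT ROUTES CLUSTER_USER CLUSTER_PASS_FILE ROUTE_TLS_CERT ROUTE_TLS_KEY
ROUTE_TLS_CA INTERNAL_ID_FILE KV_REPLICAS"

# check_env FILE: print every required variable that FILE does not assign
# a non-empty value to. Prints nothing when the file is complete.
check_env() {
    env_file=$1
    for var in $REQUIRED_VARS; do
        # EnvironmentFile entries are KEY=VALUE lines; require at least one value character.
        grep -Eq "^${var}=.+" "$env_file" || echo "$var"
    done
}
```

Running `check_env /opt/unibus/cluster.env` on a node deployed before this commit would be expected to list INTERNAL_ID_FILE, since older env files predate that variable.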