Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks — a
lone node of a 3-node cluster has no meta-group quorum and never serves
healthz until a second node joins. Document a quorum-forming start and that
a node never self-serves.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
node) and the cold start only converges with the three cold-start fixes;
go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
stop-seed-restart, with its security model and idempotency/HA/no-delete
semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).
Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parameterized, NO-VPS-touched material to bring up unibus as a 3-node cluster.
The authoring agent ran none of it on a host; every remote-changing step is
marked HUMAN and deploy-cluster.sh defaults to a dry run.
deploy/cluster/:
- nodes.env — topology (cluster name, ports, per-node rows). Public IPs known
(homer 141.94.69.66, datardos 51.91.100.142) pre-filled; magnus public IP and
all WireGuard IPs are <PLACEHOLDER> for the human; scripts refuse to run while
any remain.
- generate-cluster-certs.sh — mints a SEPARATE cluster route CA + a route cert per
node (server+clientAuth, mutual routes) and a data-plane server cert per node
signed by the reused client CA (../tls/ca.*); SAN = public + WG + hostname.
- membershipd-cluster.service — one unit, parameterized per node via
/opt/unibus/cluster.env: enforce + per-subject ACL + TLS + --store kv,
--cluster-pass-file (secret out of argv), Restart=always.
- deploy-cluster.sh — cross-build linux/amd64, generate each node's cluster.env
(routes to the other two on the WG mesh, no userinfo), rsync + install (only
with --yes); staggered start is manual.
- README.md — runbook: prerequisites, loopback bootstrap to seed the first admin
into the KV (works around the user-CLI/KV chicken-and-egg), staggered bring-up,
verify posture+quorum, scale R1->R3 in place, and the chaos test (left to 0003f
on the real VPS).
- .gitignore — out/, build/, secrets/, *.key never committed.
bash -n passes on both scripts; go build/test unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>