ce72131ddf
Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks — a
lone node of a 3-node cluster has no meta-group quorum and never serves
healthz until a second node joins. Document a quorum-forming start and that
a node never self-serves.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
node) and the cold start only converges with the three cold-start fixes;
go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
stop-seed-restart, with its security model and idempotency/HA/no-delete
semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).
Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Running membershipd as a systemd user service
membershipd is the unibus control plane (rooms, members, sealed keys, blob
store) and, unless you point it at an external NATS with --nats-url, it also
runs the embedded NATS + JetStream data plane. Running it as a systemd user
service keeps it alive across logout/reboot and restarts it if it crashes.
The unit (unibus-membershipd.service) binds both planes to 0.0.0.0:
| Plane | Port | Reachable from |
|---|---|---|
| HTTP control | 8470 | LAN (http://<host-ip>:8470/healthz) |
| NATS data | 4250 | LAN (nats://<host-ip>:4250) |
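Because the unit binds both ports to 0.0.0.0, a LAN client that cannot connect usually points at something in between, such as a host firewall. Below is a minimal reachability sketch. It assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout`. The names `port_open` and `check_planes` are hypothetical helpers, not part of unibus. The sketch only proves that each TCP port accepts connections, not that the service behind it is healthy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with unibus): TCP reachability of the
# two planes in the table above. Needs bash /dev/tcp and coreutils timeout.

# port_open HOST PORT [SECS]: succeed if a TCP connection opens within SECS.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Host and port become $0/$1 of the inner shell instead of being spliced into code.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_planes HOST: report 8470 (HTTP control) and 4250 (NATS data).
check_planes() {
  local host="$1" port rc=0
  for port in 8470 4250; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable"
      rc=1
    fi
  done
  return "$rc"
}
```

From another LAN host, run `check_planes <host-ip>`. It prints one line per plane and exits non-zero if either port is unreachable.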
Install (idempotent)
```bash
cd ~/fn_registry/projects/message_bus/apps/unibus
./deploy/install.sh
```
This builds the binary, symlinks the unit into ~/.config/systemd/user/,
reloads systemd, and enables + starts the service.
Manual steps (what install.sh does)
```bash
cd ~/fn_registry/projects/message_bus/apps/unibus
# 1. Build the pure-Go binary (no CGO).
CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd
# 2. Link the unit into the systemd user directory.
mkdir -p ~/.config/systemd/user
ln -sf "$PWD/deploy/unibus-membershipd.service" ~/.config/systemd/user/unibus-membershipd.service
# 3. Reload, enable (start on login) and start now.
systemctl --user daemon-reload
systemctl --user enable --now unibus-membershipd.service
# (optional) survive logout without an active session:
# sudo loginctl enable-linger "$USER"
```
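The optional `enable-linger` step matters because, without it, the systemd user manager stops when your last session ends, and this service stops with it. Below is a small check that assumes systemd-logind's `loginctl` (its `--value` option needs a reasonably recent systemd). `linger_on` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# linger_on [USER]: hypothetical helper. Succeeds only if logind reports
# Linger=yes, i.e. USER's systemd manager keeps running without a session.
# If logind does not know USER at all (no session, no linger), loginctl
# errors out and this returns false, which is the correct answer.
linger_on() {
  [ "$(loginctl show-user "${1:-$USER}" --property=Linger --value 2>/dev/null)" = "yes" ]
}
```

Typical use: `linger_on || echo "linger is off; run: sudo loginctl enable-linger $USER"`.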
Operate
```bash
systemctl --user status unibus-membershipd.service # is it active?
systemctl --user restart unibus-membershipd.service # after a rebuild
systemctl --user stop unibus-membershipd.service
systemctl --user disable unibus-membershipd.service # stop starting on login
journalctl --user -u unibus-membershipd.service -f # follow logs
# Health (local and from another LAN host):
curl -fsS http://127.0.0.1:8470/healthz
curl -fsS http://<host-lan-ip>:8470/healthz
```
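Right after `install.sh` or a restart the service may not answer yet, so a script that depends on it should poll instead of calling curl once. Below is a sketch of such a poll, assuming curl. `wait_healthy` and its defaults (30 tries, one second apart) are hypothetical choices. Use it as a gate only for the single-node service. Per the 0011 deploy notes, a lone node of a 3-node cluster never serves healthz until a second node joins, so this loop would simply time out there.

```bash
#!/usr/bin/env bash
# wait_healthy [URL] [TRIES]: hypothetical helper. Polls URL once per second
# until curl -f sees a 2xx answer, or gives up after TRIES attempts.
wait_healthy() {
  local url="${1:-http://127.0.0.1:8470/healthz}" tries="${2:-30}" i
  for ((i = 1; i <= tries; i++)); do
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # No point sleeping after the final attempt.
    (( i < tries )) && sleep 1
  done
  echo "no healthy answer from $url after $tries attempt(s)" >&2
  return 1
}
```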
Notes
- Writable state (SQLite DB, JetStream store, blobs) lives under `local_files/` relative to `WorkingDirectory`, which the unit sets to the app directory.
- After editing the app code, rebuild (`CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd`) and run `systemctl --user restart unibus-membershipd.service`.
- To run against an external NATS instead of the embedded one, append `--nats-url nats://<host>:4222` to `ExecStart` and re-run `daemon-reload` + `restart`.
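The external-NATS note can be scripted. The unit lives in `~/.config/systemd/user/` as a symlink into the repo. By default, GNU `sed -i` would replace that symlink with a regular copy and leave `deploy/unibus-membershipd.service` unchanged; `--follow-symlinks` edits the target file instead. The sketch below assumes GNU sed and a single-line `ExecStart=` without backslash continuations. `add_nats_url` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# add_nats_url UNIT URL: hypothetical helper that appends "--nats-url URL"
# to the unit's ExecStart= line. Assumes GNU sed and a single-line ExecStart=.
add_nats_url() {
  local unit="$1" url="$2"
  # Idempotent: do nothing if a --nats-url is already configured.
  if grep -q -- '--nats-url' "$unit"; then
    echo "already has --nats-url: $unit" >&2
    return 0
  fi
  # --follow-symlinks rewrites the file the symlink points to instead of
  # replacing the symlink with a regular file.
  sed -i --follow-symlinks "s|^\(ExecStart=.*\)\$|\1 --nats-url ${url}|" "$unit"
  # Fail if there was no ExecStart= line to extend.
  grep -qF -- "--nats-url ${url}" "$unit"
}
```

For example, run `add_nats_url ~/.config/systemd/user/unibus-membershipd.service nats://<host>:4222`, then `systemctl --user daemon-reload` and `systemctl --user restart unibus-membershipd.service`. The change lands in the repo's copy of the unit.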
Clustering (HA) — see deploy/cluster/
The single-node service above is secure on its own. Running unibus as a
multi-node cluster has extra hardening rules (issues 0006a–0006f); the full
runbook and the generated material live in deploy/cluster/. Key points an
operator must know:
- Homogeneous posture (0006d). Every node MUST run `--bus-auth enforce` (the binary refuses to join a cluster otherwise) and present mutual route TLS on a public bind. `/healthz` publishes each node's `posture`, so a monitor can flag any node that is not `enforce+acl+tls`.
- Separate route CA (0006f). The cluster route layer authenticates nodes, not bus users: sign the route certs with a dedicated cluster CA (`--route-tls-ca`), NOT the client data-plane CA (the CA behind `--tls-cert`). Keeping the two trust roots separate means a client cert can never authenticate on the route port. `deploy/cluster/generate-cluster-certs.sh` builds this CA.
- Secret out of argv (0006f). Pass the route password via `--cluster-pass-file` or the `UNIBUS_CLUSTER_PASS` env var, NOT via `--cluster-pass` or a `nats://user:pass@host` URL in `--routes` (both are visible in `ps` and journald). When the secret comes from a file or env var, list peers as bare `--routes nats://<host>:6250` and the binary injects the credentials.
- `migrate-to-kv` confidentiality (0006f). The migration writes the allowlist (handles/roles/sign pubs) into KV. Run it only against a loopback nats-url, or pin TLS with `--ca` for a remote target; otherwise that metadata travels in cleartext. The binary refuses a remote target without `--ca`.
- R1 is NOT HA (0006a/N3-DoS). With `--kv-replicas 1` the control plane (including the nonce bucket) is a single point of failure: if the node owning the stream dies, every authenticated request fails closed (auth DoS). Real HA needs R3 (quorum 2 of 3): once the third node has joined, raise the replicas in place with `nats stream update --replicas 3`. Do not advertise R1 as HA.
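A monitor for the posture rule could poll every node and flag any that deviates. This sketch rests on loud assumptions:

- `/healthz` is plain HTTP on port 8470, as in the single-node table.
- It returns JSON with a top-level string field `posture`.
- It reports a conforming node as exactly `enforce+acl+tls`.

This document does not confirm any of that schema, so adjust `node_posture` and `EXPECTED_POSTURE` to whatever your build actually publishes. The sketch needs curl and jq, and all names are hypothetical.

```bash
#!/usr/bin/env bash
# ASSUMPTION: /healthz returns JSON with a top-level string "posture" whose
# conforming value is exactly "enforce+acl+tls". Not confirmed by the docs.
EXPECTED_POSTURE="enforce+acl+tls"

posture_ok() { [ "$1" = "$EXPECTED_POSTURE" ]; }

# node_posture HOST: print the posture HOST reports (empty if unreachable).
node_posture() {
  curl -fsS --max-time 3 "http://$1:8470/healthz" 2>/dev/null |
    jq -r '.posture // empty' 2>/dev/null
}

# flag_nodes HOST...: print a FLAG line per non-conforming node; fail if any.
flag_nodes() {
  local rc=0 host p
  for host in "$@"; do
    p="$(node_posture "$host")"
    if ! posture_ok "$p"; then
      echo "FLAG $host posture=${p:-<unreachable or missing>}"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it periodically with all cluster members, e.g. `flag_nodes <node1-ip> <node2-ip> <node3-ip>`. An unreachable node is flagged too, because a node you cannot inspect cannot be shown to be `enforce`.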
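To confirm the separation required by the "Separate route CA" rule, a route cert should verify against the cluster CA and must fail against the client CA. The sketch below uses `openssl verify`. `check_route_cert` is a hypothetical name. The paths you pass are whatever `generate-cluster-certs.sh` produced; this sketch does not assume its output file names.

```bash
#!/usr/bin/env bash
# check_route_cert CERT CLUSTER_CA CLIENT_CA: hypothetical helper.
# Succeeds only if CERT chains to the dedicated cluster CA AND does not chain
# to the client data-plane CA, i.e. the two trust roots really are separate.
check_route_cert() {
  local cert="$1" cluster_ca="$2" client_ca="$3"
  if ! openssl verify -CAfile "$cluster_ca" "$cert" >/dev/null 2>&1; then
    echo "FAIL: $cert is not signed by the cluster CA"
    return 1
  fi
  if openssl verify -CAfile "$client_ca" "$cert" >/dev/null 2>&1; then
    echo "FAIL: $cert also verifies under the client CA (trust roots not separate)"
    return 1
  fi
  echo "ok: $cert"
}
```

Usage: `check_route_cert <route-cert.pem> <cluster-ca.pem> <client-ca.pem>`, once per node cert.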
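The "Secret out of argv" rule gets two hypothetical helpers, not unibus tooling:

- `write_cluster_pass` creates a random route password file readable only by its owner. The value is 64 hex characters, chosen so it stays URL-safe if it is ever injected into a route URL. It is written without a trailing newline in case the binary reads the file verbatim. The format the binary expects is not documented here, so check it.
- `scan_argv_for_secrets` flags command lines that still carry the secret in argv.

Both assume GNU coreutils and grep.

```bash
#!/usr/bin/env bash
# write_cluster_pass FILE: hypothetical helper. Creates FILE with mode 0600
# holding 32 random bytes as 64 hex chars, with no trailing newline.
# Refuses to overwrite, since every peer presumably has to agree on the value.
write_cluster_pass() {
  local file="$1"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  # umask 077 inside a subshell so only this file is created owner-only.
  ( umask 077 && od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$file" )
}

# scan_argv_for_secrets: read command lines on stdin (e.g. `ps -eo args`) and
# print those passing the route secret in argv: a --cluster-pass flag (also
# the single-dash spelling, but not --cluster-pass-file) or nats://user:pass@.
scan_argv_for_secrets() {
  grep -E -- '(^|[[:space:]])--?cluster-pass([=[:space:]]|$)|nats://[^/@[:space:]]+:[^/@[:space:]]+@'
}
```

On a node, `ps -eo args | scan_argv_for_secrets && echo "route secret visible in argv"` reports any offending process.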
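The "quorum 2/3" figure is the ordinary Raft majority that JetStream uses for replicated streams and for its cluster-wide meta group. With $R$ replicas, the quorum $q$ and the number of node losses $f$ that can be survived are:

$$
q(R) = \left\lfloor \frac{R}{2} \right\rfloor + 1, \qquad f(R) = R - q(R)
$$

R1 therefore tolerates $f(1) = 0$ failures, which is the single point of failure described above. R3 needs $q(3) = 2$ live replicas and survives $f(3) = 1$ node loss. The same arithmetic explains the bring-up rule from the 0011 deploy. The meta group of a 3-node cluster also needs 2 members, so a lone first node cannot form quorum and will not serve healthz until a second node joins.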
--kv-replicas 1the control plane (including the nonce bucket) is a single point of failure: if the node owning the stream dies, every authenticated request fails closed (auth DoS). Real HA needs R3 (quorum 2/3): raise replicas in place withnats stream update --replicas 3once the third node has joined. Do not advertise R1 as HA.