egutierrez ce72131ddf docs(cluster): correct runbook + wire --internal-id-file into deploy

Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks — a
  lone node of a 3-node cluster has no meta-group quorum and never serves
  healthz until a second node joins. Document a quorum-forming start and that
  a node never self-serves.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
  node) and the cold start only converges with the three cold-start fixes;
  go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
  stop-seed-restart, with its security model and idempotency/HA/no-delete
  semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).

Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-07 19:41:56 +02:00

Name	Last commit	Date
cluster	docs(cluster): correct runbook + wire --internal-id-file into deploy	2026-06-07 19:41:56 +02:00
tls	feat(deploy/tls): self-signed CA + server cert generator	2026-06-07 12:44:13 +02:00
install.sh	feat(deploy): systemd user unit + install script for membershipd	2026-06-06 18:05:53 +02:00
README.md	fix(0006f): cluster secret out of argv, migrate-to-kv TLS guard, R1/CA docs (audit 0008 lows)	2026-06-07 17:24:46 +02:00
unibus-membershipd.service	feat(deploy): systemd user unit + install script for membershipd	2026-06-06 18:05:53 +02:00

README.md

Running membershipd as a systemd user service

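membershipd is the unibus control plane (rooms, members, sealed keys, blob store) and, unless you point it at an external NATS with --nats-url, it also runs the embedded NATS + JetStream data plane. Running it as a systemd user service restarts it if it crashes and, once lingering is enabled for the user (loginctl enable-linger), keeps it running across logout and reboot. Without lingering, the service starts at login and stops when the user's last session ends.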

The unit (unibus-membershipd.service) binds both planes to 0.0.0.0:

Plane	Port	Reachable from
HTTP control	8470	LAN (`http://<host-ip>:8470/healthz`)
NATS data	4250	LAN (`nats://<host-ip>:4250`)

Install (idempotent)

cd ~/fn_registry/projects/message_bus/apps/unibus
./deploy/install.sh

This builds the binary, symlinks the unit into ~/.config/systemd/user/, reloads systemd, and enables + starts the service.

Manual steps (what install.sh does)

cd ~/fn_registry/projects/message_bus/apps/unibus

# 1. Build the pure-Go binary (no CGO).
CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd

# 2. Link the unit into the systemd user directory.
mkdir -p ~/.config/systemd/user
ln -sf "$PWD/deploy/unibus-membershipd.service" ~/.config/systemd/user/unibus-membershipd.service

# 3. Reload, enable (start on login) and start now.
systemctl --user daemon-reload
systemctl --user enable --now unibus-membershipd.service

# (optional) survive logout without an active session:
#   sudo loginctl enable-linger "$USER"
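
The link step is what makes a re-run safe: mkdir -p and ln -sf both succeed when their target already exists. A minimal sketch of that idea, run against a throwaway directory so nothing under ~/.config is touched. The helper name link_unit is hypothetical, and install.sh itself may be structured differently.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors step 2 above in a temp directory.
# link_unit is a hypothetical helper, not taken from install.sh.
set -euo pipefail

link_unit() {
  local unit_src="$1"   # absolute path to the unit file in the repo
  local unit_dir="$2"   # normally ~/.config/systemd/user
  mkdir -p "$unit_dir"                                     # no-op if it already exists
  ln -sf "$unit_src" "$unit_dir/$(basename "$unit_src")"   # -f replaces an existing link
}

# Demo: create a stand-in unit file and link it twice.
demo_dir="$(mktemp -d)"
touch "$demo_dir/unibus-membershipd.service"
link_unit "$demo_dir/unibus-membershipd.service" "$demo_dir/user"
link_unit "$demo_dir/unibus-membershipd.service" "$demo_dir/user"   # second run changes nothing
readlink "$demo_dir/user/unibus-membershipd.service"
```

Running the helper twice leaves exactly one symlink pointing at the same source file.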

Operate

systemctl --user status  unibus-membershipd.service     # is it active?
systemctl --user restart unibus-membershipd.service     # after a rebuild
systemctl --user stop    unibus-membershipd.service
systemctl --user disable unibus-membershipd.service     # stop starting on login
journalctl --user -u unibus-membershipd.service -f      # follow logs

# Health (local and from another LAN host):
curl -fsS http://127.0.0.1:8470/healthz
curl -fsS http://<host-lan-ip>:8470/healthz
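
After a restart the health endpoint may take a moment to come up, so a script that depends on it can poll instead of probing once. A small sketch of such a retry loop follows. The helper name wait_until and the attempt/delay values are assumptions for illustration, and the probe command is passed in so the sketch runs without a live service.

```shell
#!/usr/bin/env bash
# Sketch only: wait_until is a hypothetical helper, not part of the repo.
set -euo pipefail

# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 0 on success, 1 if every try failed.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Real use (commented out so the sketch runs anywhere):
#   wait_until 30 1 curl -fsS http://127.0.0.1:8470/healthz

# Stand-in probes to show both outcomes.
wait_until 3 0 true  && echo "healthy"
wait_until 2 0 false || echo "gave up"
```

With the stand-in probes this prints "healthy" and then "gave up".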

Notes

- Writable state (SQLite DB, JetStream store, blobs) lives under local_files/ relative to WorkingDirectory, which the unit sets to the app directory.
- After editing the app code, rebuild (CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd) and run systemctl --user restart unibus-membershipd.service.
- To run against an external NATS instead of the embedded one, append --nats-url nats://<host>:4222 to ExecStart, then run systemctl --user daemon-reload and restart the service.
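
If the ExecStart change has to be scripted, it helps to make the edit idempotent so a second run does not append the flag twice. The sketch below assumes a single-line ExecStart= entry; the binary path and the NATS host name in the demo are placeholders, and add_nats_url is a hypothetical helper. It operates on a temp file, not on the real unit.

```shell
#!/usr/bin/env bash
# Sketch only: assumes ExecStart= is a single line (no backslash continuations).
set -euo pipefail

# add_nats_url UNIT_FILE URL: append "--nats-url URL" to ExecStart= unless the flag is already there.
add_nats_url() {
  local unit_file="$1" nats_url="$2"
  if grep -q -- '--nats-url' "$unit_file"; then
    return 0                      # already configured, leave the file alone
  fi
  # -i.bak keeps a backup and works with both GNU and BSD sed.
  sed -i.bak -e "/^ExecStart=/ s|\$| --nats-url ${nats_url}|" "$unit_file"
}

# Demo on a stand-in unit file (path and host are placeholders).
demo_unit="$(mktemp)"
printf '[Service]\nExecStart=/opt/unibus/membershipd\n' > "$demo_unit"
add_nats_url "$demo_unit" "nats://nats.example.internal:4222"
add_nats_url "$demo_unit" "nats://nats.example.internal:4222"   # second call is a no-op
grep '^ExecStart=' "$demo_unit"
```

Because the installed unit is a symlink, point such an edit at deploy/unibus-membershipd.service itself: GNU sed -i replaces a symlink with a regular file unless told to follow it.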

Clustering (HA) — see `deploy/cluster/`

The single-node service above is secure on its own. Running unibus as a multi-node cluster has extra hardening rules (issues 0006a–0006f); the full runbook and the generated material live in deploy/cluster/. Key points an operator must know:

- Homogeneous posture (0006d). Every node MUST run --bus-auth enforce (the binary refuses to join a cluster otherwise) and present mutual route TLS on a public bind. /healthz publishes each node's posture so a monitor can flag a node that is not enforce+acl+tls.
- Separate route CA (0006f). The cluster route layer authenticates nodes, not bus users: sign the route certs with a dedicated cluster CA (--route-tls-ca), NOT the client data-plane CA (--tls-cert's CA). Keeping the two trust roots separate means a client cert can never be presented to the route port. deploy/cluster/generate-cluster-certs.sh builds this CA.
- Secret out of argv (0006f). Pass the route password via --cluster-pass-file or the UNIBUS_CLUSTER_PASS env var, NOT --cluster-pass or a nats://user:pass@host in --routes (both are visible in ps/journald). When the secret comes from a file/env, list peers as bare --routes nats://<host>:6250 and the binary injects the credentials.
- migrate-to-kv confidentiality (0006f). The migration writes the allowlist (handles/roles/sign pubs) into KV. Run it only against a loopback nats-url, or pin TLS with --ca for a remote target; otherwise that metadata travels in cleartext. The binary refuses a remote target without --ca.
- R1 is NOT HA (0006a/N3-DoS). With --kv-replicas 1 the control plane (including the nonce bucket) is a single point of failure: if the node owning the stream dies, every authenticated request fails closed (auth DoS). Real HA needs R3 (quorum 2/3): raise replicas in place with nats stream update --replicas 3 once the third node has joined. Do not advertise R1 as HA.
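
For the "secret out of argv" rule, the route password has to exist as a file that only the service user can read. One way to produce it is sketched below. The helper name write_cluster_secret, the file name cluster.pass, and the 64-hex-character format are assumptions; the runbook in deploy/cluster/ is authoritative on where the file lives and what format the binary expects.

```shell
#!/usr/bin/env bash
# Sketch only: file name and secret format are assumptions, not taken from the repo.
set -euo pipefail

# write_cluster_secret PATH: write 32 random bytes as hex to PATH, readable by the owner only.
write_cluster_secret() {
  local pass_file="$1"
  (
    umask 077                                   # new file is created 0600; subshell keeps umask local
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$pass_file"
  )
  chmod 600 "$pass_file"                        # also tighten a pre-existing file
}

# Demo in a throwaway directory.
demo_dir="$(mktemp -d)"
write_cluster_secret "$demo_dir/cluster.pass"
ls -l "$demo_dir/cluster.pass" | cut -c1-10     # permission bits only

# The node would then be started with the file instead of an inline password, e.g.:
#   membershipd ... --cluster-pass-file <path>/cluster.pass --routes nats://<host>:6250
```

The secret never appears on a command line, so it stays out of ps output and the journal. Every node in the cluster needs the same file content, distributed over a channel you already trust.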
