unibus/deploy
Egutierrez 1c9325104c feat(embeddednats): UNIBUS_NATS_MONITOR flag decoupled from debug log
Add a dedicated UNIBUS_NATS_MONITOR=1 toggle that opens the embedded
nats-server monitoring HTTP endpoint (127.0.0.1:8222, loopback only) so a
local metrics scraper can read /varz, /connz and /jsz for server-level
metrics (msgs/s, connections, KV bucket msgs, RAFT leader per stream,
restarts).

Previously the monitoring endpoint was only reachable via UNIBUS_NATS_DEBUG=1,
which is coupled to the verbose nats-server debug log: enabling the endpoint
also wrote routes/RAFT/room subjects to journald in clear, which regresses the
hardened posture (issue 0007). The two concerns are now decoupled.

The toggle computation is extracted to a pure function
natsLogOpts(debugEnv, monitorEnv) (noLog, debug, trace, monitor): MONITOR=1
opens the endpoint while keeping the log quiet (NoLog true / Debug false). The
inverse coupling is preserved for backward compatibility (DEBUG still implies
MONITOR). The 127.0.0.1 bind stays hardcoded — the monitoring endpoint has no
auth and must never be reachable from the network.

Deploy wiring versioned: additive systemd drop-in
membershipd-cluster.service.d/nats-monitor.conf (Environment=UNIBUS_NATS_MONITOR=1)
plus a "NATS server metrics" section in the cluster README with the rolling
activation runbook (magnus -> homer -> datardos) gated on R3 reconvergence
(followers 2/2) between nodes.

Tests: pure decoupling table (monitor on => log NOT debug; debug => monitor;
default closed) + a real embedded server with MONITOR=1 asserting /varz answers
200 on loopback:8222, and a server without the flag with the endpoint closed.
100% additive: behavior is identical without the flag. Bump app.md 0.10.0 ->
0.11.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:57:46 +02:00

Running membershipd as a systemd user service

membershipd is the unibus control plane (rooms, members, sealed keys, blob store) and, unless you point it at an external NATS with --nats-url, it also runs the embedded NATS + JetStream data plane. Running it as a systemd user service restarts it if it crashes and starts it at login; with lingering enabled (loginctl enable-linger, see the manual steps), it also keeps running after logout and starts at boot.

The unit (unibus-membershipd.service) binds both planes to 0.0.0.0:

Plane          Port   Reachable from
HTTP control   8470   LAN (http://<host-ip>:8470/healthz)
NATS data      4250   LAN (nats://<host-ip>:4250)

Install (idempotent)

cd ~/fn_registry/projects/message_bus/apps/unibus
./deploy/install.sh

This builds the binary, symlinks the unit into ~/.config/systemd/user/, reloads systemd, and enables + starts the service.

Manual steps (what install.sh does)

cd ~/fn_registry/projects/message_bus/apps/unibus

# 1. Build the pure-Go binary (no CGO).
CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd

# 2. Link the unit into the systemd user directory.
mkdir -p ~/.config/systemd/user
ln -sf "$PWD/deploy/unibus-membershipd.service" ~/.config/systemd/user/unibus-membershipd.service

# 3. Reload, enable (start on login) and start now.
systemctl --user daemon-reload
systemctl --user enable --now unibus-membershipd.service

# (optional) survive logout without an active session:
#   sudo loginctl enable-linger "$USER"

Operate

systemctl --user status  unibus-membershipd.service     # is it active?
systemctl --user restart unibus-membershipd.service     # after a rebuild
systemctl --user stop    unibus-membershipd.service
systemctl --user disable unibus-membershipd.service     # stop starting on login
journalctl --user -u unibus-membershipd.service -f      # follow logs

# Health (local and from another LAN host):
curl -fsS http://127.0.0.1:8470/healthz
curl -fsS http://<host-lan-ip>:8470/healthz
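
After a restart the service can take a moment to answer, so a single curl may fail even though the unit is coming up fine. A minimal sketch of a polling wrapper around the same health check follows. The name wait_healthy, its arguments, and the defaults are hypothetical and not part of unibus; only the /healthz URL and the systemctl/journalctl commands come from this README.

```shell
#!/usr/bin/env bash
# wait_healthy URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper (not shipped with unibus): polls URL with curl until it
# answers successfully or the attempts run out.
# Returns 0 on the first successful response, 1 if every attempt failed.
wait_healthy() {
  local url="$1" tries="${2:-30}" delay="${3:-1}"
  local i
  for ((i = 1; i <= tries; i++)); do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -fsS --max-time 2 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Possible usage after a rebuild (shown as comments so that sourcing this
# file has no side effects):
#   systemctl --user restart unibus-membershipd.service
#   wait_healthy http://127.0.0.1:8470/healthz 30 1 \
#     || journalctl --user -u unibus-membershipd.service -n 50
```

Polling the loopback URL first and the LAN URL second helps tell apart "service is down" from "service is up but not reachable from the network".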

Notes

  • Writable state (SQLite DB, JetStream store, blobs) lives under local_files/ relative to WorkingDirectory, which the unit sets to the app directory.
  • After editing the app code, rebuild (CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd) and restart the service (systemctl --user restart unibus-membershipd.service).
  • To run against an external NATS instead of the embedded one, append --nats-url nats://<host>:4222 to ExecStart and re-run daemon-reload + restart.

Clustering (HA) — see deploy/cluster/

The single-node service above is secure on its own. Running unibus as a multi-node cluster has extra hardening rules (issues 0006a through 0006f); the full runbook and the generated material live in deploy/cluster/. Key points an operator must know:

  • Homogeneous posture (0006d). Every node MUST run --bus-auth enforce (the binary refuses to join a cluster otherwise) and present mutual route TLS on a public bind. /healthz publishes each node's posture so a monitor can flag a node that is not enforce+acl+tls.
  • Separate route CA (0006f). The cluster route layer authenticates nodes, not bus users. Sign the route certs with a dedicated cluster CA (--route-tls-ca), NOT the client data-plane CA (--tls-cert's CA). Keeping the two trust roots separate means a client cert can never authenticate on the route port. deploy/cluster/generate-cluster-certs.sh builds this CA.
  • Secret out of argv (0006f). Pass the route password via --cluster-pass-file or the UNIBUS_CLUSTER_PASS env var, NOT --cluster-pass or a nats://user:pass@host in --routes (both are visible in ps/journald). When the secret comes from a file/env, list peers as bare --routes nats://<host>:6250 and the binary injects the credentials.
  • migrate-to-kv confidentiality (0006f). The migration writes the allowlist (handles/roles/sign pubs) into KV. Run it only against a loopback nats-url, or pin TLS with --ca for a remote target — otherwise that metadata travels in cleartext. The binary refuses a remote target without --ca.
  • R1 is NOT HA (0006a/N3-DoS). With --kv-replicas 1 the control plane (including the nonce bucket) is a single point of failure: if the node owning the stream dies, every authenticated request fails closed (auth DoS). Real HA needs R3 (quorum 2/3): raise replicas in place with nats stream update --replicas 3 once the third node has joined. Do not advertise R1 as HA.
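
The "secret out of argv" rule above can be illustrated with a short sketch that creates the route password file with owner-only permissions. The function name write_cluster_pass, the example path, and the choice of a random base64 string are assumptions made for illustration. The README does not specify how the secret is generated or whether the binary tolerates a trailing newline, so the sketch strips it to be safe.

```shell
#!/usr/bin/env bash
# write_cluster_pass FILE
# Hypothetical helper (not part of unibus): writes a random route password to
# FILE so that it never appears in argv, ps output or journald.
write_cluster_pass() {
  local file="$1"
  (
    umask 077   # a newly created file gets mode 0600
    # 32 random bytes, base64-encoded (44 characters), trailing newline removed.
    head -c 32 /dev/urandom | base64 | tr -d '\n' > "$file"
  )
  chmod 600 "$file"   # also tighten the mode if FILE already existed
}

# Possible usage (the path is an assumption, shown as comments only):
#   mkdir -p ~/.config/unibus
#   write_cluster_pass ~/.config/unibus/cluster.pass
```

The resulting path would then be passed via --cluster-pass-file, with peers listed as bare --routes nats://<host>:6250 entries so that the binary injects the credentials itself. The same file has to be distributed to every node over a secure channel.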