unibus/deploy/README.md
egutierrez b8201a82cd fix(0006f): cluster secret out of argv, migrate-to-kv TLS guard, R1/CA docs (audit 0008 lows)
Low-severity cluster hardening from audit 0008:

- Route secret out of argv (N1-low): --cluster-pass and a nats://user:pass@host in
  --routes are visible in ps/journald. New --cluster-pass-file and the
  UNIBUS_CLUSTER_PASS env var (precedence file > env > flag); the resolved secret
  guards the route layer and is injected into bare --routes entries
  (injectRouteCreds), so peers can be listed as nats://host:6250 with no secret in
  argv. The legacy --cluster-pass stays for dev/compat.
- migrate-to-kv confidentiality (N6): refuse a remote --nats-url without --ca (the
  allowlist would travel cleartext); loopback targets are exempt (isLoopbackURL).
- Docs (N1 route CA, N3 DoS): deploy/README gains a Clustering section — use a
  SEPARATE cluster CA for routes (not the client CA), keep the secret out of argv,
  run migrate-to-kv loopback/TLS only, and R1 is a SPOF of auth (not HA); R3
  quorum is real HA. The generated cert material lives in deploy/cluster/ (0006g).

Tests:
- TestResolveClusterPass (file > env > flag precedence; missing file errors),
- TestInjectRouteCreds (injects only into userinfo-less routes; preserves overrides),
- TestIsLoopbackURL (loopback vs remote vs malformed).

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:24:46 +02:00

# Running membershipd as a systemd user service
`membershipd` is the unibus control plane (rooms, members, sealed keys, blob
store) and, unless you point it at an external NATS with `--nats-url`, it also
runs the embedded NATS + JetStream data plane. Running it as a **systemd user
service** starts it on login and restarts it if it crashes; with lingering
enabled (`loginctl enable-linger`), it also stays up across logout and reboot.
The unit (`unibus-membershipd.service`) binds both planes to `0.0.0.0`:
| Plane | Port | Reachable from |
|--------------|-------|----------------|
| HTTP control | 8470 | LAN (`http://<host-ip>:8470/healthz`) |
| NATS data | 4250 | LAN (`nats://<host-ip>:4250`) |
## Install (idempotent)
```bash
cd ~/fn_registry/projects/message_bus/apps/unibus
./deploy/install.sh
```
This builds the binary, symlinks the unit into `~/.config/systemd/user/`,
reloads systemd, and enables + starts the service.
## Manual steps (what install.sh does)
```bash
cd ~/fn_registry/projects/message_bus/apps/unibus
# 1. Build the pure-Go binary (no CGO).
CGO_ENABLED=0 go build -o membershipd ./cmd/membershipd
# 2. Link the unit into the systemd user directory.
mkdir -p ~/.config/systemd/user
ln -sf "$PWD/deploy/unibus-membershipd.service" ~/.config/systemd/user/unibus-membershipd.service
# 3. Reload, enable (start on login) and start now.
systemctl --user daemon-reload
systemctl --user enable --now unibus-membershipd.service
# (optional) survive logout without an active session:
# sudo loginctl enable-linger "$USER"
```
## Operate
```bash
systemctl --user status unibus-membershipd.service # is it active?
systemctl --user restart unibus-membershipd.service # after a rebuild
systemctl --user stop unibus-membershipd.service
systemctl --user disable unibus-membershipd.service # stop starting on login
journalctl --user -u unibus-membershipd.service -f # follow logs
# Health (local and from another LAN host):
curl -fsS http://127.0.0.1:8470/healthz
curl -fsS http://<host-lan-ip>:8470/healthz
```
## Notes
- Writable state (SQLite DB, JetStream store, blobs) lives under `local_files/`
relative to `WorkingDirectory`, which the unit sets to the app directory.
- After editing the app code, rebuild (`CGO_ENABLED=0 go build -o membershipd
./cmd/membershipd`) and `systemctl --user restart unibus-membershipd.service`.
- To run against an external NATS instead of the embedded one, append
`--nats-url nats://<host>:4222` to `ExecStart` and re-run `daemon-reload` +
`restart`.
## Clustering (HA) — see `deploy/cluster/`
The single-node service above is secure on its own. Running unibus as a
multi-node **cluster** has extra hardening rules (issues 0006a through 0006f); the full
runbook and the generated material live in `deploy/cluster/`. Key points an
operator must know:
- **Homogeneous posture (0006d).** Every node MUST run `--bus-auth enforce` (the
binary refuses to join a cluster otherwise) and present mutual route TLS on a
public bind. `/healthz` publishes each node's `posture` so a monitor can flag a
node that is not `enforce`+`acl`+`tls`.
- **Separate route CA (0006f).** The cluster route layer authenticates *nodes*,
not bus users — sign the route certs with a **dedicated cluster CA**
(`--route-tls-ca`), NOT the client data-plane CA (`--tls-cert`'s CA). Keeping
the two trust roots separate means a client cert can never be presented to the
route port. `deploy/cluster/generate-cluster-certs.sh` builds this CA.
- **Secret out of argv (0006f).** Pass the route password via
`--cluster-pass-file` or the `UNIBUS_CLUSTER_PASS` env var, NOT `--cluster-pass`
or a `nats://user:pass@host` in `--routes` (both are visible in `ps`/journald).
When the secret comes from a file/env, list peers as bare `--routes
nats://<host>:6250` and the binary injects the credentials.
- **`migrate-to-kv` confidentiality (0006f).** The migration writes the allowlist
(handles/roles/sign pubs) into KV. Run it only against a **loopback** nats-url,
or pin TLS with `--ca` for a remote target — otherwise that metadata travels in
cleartext. The binary refuses a remote target without `--ca`.
- **R1 is NOT HA (0006a/N3-DoS).** With `--kv-replicas 1` the control plane
(including the nonce bucket) is a single point of failure: if the node owning
the stream dies, every authenticated request fails closed (auth DoS). Real HA
needs **R3** (quorum 2/3): raise replicas in place with `nats stream update
--replicas 3` once the third node has joined. Do not advertise R1 as HA.
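The `migrate-to-kv` guard (remote target requires `--ca`, loopback is exempt) might look roughly like the sketch below. It is an assumption-laden illustration rather than the shipped check: `checkMigrateTarget` is a hypothetical name, and this version treats only the literal host `localhost` and loopback IP addresses as local, counting malformed URLs as remote.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isLoopbackURL reports whether a NATS URL points at the local host.
// Malformed URLs and hostnames other than "localhost" count as remote.
func isLoopbackURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := u.Hostname()
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkMigrateTarget (hypothetical) refuses a remote target unless a CA
// file is given to pin TLS.
func checkMigrateTarget(natsURL, caFile string) error {
	if isLoopbackURL(natsURL) || caFile != "" {
		return nil
	}
	return errors.New("refusing remote --nats-url without --ca: the allowlist would travel in cleartext")
}

func main() {
	// One loopback and one example LAN target, both without a CA file.
	for _, target := range []string{"nats://127.0.0.1:4250", "nats://10.0.0.2:4250"} {
		if err := checkMigrateTarget(target, ""); err != nil {
			fmt.Printf("%s: %v\n", target, err)
			continue
		}
		fmt.Printf("%s: ok\n", target)
	}
}
```

Failing closed on unparseable input keeps a typo in the URL from silently bypassing the TLS requirement.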
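The R1 versus R3 distinction comes down to majority arithmetic. Assuming the usual majority-quorum rule for replicated streams, the sketch below (with hypothetical helper names `quorum` and `tolerated`) shows why one replica tolerates no failures while three tolerate one.

```go
package main

import "fmt"

// quorum returns the majority needed among n replicas.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many replicas can fail while a majority remains.
func tolerated(n int) int { return n - quorum(n) }

func main() {
	for _, n := range []int{1, 3} {
		fmt.Printf("R%d: quorum %d/%d, tolerates %d node failure(s)\n",
			n, quorum(n), n, tolerated(n))
	}
	// Output:
	// R1: quorum 1/1, tolerates 0 node failure(s)
	// R3: quorum 2/3, tolerates 1 node failure(s)
}
```

With R1, losing the single replica means losing quorum, which is the auth DoS described above.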