# unibus cluster — 3-node deploy runbook (issue 0006g)

This directory holds the material to bring up unibus as a **3-node cluster**
(`magnus` + `homer` + `datardos`) for real HA: with **R3** replication the control
plane (rooms/members/keys/users on JetStream KV + the anti-replay nonce bucket)
survives the loss of any one node (quorum 2/3).

> **The agent that authored this never touched a VPS.** Every step that changes a
> remote host is marked **HUMAN** and is executed by the operator. `deploy-cluster.sh`
> defaults to a dry run.

## Files

| File | What it is |
|---|---|
| `nodes.env` | Topology: cluster name, ports, and the per-node rows (name, ssh host, public IP, WG IP). **HUMAN fills the placeholders.** |
| `generate-cluster-certs.sh` | Mints a **separate cluster route CA** + a route cert per node, and a data-plane server cert per node signed by the **client CA** (`../tls/ca.*`). |
| `membershipd-cluster.service` | One systemd unit, parameterized per node by `/opt/unibus/cluster.env`. enforce + per-subject ACL + TLS + `--store kv`, `Restart=always`. |
| `deploy-cluster.sh` | Cross-builds the linux binary, generates each node's `cluster.env`, and (with `--yes`) rsyncs everything + installs the unit. Staggered start is manual. |

Generated artifacts (`out/`, `build/`, `secrets/`) are **gitignored**. The keys and
secrets among them must never leave the operator's trusted machine except over the
secure rsync channel.

## Topology

| Node | SSH | Public IP | WireGuard IP | Role |
|---|---|---|---|---|
| magnus | `magnus` | `<MAGNUS_PUBLIC_IP>` | `<MAGNUS_WG_IP>` | seed (first up) |
| homer | `homer` | `141.94.69.66` | `<HOMER_WG_IP>` | replica |
| datardos | `dd` | `51.91.100.142` | `<DATARDOS_WG_IP>` (10.21.0.x) | replica |

The route layer (server-to-server) prefers the **WireGuard mesh**
(`ROUTE_NETWORK=wg`); the client data plane and the HTTP control plane are reached
over the public IPs. The route CA is **separate** from the client CA, so a cert
issued by the client CA is never accepted on the route port.

## Prerequisites (HUMAN, once)

1. **Fill `nodes.env`** — replace every `<PLACEHOLDER>` (magnus public IP, all WG
   IPs). The scripts refuse to run while any remain.
2. **Client CA exists** — `../tls/ca.crt` + `../tls/ca.key`. If not, run
   `../tls/generate-certs.sh` on the CA host (om) first. The cluster reuses this CA
   for the data plane so existing clients keep trusting the bus.
3. **Mint cluster TLS**:
   ```bash
   ./generate-cluster-certs.sh        # writes out/<name>/ ; --force to rotate the cluster CA
   ```
4. **Create the route secret** (out of argv, shared by all nodes):
   ```bash
   (umask 077 && mkdir -p secrets && openssl rand -hex 32 > secrets/cluster.pass)
   ```
5. **Confirm SSH**: you can reach each node's SSH host as `root` (`ssh magnus true`, `ssh dd true`, ...).
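
Prerequisites 1 and 4 can be checked locally before staging. The helpers below are a minimal sketch, not part of the shipped scripts: the function names are hypothetical, and the placeholder pattern assumes every placeholder looks like `<UPPER_SNAKE_CASE>`, as in the topology table.

```bash
# Hypothetical preflight helpers (illustrative names, not shipped with unibus).

# Exit 0 if the file still contains an unfilled <PLACEHOLDER> token.
has_placeholders() {
  grep -Eq '<[A-Z][A-Z0-9_]*>' "$1"
}

# Exit 0 if the file contains a line of exactly 64 lowercase hex characters,
# which is what `openssl rand -hex 32` writes.
is_valid_secret() {
  grep -Eqx '[0-9a-f]{64}' "$1"
}

# Example use from this directory:
#   has_placeholders nodes.env && echo "nodes.env still has placeholders" >&2
#   is_valid_secret secrets/cluster.pass || echo "cluster.pass looks malformed" >&2
```

The deploy scripts already refuse to run with placeholders left in `nodes.env`; a check like this only surfaces the problem earlier.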

## Stage the nodes

```bash
./deploy-cluster.sh            # DRY RUN — prints the full plan, touches nothing
./deploy-cluster.sh --yes      # HUMAN: actually rsync + install the unit on all 3 nodes
```

This cross-builds `membershipd` (linux/amd64, `CGO_ENABLED=0`), writes each node's
`cluster.env` (its `NODE_NAME` and the `--routes` to the OTHER two nodes), and
ships the binary, the node's TLS material, the secret, the env file and the unit.
It does **not** start anything.
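
To illustrate what "the `--routes` to the OTHER two nodes" means, here is a rough sketch of how such a list could be derived. Everything concrete in it is an assumption: the WG IPs, the route port `6222`, the `nats://` URL form and the `routes_for` helper are illustrative, and the real values come from `nodes.env` and `deploy-cluster.sh`.

```bash
# Hypothetical topology as "name:wg_ip" entries. The IPs and ROUTE_PORT are
# placeholders for illustration, not values taken from nodes.env.
NODES="magnus:10.21.0.1 homer:10.21.0.2 datardos:10.21.0.3"
ROUTE_PORT=6222

# Print a comma-separated route list pointing at every node except $1.
routes_for() {
  self=$1
  out=""
  for entry in $NODES; do
    name=${entry%%:*}
    ip=${entry#*:}
    if [ "$name" != "$self" ]; then
      out="${out:+$out,}nats://$ip:$ROUTE_PORT"
    fi
  done
  printf '%s\n' "$out"
}

# Example: routes_for magnus
```

Each node therefore lists only its two peers, never itself.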

## Seed the first admin into the KV (HUMAN — loopback bootstrap)

The empty KV control plane has no users, and under `enforce` no external tool can
write the FIRST admin over NATS: the writer would already need to be an admin, a
chicken-and-egg problem. The `user` CLI also writes only to a local SQLite file,
not the KV. So the first admin is seeded on the seed node through a **loopback,
no-auth bootstrap** that populates the same JetStream store the cluster unit then
reuses:

```bash
ssh root@magnus 'bash -s' <<'SEED'
set -euo pipefail
cd /opt/unibus
# a) Put the first admin into a local SQLite seed file
#    (replace <ADMIN_SIGN_PUB_HEX> with the real key before running).
./membershipd user add --db ./seed.db --handle root --sign-pub <ADMIN_SIGN_PUB_HEX> --role admin
# b) Bring up a TEMPORARY loopback, no-auth, single-node KV server on the cluster's
#    own JetStream store dir (not exposed; bus-auth off is allowed on 127.0.0.1).
./membershipd --store kv --bus-auth off --bind 127.0.0.1 \
  --nats-store ./local_files/jetstream --db ./seed.db >/tmp/seed-boot.log 2>&1 &
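BOOT=$!
# Stop the bootstrap server even if a later step fails under `set -e`.
trap 'kill "$BOOT" 2>/dev/null || true' EXIT
sleep 2   # crude readiness wait; raise it if the server is slow to bind
# c) Migrate the admin from SQLite into the KV at R1 (loopback, so no --ca needed).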
./membershipd migrate-to-kv --db ./seed.db --nats-url nats://127.0.0.1:4250 --replicas 1
# d) Stop the bootstrap server. The KV buckets persist in ./local_files/jetstream.
kill "$BOOT"; wait "$BOOT" 2>/dev/null || true
rm -f ./seed.db
SEED
```

> The KV written here lives in `./local_files/jetstream`, which the cluster unit
> reuses (`--nats-store` default), so the admin is present when the enforce cluster
> starts. Additional users are added the same loopback way until a
> `user add --store kv` exists (see GAP in report 0009).

## Bring up (HUMAN — staggered)

Bring up the seed first, then the replicas one at a time, checking each joins.

```bash
# 1. Seed node (after the seed step above).
ssh root@magnus 'systemctl enable --now membershipd-cluster'
ssh root@magnus 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'

# 2. Replicas, one at a time (datardos is reached through its SSH host `dd`).
ssh root@homer 'systemctl enable --now membershipd-cluster'
ssh root@dd    'systemctl enable --now membershipd-cluster'
```
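
"Checking each joins" can be scripted with a small retry loop, so the next replica starts only once the previous node answers its health check. This is a sketch with a hypothetical `retry` helper; it polls `/healthz` only, which reports posture, so route membership still needs the `nats server list` check from the Verify section.

```bash
# retry <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Example (not run here): wait up to about a minute for homer to report healthy.
#   retry 30 2 ssh root@homer \
#     'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'
```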

> Initial rollout runs at **R1** (`KV_REPLICAS=1` in `nodes.env`): the buckets live
> on the seed only. This is NOT HA yet — see "Scale to R3".

## Promote an existing single-node (SQLite) deployment (HUMAN, optional)

Instead of seeding fresh, you can migrate an existing single-node `unibus.db` into
the KV — **loopback only** (the allowlist would otherwise travel cleartext; the
command refuses a remote target without `--ca`). Use the same loopback-bootstrap
shape as the seed step (temporary `--bus-auth off` server on 127.0.0.1, then
`migrate-to-kv --db /opt/unibus/local_files/unibus.db`).

## Verify

```bash
# Posture on every node — all must be enforce+acl+tls+cluster, store=kv.
for h in magnus homer dd; do   # SSH hosts from the topology table (dd = datardos)
  echo "== $h =="
  ssh root@$h 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'
done

# Cluster + JetStream meta-group health (needs the `nats` CLI on a node):
ssh root@magnus 'nats --server nats://127.0.0.1:4250 server report jetstream'
ssh root@magnus 'nats --server nats://127.0.0.1:4250 server list'   # 3 servers, routes up
```

A healthy cluster shows 3 routed servers and a JetStream meta-group with a leader.

## Scale to R3 (HUMAN — real HA)

Once all three nodes are up and routed, raise the replication factor of every
control-plane stream from 1 to 3 IN PLACE (no data loss), then flip `KV_REPLICAS=3`
in `nodes.env` so future (re)deploys keep it:

```bash
for s in KV_UNIBUS_users KV_UNIBUS_rooms KV_UNIBUS_members KV_UNIBUS_room_keys \
         KV_UNIBUS_rooms_by_member KV_UNIBUS_nonces; do
  ssh root@magnus "nats --server nats://127.0.0.1:4250 stream update $s --replicas 3 -f"
done
# (also OBJ_UNIBUS_blobs if the object store is in use)
```

Until this is done, R1 means the seed node is a **single point of failure for
authentication**: if it dies, the nonce/KV control plane is unreachable and every
authenticated request fails closed (auth DoS). R1 is a rollout step, not HA.
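
After the update it is worth confirming that no stream was left at R1. The filter below is a minimal sketch: `below_target` is a hypothetical helper that reads `<stream> <replicas>` pairs, and the commented collection loop assumes `jq` is installed and that `nats stream info --json` reports the factor under `.config.num_replicas`.

```bash
# below_target <target>
# Reads "<stream> <replicas>" lines on stdin, prints every stream whose
# replica count is below the target, and exits non-zero if any were found.
below_target() {
  awk -v t="$1" '$2 < t { print $1; bad = 1 } END { exit bad }'
}

# Example collection loop (not run here; assumes jq on the operator machine):
#   for s in KV_UNIBUS_users KV_UNIBUS_rooms KV_UNIBUS_members KV_UNIBUS_room_keys \
#            KV_UNIBUS_rooms_by_member KV_UNIBUS_nonces; do
#     r=$(ssh root@magnus "nats --server nats://127.0.0.1:4250 stream info $s --json" \
#           | jq '.config.num_replicas')
#     echo "$s $r"
#   done | below_target 3
```

An empty output with exit status 0 means every listed stream reports at least three replicas.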

## Chaos test (HUMAN — requires the 3 live VPS; NOT run here)

Validate quorum tolerance after R3:

```bash
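# Kill one node; the cluster keeps serving (quorum 2/3).
ssh root@dd 'systemctl stop membershipd-cluster'
#   -> clients fail over (multiple seed URLs); reads/writes still succeed.
ssh root@dd 'systemctl start membershipd-cluster'   # rejoins, catches up

# Kill two nodes; quorum is LOST. The control plane should fail CLOSED (deny),
# never fail open.
ssh root@homer 'systemctl stop membershipd-cluster'
ssh root@dd    'systemctl stop membershipd-cluster'
#   -> verify an authenticated request is rejected, not silently served.
ssh root@homer 'systemctl start membershipd-cluster'
ssh root@dd    'systemctl start membershipd-cluster'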
```

This network-level chaos test (kill 1/3, kill 2/3, partition/split-brain) is part
of the deploy validation (issue 0003f) and runs against the real VPS — it is
deliberately out of scope for the authoring agent.
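
The expected outcomes of the kill-one and kill-two cases follow from majority-quorum arithmetic, which JetStream's Raft groups rely on. The helper names below are illustrative only:

```bash
# Majority quorum for an n-member group: floor(n / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many members can be lost while the group keeps quorum.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# has_quorum <n> <alive>: exit 0 if <alive> of <n> members still form a quorum.
has_quorum() {
  [ "$2" -ge "$(quorum "$1")" ]
}

# For R3: quorum 3 prints 2 and tolerated_failures 3 prints 1, so one node down
# keeps the control plane writable and two nodes down does not.
```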

## Rollback

`membershipd` does not delete data. To revert a node to standalone SQLite, stop
the unit and start it without `--store kv`/`--cluster-name`; the KV buckets remain
for a later retry. To rotate the cluster CA, re-run `generate-cluster-certs.sh
--force` and re-stage (every node must get the new `cluster-ca.crt` together).