diff --git a/deploy/cluster/.gitignore b/deploy/cluster/.gitignore
new file mode 100644
index 0000000..7ae8dfb
--- /dev/null
+++ b/deploy/cluster/.gitignore
@@ -0,0 +1,7 @@
+# Generated TLS material and secrets — NEVER commit (audit 0008: keys/secret).
+out/
+build/
+secrets/
+*.key
+*.srl
+cluster-ca.crt
diff --git a/deploy/cluster/README.md b/deploy/cluster/README.md
new file mode 100644
index 0000000..5caf865
--- /dev/null
+++ b/deploy/cluster/README.md
@@ -0,0 +1,181 @@
+# unibus cluster — 3-node deploy runbook (issue 0006g)
+
+This directory holds the material to bring up unibus as a **3-node cluster**
+(`magnus` + `homer` + `datardos`) for real HA: with **R3** replication the control
+plane (rooms/members/keys/users on JetStream KV + the anti-replay nonce bucket)
+survives the loss of any one node (quorum 2/3).
+
+> **The agent that authored this never touched a VPS.** Every step that changes a
+> remote host is marked **HUMAN** and is executed by the operator. `deploy-cluster.sh`
+> defaults to a dry run.
+
+## Files
+
+| File | What it is |
+|---|---|
+| `nodes.env` | Topology: cluster name, ports, and the per-node rows (name, ssh host, public IP, WG IP). **HUMAN fills the placeholders.** |
+| `generate-cluster-certs.sh` | Mints a **separate cluster route CA** + a route cert per node, and a data-plane server cert per node signed by the **client CA** (`../tls/ca.*`). |
+| `membershipd-cluster.service` | One systemd unit, parameterized per node by `/opt/unibus/cluster.env`; same posture on every node: enforce + per-subject ACL + TLS + `--store kv`, with `Restart=always`. |
+| `deploy-cluster.sh` | Cross-builds the Linux binary, generates each node's `cluster.env`, and (with `--yes`) rsyncs everything + installs the unit. Staggered start is manual. |
+
+Generated keys/secrets (`out/`, `build/`, `secrets/`) are **gitignored** — they are
+secret and never leave the operator's trusted machine except over the secure
+rsync channel.
+
+## Topology
+
+| Node | SSH | Public IP | WireGuard IP | Role |
+|---|---|---|---|---|
+| magnus | `magnus` | `` | `` | seed (first up) |
+| homer | `homer` | `141.94.69.66` | `` | replica |
+| datardos | `dd` | `51.91.100.142` | `` (10.21.0.x) | replica |
+
+The route layer (server-to-server) prefers the **WireGuard mesh**
+(`ROUTE_NETWORK=wg`); the client data plane and the HTTP control plane are reached
+over the public IPs. The route CA is **separate** from the client CA, so a client
+cert is never accepted on the route port.
+
+## Prerequisites (HUMAN, once)
+
+1. **Fill `nodes.env`** — replace every `` (magnus public IP, all WG
+   IPs). The scripts refuse to run while any remain.
+2. **Client CA exists** — `../tls/ca.crt` + `../tls/ca.key`. If not, run
+   `../tls/generate-certs.sh` on the CA host (om) first. The cluster reuses this CA
+   for the data plane so existing clients keep trusting the bus.
+3. **Mint cluster TLS**:
+   ```bash
+   ./generate-cluster-certs.sh   # writes out// ; --force to rotate the cluster CA
+   ```
+4. **Create the route secret** (out of argv, shared by all nodes):
+   ```bash
+   mkdir -p secrets && openssl rand -hex 32 > secrets/cluster.pass
+   ```
+5. **SSH** as `root` to each node's SSH host must work (`ssh magnus true`, `ssh homer true`, `ssh dd true`).
+
+## Stage the nodes
+
+```bash
+./deploy-cluster.sh         # DRY RUN — prints the full plan, touches no remote host
+./deploy-cluster.sh --yes   # HUMAN: actually rsync + install the unit on all 3 nodes
+```
+
+This cross-builds `membershipd` (linux/amd64, `CGO_ENABLED=0`), writes each node's
+`cluster.env` (its `NODE_NAME` and the `--routes` to the OTHER two nodes), and
+ships the binary, the node's TLS material, the secret, the env file and the unit.
+It does **not** start anything.
+
+## Seed the first admin into the KV (HUMAN — loopback bootstrap)
+
+The empty KV control plane has no users, and under `enforce` no external tool can
+write the FIRST admin over NATS (it would need to be an admin already: a
+chicken-and-egg problem). The `user` CLI also writes only to a local SQLite file, not the
+KV. So the first admin is seeded on the seed node through a **loopback, no-auth
+bootstrap** that populates the same JetStream store the cluster unit then reuses:
+
+```bash
+ssh root@magnus 'bash -s' <<'SEED'
+set -euo pipefail
+cd /opt/unibus
+# a) Put the first admin into a local SQLite seed file.
+./membershipd user add --db ./seed.db --handle root --sign-pub --role admin
+# b) Bring up a TEMPORARY loopback, no-auth, single-node KV server on the cluster's
+#    own JetStream store dir (not exposed; bus-auth off is allowed on 127.0.0.1).
+./membershipd --store kv --bus-auth off --bind 127.0.0.1 \
+  --nats-store ./local_files/jetstream --db ./seed.db >/tmp/seed-boot.log 2>&1 &
+BOOT=$!; sleep 2
+# c) Migrate the admin from SQLite into the replicated KV (loopback — no --ca needed).
+./membershipd migrate-to-kv --db ./seed.db --nats-url nats://127.0.0.1:4250 --replicas 1
+# d) Stop the bootstrap server. The KV buckets persist in ./local_files/jetstream.
+kill "$BOOT"; wait "$BOOT" 2>/dev/null || true
+rm -f ./seed.db
+SEED
+```
+
+> The KV written here lives in `./local_files/jetstream`, which the cluster unit
+> reuses (`--nats-store` default), so the admin is present when the enforce cluster
+> starts. Additional users are added the same loopback way until a
+> `user add --store kv` exists (see GAP in report 0009).
+
+## Bring up (HUMAN — staggered)
+
+Bring up the seed first, then the replicas one at a time, checking that each one has joined before starting the next.
+
+```bash
+# 1. Seed node (after the seed step above).
+ssh root@magnus 'systemctl enable --now membershipd-cluster'
+ssh root@magnus 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'
+
+# 2. Replicas, one at a time.
+ssh root@homer 'systemctl enable --now membershipd-cluster'
+ssh root@dd    'systemctl enable --now membershipd-cluster'
+```
+
+> Initial rollout runs at **R1** (`KV_REPLICAS=1` in `nodes.env`): the buckets live
+> on the seed only. This is NOT HA yet — see "Scale to R3".
+
+## Promote an existing single-node (SQLite) deployment (HUMAN, optional)
+
+Instead of seeding fresh, you can migrate an existing single-node `unibus.db` into
+the KV — **loopback only** (the allowlist would otherwise travel cleartext; the
+command refuses a remote target without `--ca`). Use the same loopback-bootstrap
+shape as the seed step (temporary `--bus-auth off` server on 127.0.0.1, then
+`migrate-to-kv --db /opt/unibus/local_files/unibus.db`).
+
+## Verify
+
+```bash
+# Posture on every node — all must be enforce+acl+tls+cluster, store=kv.
+for h in magnus homer dd; do
+  echo "== $h =="
+  ssh "root@$h" 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'
+done
+
+# Cluster + JetStream meta-group health (needs the `nats` CLI on a node):
+ssh root@magnus 'nats --server nats://127.0.0.1:4250 server report jetstream'
+ssh root@magnus 'nats --server nats://127.0.0.1:4250 server list'   # 3 servers, routes up
+```
+
+A healthy cluster shows 3 routed servers and a JetStream meta-group with a leader.
+
+## Scale to R3 (HUMAN — real HA)
+
+Once all three nodes are up and routed, raise the replication factor of every
+control-plane stream from 1 to 3 IN PLACE (no data loss), then flip `KV_REPLICAS=3`
+in `nodes.env` so future (re)deploys keep it:
+
+```bash
+for s in KV_UNIBUS_users KV_UNIBUS_rooms KV_UNIBUS_members KV_UNIBUS_room_keys \
+         KV_UNIBUS_rooms_by_member KV_UNIBUS_nonces; do
+  ssh root@magnus "nats --server nats://127.0.0.1:4250 stream update $s --replicas 3 -f"
+done
+# (also OBJ_UNIBUS_blobs if the object store is in use)
+```
+
+Until this is done, R1 means the seed node is a **single point of failure for
+authentication**: if it dies, the nonce/KV control plane is unreachable and every
+authenticated request fails closed (auth DoS). R1 is a rollout step, not HA.
+
+## Chaos test (HUMAN — requires the 3 live VPS; NOT run here)
+
+Validate quorum tolerance after R3:
+
+```bash
+# Kill one node; the cluster keeps serving (quorum 2/3).
+ssh root@dd 'systemctl stop membershipd-cluster'
+# -> clients fail over (multiple seed URLs); reads/writes still succeed.
+ssh root@dd 'systemctl start membershipd-cluster'   # rejoins, catches up
+
+# Kill two nodes; quorum is LOST — the control plane should fail CLOSED (deny),
+# never fail open. Verify a request is rejected, not silently served.
+```
+
+This network-level chaos test (kill 1/3, kill 2/3, partition/split-brain) is part
+of the deploy validation (issue 0003f) and runs against the real VPS — it is
+deliberately out of scope for the authoring agent.
+
+## Rollback
+
+`membershipd` does not delete data. To revert a node to standalone SQLite, stop
+the unit and start it without `--store kv`/`--cluster-name`; the KV buckets remain
+for a later retry. To rotate the cluster CA, re-run `generate-cluster-certs.sh
+--force` and re-stage (every node must get the new `cluster-ca.crt` together).
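
The staggered bring-up above leaves "check that each node has joined" to the operator. A minimal sketch of a retry loop for the first half of that check, kept outside the patched files, assuming the same `root@` SSH login, HTTP port 8470 and `/opt/unibus/tls/ca.crt` path the README uses; `wait_healthy` and its retry defaults are hypothetical. A passing `/healthz` only shows that the local node is serving; route membership still needs `nats server list` as in "Verify".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the patch): poll a node's /healthz over ssh until it
# answers, so the next node is only started once the previous one is healthy.
# Assumes the README's ssh aliases, port 8470 and CA path /opt/unibus/tls/ca.crt.
set -uo pipefail

# wait_healthy HOST [TRIES] [DELAY_SECONDS]
wait_healthy() {
  local host="$1" tries="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= tries; i++)); do
    if ssh "root@${host}" \
        'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt' \
        >/dev/null 2>&1; then
      echo "${host}: healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "${host}: NOT healthy after ${tries} attempts" >&2
  return 1
}
```

For example, `ssh root@magnus 'systemctl enable --now membershipd-cluster' && wait_healthy magnus` before moving on to `homer`.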
diff --git a/deploy/cluster/deploy-cluster.sh b/deploy/cluster/deploy-cluster.sh
new file mode 100755
index 0000000..46f583e
--- /dev/null
+++ b/deploy/cluster/deploy-cluster.sh
@@ -0,0 +1,126 @@
+#!/usr/bin/env bash
+#
+# deploy-cluster.sh — cross-build membershipd and stage it onto the three cluster
+# nodes (issue 0006g). DEFAULT IS DRY-RUN: it prints the plan and touches no
+# remote host. Pass --yes to actually rsync + run remote commands. Steps that a
+# HUMAN must run (or confirm) are marked "HUMAN:".
+#
+# Prerequisites (HUMAN, once):
+#   1. Fill nodes.env (no left).
+#   2. ./generate-cluster-certs.sh   (mints out// TLS material)
+#   3. Create the route secret locally: mkdir -p secrets && openssl rand -hex 32 > secrets/cluster.pass
+#      (secrets/ is gitignored; it is rsynced to each node as cluster.pass)
+#   4. SSH access to every node's SSH_HOST with sudo-less root (SSH_USER=root).
+#
+# What it does per node (with --yes):
+#   - rsync the membershipd binary, the node's TLS material, the unit, the
+#     generated cluster.env and the route secret into REMOTE_DIR.
+#   - install + daemon-reload the systemd unit.
+# Start is STAGGERED and left to the human (see README): seed the admin on the
+# seed node (loopback bootstrap), start the seed node, then start the rest.
+set -euo pipefail
+
+DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$DIR"
+
+# shellcheck source=/dev/null
+source ./nodes.env
+
+APPLY=0
+[[ "${1:-}" == "--yes" ]] && APPLY=1
+
+if grep -q '<[A-Z_]\+>' nodes.env; then
+  echo "ERROR: nodes.env still has values — fill them in first." >&2
+  exit 2
+fi
+
+SECRET_FILE="secrets/cluster.pass"
+if [[ ! -f "$SECRET_FILE" ]]; then
+  echo "ERROR: $SECRET_FILE missing. HUMAN: mkdir -p secrets && openssl rand -hex 32 > $SECRET_FILE" >&2
+  exit 2
+fi
+
+run() {
+  # Echo every action; only execute it under --yes.
+  echo " + $*"
+  if [[ $APPLY -eq 1 ]]; then
+    "$@"
+  fi
+}
+
+echo "==> [1/3] cross-build membershipd (linux/amd64, CGO disabled)"
+run env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o build/membershipd ../../cmd/membershipd
+
+# Build the comma-separated route list for a node = the OTHER nodes' addresses on
+# the chosen network, with NO userinfo (the secret is injected by membershipd from
+# the file). Prints only the value on stdout (callers capture it with $(...)).
+routes_for() {
+  local self="$1" out=""
+  local row name _ssh pub wg addr
+  for row in "${CLUSTER_NODES[@]}"; do
+    read -r name _ssh pub wg <<<"$row"
+    [[ "$name" == "$self" ]] && continue
+    if [[ "$ROUTE_NETWORK" == "public" ]]; then addr="$pub"; else addr="$wg"; fi
+    out+="nats://${addr}:${NATS_ROUTE_PORT},"
+  done
+  echo "${out%,}"
+}
+
+echo "==> [2/3] stage each node (REMOTE_DIR=$REMOTE_DIR)"
+for row in "${CLUSTER_NODES[@]}"; do
+  read -r name ssh _pub _wg <<<"$row"
+  target="${SSH_USER}@${ssh}"
+  nodedir="out/${name}"
+  if [[ ! -d "$nodedir" ]]; then
+    echo "ERROR: $nodedir missing — run ./generate-cluster-certs.sh first." >&2
+    exit 2
+  fi
+  routes="$(routes_for "$name")"
+
+  echo "-- node ${name} (ssh ${ssh}) routes=${routes}"
+
+  # Generate this node's cluster.env locally, then ship it.
+  envfile="build/cluster-${name}.env"
+  mkdir -p build
+  cat > "$envfile" < [3/3] staged."
+if [[ $APPLY -eq 0 ]]; then
+  echo " DRY-RUN: nothing was sent. Re-run with --yes to apply."
+fi
+cat <<'NEXT'
+
+HUMAN — staggered start (do NOT enable all at once; see README "Bring up"):
+  1. Seed node first (e.g. magnus): seed the first admin through the loopback
+     bootstrap (README "Seed the first admin into the KV"), THEN start the unit:
+       ssh root@magnus 'systemctl enable --now membershipd-cluster'
+  2. Then the other two, one at a time, checking quorum after each:
+       ssh root@homer 'systemctl enable --now membershipd-cluster'
+       ssh root@dd    'systemctl enable --now membershipd-cluster'
+  3. Verify posture + quorum (README "Verify").
+  4. Scale replicas 1 -> 3 once all three are up (README "Scale to R3").
+NEXT
diff --git a/deploy/cluster/generate-cluster-certs.sh b/deploy/cluster/generate-cluster-certs.sh
new file mode 100755
index 0000000..d02efb6
--- /dev/null
+++ b/deploy/cluster/generate-cluster-certs.sh
@@ -0,0 +1,120 @@
+#!/usr/bin/env bash
+#
+# generate-cluster-certs.sh — mint the TLS material for a unibus 3-node cluster
+# (issue 0006g). Run ONCE on a trusted machine (e.g. om, which custodies the bus
+# CA); distribute the per-node output to each node over a secure channel. This
+# script touches NO remote host.
+#
+# It produces two trust roots, kept SEPARATE on purpose (audit 0008 N1-low):
+#
+#   1. The CLUSTER route CA (cluster-ca.crt/key, generated here): signs each
+#      node's ROUTE certificate. The route layer authenticates NODES, not bus
+#      users, so it must NOT share the client data-plane CA — a client cert is
+#      then never accepted on the route port.
+#   2. The CLIENT data-plane CA (../tls/ca.crt/key, the one clients pin): signs
+#      each node's DATA-PLANE server certificate. Reused, not regenerated, so
+#      existing clients keep trusting the bus.
+#
+# Per node it emits, under out//:
+#   route-.crt/key    route cert (cluster CA), EKU server+clientAuth
+#                     (each node is BOTH server and dialer to its peers)
+#   server-.crt/key   data-plane cert (client CA), EKU serverAuth
+#   cluster-ca.crt    the route CA cert (for --route-tls-ca)
+#   ca.crt            the client CA cert (for clients / control-plane TLS)
+#
+# SANs per node = its public IP + its WireGuard IP + its hostname + localhost (DNS:localhost, IP:127.0.0.1).
+#
+# Key material: EC P-256 (Go crypto/tls + nats-server friendly), matching
+# ../tls/generate-certs.sh.
+set -euo pipefail
+
+DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$DIR"
+
+# shellcheck source=/dev/null
+source ./nodes.env
+
+# Refuse to run while any placeholder remains (HUMAN must fill nodes.env first).
+if grep -q '<[A-Z_]\+>' nodes.env; then
+  echo "ERROR: nodes.env still has values — fill them in first." >&2
+  grep -n '<[A-Z_]\+>' nodes.env >&2
+  exit 2
+fi
+
+CLIENT_CA_CRT="../tls/ca.crt"
+CLIENT_CA_KEY="../tls/ca.key"
+if [[ ! -f "$CLIENT_CA_CRT" || ! -f "$CLIENT_CA_KEY" ]]; then
+  echo "ERROR: client data-plane CA not found at ../tls/ca.{crt,key}." >&2
+  echo " Run ../tls/generate-certs.sh first (it mints the client CA)." >&2
+  exit 2
+fi
+
+DAYS_CA=3650
+DAYS_CRT=825
+
+force=0
+[[ "${1:-}" == "--force" ]] && force=1
+
+# --- cluster route CA (separate trust root) ---
+if [[ ! -f cluster-ca.crt || ! -f cluster-ca.key || $force -eq 1 ]]; then
+  echo "==> generating cluster route CA (separate from the client CA)"
+  openssl ecparam -name prime256v1 -genkey -noout -out cluster-ca.key
+  chmod 600 cluster-ca.key
+  openssl req -x509 -new -key cluster-ca.key -sha256 -days "$DAYS_CA" \
+    -subj "/CN=unibus-cluster-ca" -out cluster-ca.crt
+else
+  echo "==> reusing existing cluster route CA (pass --force to regenerate)"
+fi
+
+# mint
+mint_cert() {
+  local out_key="$1" out_crt="$2" cn="$3" san="$4" eku="$5" ca_crt="$6" ca_key="$7"
+  local csr ext
+  csr="$(mktemp)"
+  ext="$(mktemp)"
+  openssl ecparam -name prime256v1 -genkey -noout -out "$out_key"
+  chmod 600 "$out_key"
+  openssl req -new -key "$out_key" -subj "/CN=${cn}" -out "$csr"
+  cat > "$ext" < node ${name}: SAN IP:${pub}, IP:${wg}, DNS:${name}, localhost, 127.0.0.1"
+  nodedir="out/${name}"
+  mkdir -p "$nodedir"
+  san="IP:${pub},IP:${wg},DNS:${name},DNS:localhost,IP:127.0.0.1"
+
+  # Route cert: signed by the cluster CA; server+client auth (mutual routes).
+  mint_cert "${nodedir}/route-${name}.key" "${nodedir}/route-${name}.crt" \
+    "unibus-route-${name}" "$san" "serverAuth,clientAuth" \
+    cluster-ca.crt cluster-ca.key
+
+  # Data-plane server cert: signed by the client CA; serverAuth only.
+  mint_cert "${nodedir}/server-${name}.key" "${nodedir}/server-${name}.crt" \
+    "unibus-${name}" "$san" "serverAuth" \
+    "$CLIENT_CA_CRT" "$CLIENT_CA_KEY"
+
+  # Co-locate the two CA certs each node needs.
+  cp cluster-ca.crt "${nodedir}/cluster-ca.crt"
+  cp "$CLIENT_CA_CRT" "${nodedir}/ca.crt"
+done
+
+rm -f cluster-ca.srl ../tls/ca.srl 2>/dev/null || true
+
+echo
+echo "==> done. Per-node material under out// (KEYS ARE SECRET — never commit them):"
+for row in "${CLUSTER_NODES[@]}"; do
+  read -r name _rest <<<"$row"
+  echo " out/${name}/ (route-${name}.*, server-${name}.*, cluster-ca.crt, ca.crt)"
+done
+echo
+echo "verify a SAN with:"
+echo " openssl x509 -in out//server-.crt -noout -text | grep -A1 'Subject Alternative Name'"
diff --git a/deploy/cluster/membershipd-cluster.service b/deploy/cluster/membershipd-cluster.service
new file mode 100644
index 0000000..45ee329
--- /dev/null
+++ b/deploy/cluster/membershipd-cluster.service
@@ -0,0 +1,45 @@
+[Unit]
+# unibus membershipd — cluster node (issue 0006g).
+#
+# One unit, parameterized per node by /opt/unibus/cluster.env (generated by
+# deploy-cluster.sh): NODE_NAME, ROUTES and the cert paths differ per node; the
+# rest of the posture (enforce + per-subject ACL + TLS + --store kv) is identical
+# on every node, which is the homogeneous posture a secure cluster requires
+# (audit 0008 N1).
+Description=unibus membershipd (cluster node)
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+WorkingDirectory=/opt/unibus
+EnvironmentFile=/opt/unibus/cluster.env
+# The route password comes from a FILE referenced by ${CLUSTER_PASS_FILE}, never
+# from argv (audit 0008 N1-low). The peer --routes carry no userinfo; membershipd
+# injects the credentials from --cluster-user and --cluster-pass-file.
+ExecStart=/opt/unibus/membershipd \
+  --bind 0.0.0.0 \
+  --bus-auth enforce \
+  --http-port ${HTTP_PORT} \
+  --nats-port ${NATS_CLIENT_PORT} \
+  --tls-cert ${TLS_CERT} \
+  --tls-key ${TLS_KEY} \
+  --cluster-name ${CLUSTER_NAME} \
+  --server-name ${NODE_NAME} \
+  --cluster-port ${NATS_ROUTE_PORT} \
+  --routes ${ROUTES} \
+  --cluster-user ${CLUSTER_USER} \
+  --cluster-pass-file ${CLUSTER_PASS_FILE} \
+  --route-tls-cert ${ROUTE_TLS_CERT} \
+  --route-tls-key ${ROUTE_TLS_KEY} \
+  --route-tls-ca ${ROUTE_TLS_CA} \
+  --store kv \
+  --kv-replicas ${KV_REPLICAS}
+# Restart=always (NOT on-failure): a clean SIGTERM exits success, and on-failure
+# would then NOT restart, leaving the node silently dead (see function_tags.md).
+Restart=always
+RestartSec=2
+LimitNOFILE=65536
+
+[Install]
+WantedBy=multi-user.target
diff --git a/deploy/cluster/nodes.env b/deploy/cluster/nodes.env
new file mode 100644
index 0000000..4cdb8e0
--- /dev/null
+++ b/deploy/cluster/nodes.env
@@ -0,0 +1,44 @@
+# Cluster topology for the unibus 3-node deployment (issue 0006g).
+#
+# This file is SOURCED by generate-cluster-certs.sh and deploy-cluster.sh.
+#
+# HUMAN: fill in every with the real value before running the
+# scripts. The public IPs known at authoring time are pre-filled; the WireGuard
+# mesh IPs and magnus's public IP must be supplied. The scripts refuse to run
+# while any remains.
+
+# Cluster identity (must be identical on every node).
+CLUSTER_NAME="unibus"
+# Route-secret username; the password is NOT here — it lives in a file (see
+# CLUSTER_PASS_FILE in deploy-cluster.sh) so it never lands in argv or git.
+CLUSTER_USER="unibus-cluster"
+
+# KV/nonce replication factor. START AT 1 for the initial 1->3 rollout, then raise
+# to 3 IN PLACE (see README "Scale to R3") once all three nodes have joined. Only
+# set this to 3 here after all three nodes are up and the in-place stream update has run.
+KV_REPLICAS=1
+
+# Ports (same on every node; the route port is server-to-server only).
+NATS_CLIENT_PORT=4250
+NATS_ROUTE_PORT=6250
+HTTP_PORT=8470
+
+# Remote install layout and SSH login user.
+REMOTE_DIR="/opt/unibus"
+SSH_USER="root"
+
+# Which network the inter-node routes use. "wg" builds --routes from the
+# WireGuard mesh IPs (private server-to-server links, preferred); "public" uses
+# the public IPs. The route layer is always mutual-TLS regardless.
+ROUTE_NETWORK="wg"
+
+# One row per node: NAME SSH_HOST PUBLIC_IP WG_IP
+#   NAME      -> --server-name and the per-node cert filenames (unique).
+#   SSH_HOST  -> the `ssh ` alias (see ~/.ssh/config).
+#   PUBLIC_IP -> public address; goes in the cert SANs (client-facing data plane).
+#   WG_IP     -> WireGuard mesh address; cert SAN + route target when ROUTE_NETWORK=wg.
+CLUSTER_NODES=(
+  "magnus magnus "
+  "homer homer 141.94.69.66 "
+  "datardos dd 51.91.100.142 "
+)
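
Both scripts accept `CLUSTER_NODES` as soon as the placeholder grep passes. A minimal pre-flight sketch of further checks one could run after `source ./nodes.env`, assuming bash 4+ for the associative array: exactly four fields per row, unique node names, an odd node count (under majority quorum an even count tolerates no more failures than the odd count one below it), and `KV_REPLICAS` between 1 and the node count. `check_topology` and these rules are illustrative assumptions, not behavior of `deploy-cluster.sh` or `generate-cluster-certs.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not in the patch) for a sourced nodes.env.
# Expects CLUSTER_NODES rows "NAME SSH_HOST PUBLIC_IP WG_IP" and KV_REPLICAS.
# Requires bash 4+ (associative array). Returns 0 only if every check passes.
set -uo pipefail

check_topology() {
  local row name _ssh _pub wg extra errors=0 n
  local -A seen=()
  for row in "${CLUSTER_NODES[@]}"; do
    # A fifth field lands in "extra"; a missing field leaves "wg" empty.
    read -r name _ssh _pub wg extra <<<"$row"
    if [[ -z "$wg" || -n "$extra" ]]; then
      echo "bad row (want 4 fields): '$row'" >&2
      errors=$((errors + 1))
      continue
    fi
    if [[ -n "${seen[$name]:-}" ]]; then
      echo "duplicate node name: $name" >&2
      errors=$((errors + 1))
    fi
    seen[$name]=1
  done
  n=${#CLUSTER_NODES[@]}
  # Policy choice: reject even counts, which add no failure tolerance.
  if (( n % 2 == 0 )); then
    echo "even node count ($n): prefer an odd number of nodes" >&2
    errors=$((errors + 1))
  fi
  if (( KV_REPLICAS < 1 || KV_REPLICAS > n )); then
    echo "KV_REPLICAS=$KV_REPLICAS must be between 1 and $n" >&2
    errors=$((errors + 1))
  fi
  (( errors == 0 ))
}
```

Typical use would be `source ./nodes.env && check_topology || exit 2`, placed right after the existing placeholder grep.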