Bringing up the 3-node cluster from clean stores never converged: every node
looped on `open KV bucket "UNIBUS_rooms" (replicas=1): context deadline exceeded`.
Three independent defects in the clustered bootstrap path caused this; none
surfaces on a single node, where JetStream is ready instantly:
1. embeddednats: route connection pooling (nats-server 2.10 default pool of 3)
churned with "duplicate route"/"client closed" reconnects on the small cluster,
interrupting the meta-group RAFT heartbeats and forcing perpetual leader
re-elections. Set Cluster.PoolSize = -1 (single route per peer).
2. embeddednats: the cluster nodes are Docker hosts, so NATS advertised the Docker
bridge IPs (172.x / 10.0.x) to peers, which then tried to dial those private,
mutually-unreachable addresses. Set Cluster.NoAdvertise = true so only the
explicit public-IP routes are used. Also added a UNIBUS_NATS_DEBUG env toggle
(off by default) that enables the embedded server's logger and loopback
monitoring port for debugging the route/meta layer.
3. membership.OpenJetStream: a KV op is a NATS request/reply; on a cold cluster the
op was published once, before the node had contact with the meta leader, so the
request was dropped and the single long-context call just blocked until timeout.
Now retry each bucket op with a short per-attempt context until it succeeds or an
overall bootstrap budget (120s) is exhausted, so the op lands once the meta group
settles.
With these fixes the cluster forms cleanly, creates the KV buckets, scales R1->R3 in
place, and survives loss of one node (quorum 2/3). Verified on magnus+homer+datardos.
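The retry shape from fix 3 can be sketched as below. This is a minimal
illustration using only the standard library, not the actual
membership.OpenJetStream code: `retryOp`, `simulate`, the pause between
attempts, and the durations are all hypothetical, and the simulated op
stands in for a real KV bucket request.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryOp runs op until it succeeds or the overall budget expires. Each
// attempt gets its own short context, so one dropped request costs at most
// perAttempt instead of the whole budget.
func retryOp(budget, perAttempt, pause time.Duration, op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	for {
		attemptCtx, cancelAttempt := context.WithTimeout(ctx, perAttempt)
		err := op(attemptCtx)
		cancelAttempt()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("bootstrap budget exhausted: %w (last error: %v)", ctx.Err(), err)
		case <-time.After(pause):
			// Brief pause before the next attempt.
		}
	}
}

// simulate stands in for a bucket op on a cold cluster (hypothetical): the
// first `failures` attempts are dropped, later ones succeed. It returns the
// number of attempts made and the final error.
func simulate(failures, budgetMillis int) (int, error) {
	attempts := 0
	op := func(ctx context.Context) error {
		attempts++
		if attempts <= failures {
			return errors.New("no reply from meta leader")
		}
		return nil
	}
	budget := time.Duration(budgetMillis) * time.Millisecond
	err := retryOp(budget, 100*time.Millisecond, 10*time.Millisecond, op)
	return attempts, err
}

func main() {
	attempts, err := simulate(3, 2000)
	fmt.Println(attempts, err) // 4 <nil>
}
```

The per-attempt context is derived from the budget context, so the final
attempt can never outlive the overall deadline.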
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The one-time data move that decentralization needs (issue 0003c): copy the
entire control-plane state from the local SQLite database into the
replicated JetStream KV buckets, with a backup taken first.
pkg/membership:
- Snapshot / SealedKeyRecord: a backend-agnostic dump of the whole
control plane (rooms with their real epoch, members, every sealed-key
row across epochs, users with status).
- (*sqliteStore).ExportSnapshot and (*jetstreamStore).ExportSnapshot read
a full Snapshot from each backend; (*jetstreamStore).importSnapshot
writes one with raw Puts (preserving epoch/status, not resetting to
defaults) so the migration is faithful and idempotent (every write is
an overwrite, so re-running converges).
- MigrateSQLiteToKV orchestrates export -> import; BackupSQLite makes a
consistent copy via SQLite's VACUUM INTO before any migration.
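Why overwrite-only writes make the import idempotent can be sketched with an
in-memory map standing in for the KV buckets. Everything here is a
simplified assumption: this `Snapshot` carries only room epochs and user
statuses, and the key layout, status strings, and helper names are
hypothetical rather than the real pkg/membership types.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Snapshot is a reduced, hypothetical stand-in for the backend-agnostic dump.
type Snapshot struct {
	RoomEpochs map[string]uint64 // room ID -> real epoch
	UserStatus map[string]string // user key -> status
}

// memKV imitates a KV bucket whose Put always overwrites.
type memKV map[string]string

func (kv memKV) Put(key, value string) { kv[key] = value }

// importSnapshot writes every record with a raw Put, preserving the stored
// epoch and status instead of resetting them to defaults.
func importSnapshot(kv memKV, s Snapshot) {
	for room, epoch := range s.RoomEpochs {
		kv.Put("rooms."+room, strconv.FormatUint(epoch, 10))
	}
	for user, status := range s.UserStatus {
		kv.Put("users."+user, status)
	}
}

// copyKV returns an independent copy so two states can be compared.
func copyKV(kv memKV) memKV {
	out := make(memKV, len(kv))
	for k, v := range kv {
		out[k] = v
	}
	return out
}

// sample builds illustrative data: one room rekeyed to epoch 2, one revoked user.
func sample() Snapshot {
	return Snapshot{
		RoomEpochs: map[string]uint64{"lobby": 1, "ops": 2},
		UserStatus: map[string]string{"alice": "active", "bob": "revoked"},
	}
}

// rerunConverges imports s twice and reports whether the state is unchanged
// by the second run.
func rerunConverges(s Snapshot) bool {
	kv := memKV{}
	importSnapshot(kv, s)
	first := copyKV(kv)
	importSnapshot(kv, s)
	return reflect.DeepEqual(first, kv)
}

func main() {
	fmt.Println(rerunConverges(sample())) // true
}
```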
cmd/membershipd:
- `membershipd migrate-to-kv --db <path> --nats-url <url> [--replicas N]
[--ca <cert>] [--no-backup]` backs up the SQLite file, connects to the
cluster's NATS, and migrates. Dispatched on the host like `user`.
Tests (DoD: golden + edge + parity):
- TestMigrateSQLiteToKVParity: seed a representative SQLite (two rooms,
one rekeyed to epoch 2, members, a revoked user); after migration the
KV ExportSnapshot equals the SQLite ExportSnapshot.
- TestMigrateSQLiteToKVIdempotent: running the migration twice yields the
same KV state.
- TestBackupSQLiteCreatesConsistentCopy: the backup reopens with
identical data.
Plus a binary smoke test (seed user -> run server -> migrate-to-kv -> re-run):
backup written, 1 user migrated, second run identical.
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.
pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
Close). The existing SQLite implementation is renamed sqliteStore and
stays the default: Open(path) still returns it. openSQLite keeps the
concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
backends return it (the SQLite store maps sql.ErrNoRows to it). The
control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
room_keys, users. Replication factor is configurable (R1..R5) for the
R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
never admits), per the audit's requirement for the decentralized store.
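The fail-closed rule can be illustrated with a small standard-library
sketch. It is not the jetstreamStore code: `lookupFunc`, the two fake
backends, the "active"/"revoked" status strings, and `demoTimeout` are
assumptions made for the example. The point is only that every error path,
including a timeout, returns a denial.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// demoTimeout plays the role of OpTimeout in this sketch (value is arbitrary).
const demoTimeout = 50 * time.Millisecond

// lookupFunc is a hypothetical read of a user's status from the backend.
type lookupFunc func(ctx context.Context, key string) (string, error)

// isAuthorized bounds the read by opTimeout and denies on ANY error:
// a miss, a timeout, or a quorum loss all yield false.
func isAuthorized(get lookupFunc, opTimeout time.Duration, key string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), opTimeout)
	defer cancel()
	status, err := get(ctx, key)
	if err != nil {
		return false // fail closed
	}
	return status == "active"
}

// healthyBackend answers immediately from a fixed table.
func healthyBackend(ctx context.Context, key string) (string, error) {
	users := map[string]string{"alice": "active", "bob": "revoked"}
	status, ok := users[key]
	if !ok {
		return "", errors.New("no such user")
	}
	return status, nil
}

// downBackend never answers; it only returns once the context expires,
// imitating a backend that has been shut down.
func downBackend(ctx context.Context, key string) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

func main() {
	fmt.Println(isAuthorized(healthyBackend, demoTimeout, "alice")) // true
	fmt.Println(isAuthorized(healthyBackend, demoTimeout, "bob"))   // false
	fmt.Println(isAuthorized(downBackend, demoTimeout, "alice"))    // false
}
```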
dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
jetstreamStore behind it. The membershipd boot wiring that selects the
KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
authenticator<->store bootstrap is part of the session/deploy redesign);
OFF keeps the single-node SQLite control plane unchanged.
Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
round-trip through every room/member/key method, including latest-epoch
resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
IsAuthorized and HasAdmin both DENY within the bounded timeout.
The full existing suite stays green: sqliteStore's behavior is unchanged.
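The ErrNotFound mapping described for the SQLite store might look roughly
like this. It is a sketch under assumptions: the sentinel's message text and
the helpers `mapSQLErr`, `lookupMissing`, and `describe` are hypothetical,
and only `sql.ErrNoRows` and `errors.Is` are real standard-library names.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// ErrNotFound is the storage-agnostic "no such record" sentinel
// (the message text here is an assumption).
var ErrNotFound = errors.New("membership: not found")

// mapSQLErr translates the SQL-specific miss into the shared sentinel and
// passes every other error (and nil) through unchanged.
func mapSQLErr(err error) error {
	if errors.Is(err, sql.ErrNoRows) {
		return ErrNotFound
	}
	return err
}

// lookupMissing imitates a store method whose query matched no rows.
func lookupMissing() error {
	return mapSQLErr(fmt.Errorf("get room: %w", sql.ErrNoRows))
}

// describe shows how a caller branches on the sentinel without importing
// database/sql itself.
func describe(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrNotFound):
		return "not found"
	default:
		return "backend error"
	}
}

func main() {
	fmt.Println(describe(lookupMissing()))                     // not found
	fmt.Println(describe(mapSQLErr(nil)))                      // ok
	fmt.Println(describe(mapSQLErr(errors.New("disk error")))) // backend error
}
```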