53 Commits

Author SHA1 Message Date
egutierrez b379730225 docs(app): document users HTTP admin model, bump 0.10.0
Add a gotcha describing the unified-storage model (the server writes
users to the same store/KV as rooms), the admin-only HTTP surface, and
the CLI-seeds-admin-#0 bootstrap. Bump the version 0.9.0 -> 0.10.0 and
add the capability growth log entry for the new HTTP admin users API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:32:05 +02:00
egutierrez 450ca01baf feat(membership,client): HTTP admin-only users API
Close the last control-plane asymmetry: rooms had a signed HTTP surface
but users were only manageable via the local CLI or direct store access.
Add admin-only HTTP endpoints, symmetric with rooms, executed against the
same privileged store the server already serves (SQLite single-node, the
replicated JetStream KV in cluster) — no new KV connection, no internal
identity, so the admin panel can manage the allowlist by signing as an
admin instead of needing --db / direct KV access.

Endpoints (all behind requireAdmin, on top of the existing
signature+nonce+TLS+enforce middleware):
  - GET  /users                    list the full allowlist (incl. revoked)
  - POST /users                    add {sign_pub, handle, role}
  - POST /users/{signpub}/revoke   revoke (status flip, no hard delete)

requireAdmin is default-deny with no dev relaxation: it allows a request
only when the authenticated signer is confirmed by the store as an active
admin; any other case (no signer, non-admin, revoked, store error) is 403,
fail-closed. The request context now also carries the signer's sign_pub
hex, because the endpoint id is a one-way hash of the key and cannot be
reversed to look the signer up in the allowlist.
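
A minimal sketch of the default-deny shape described above, not the actual handler code. The context key, the adminChecker type and the demoStore helper are hypothetical names introduced for illustration; only the 403-on-everything-else behavior comes from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// signerKey is a hypothetical context key carrying the signer's sign_pub hex.
type signerKey struct{}

// adminChecker is a hypothetical view of the store: it reports whether the
// given sign_pub belongs to an active admin.
type adminChecker func(signPubHex string) (bool, error)

// withSigner attaches the authenticated signer to the request context.
func withSigner(r *http.Request, signPubHex string) *http.Request {
	return r.WithContext(context.WithValue(r.Context(), signerKey{}, signPubHex))
}

// requireAdmin lets a request through only when the store confirms the signer
// is an active admin. No signer, non-admin, and store error all yield 403.
func requireAdmin(isActiveAdmin adminChecker, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		signer, _ := r.Context().Value(signerKey{}).(string)
		if signer == "" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		ok, err := isActiveAdmin(signer)
		if err != nil || !ok {
			http.Error(w, "forbidden", http.StatusForbidden) // fail closed
			return
		}
		next.ServeHTTP(w, r)
	})
}

// demoStore is a toy store: "admin-key" is an active admin, "broken" errors.
func demoStore(signPubHex string) (bool, error) {
	if signPubHex == "broken" {
		return false, errors.New("store unavailable")
	}
	return signPubHex == "admin-key", nil
}

// statusFor runs one GET /users through the middleware and returns the status.
func statusFor(signer string, check adminChecker) int {
	h := requireAdmin(check, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	req := httptest.NewRequest(http.MethodGet, "/users", nil)
	if signer != "" {
		req = withSigner(req, signer)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("admin-key", demoStore), statusFor("member-key", demoStore),
		statusFor("", demoStore), statusFor("broken", demoStore))
}
```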

Validation/idempotency mirror the CLI: sign_pub must be 64-hex, role must
be admin|member (empty defaults to member), re-adding an existing key is a
409 that leaves the row untouched. The hex check is unified into
membership.ValidateSignPubHex, reused by the CLI and the handlers.
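
The validation rules above could be sketched as follows. This is an illustration only: validateSignPubHex stands in for membership.ValidateSignPubHex, whose real body is not shown here, and normalizeRole is a hypothetical helper name.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// validateSignPubHex accepts exactly 64 hex characters (32 bytes).
// Hypothetical stand-in for membership.ValidateSignPubHex.
func validateSignPubHex(s string) error {
	if len(s) != 64 {
		return errors.New("sign_pub must be 64 hex characters")
	}
	if _, err := hex.DecodeString(s); err != nil {
		return fmt.Errorf("sign_pub is not valid hex: %w", err)
	}
	return nil
}

// normalizeRole applies the stated rule: empty defaults to member, and only
// admin|member are accepted.
func normalizeRole(role string) (string, error) {
	switch role {
	case "":
		return "member", nil
	case "admin", "member":
		return role, nil
	default:
		return "", fmt.Errorf("role must be admin|member, got %q", role)
	}
}

func main() {
	key := "00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff"
	fmt.Println(validateSignPubHex(key) == nil)
	role, err := normalizeRole("")
	fmt.Println(role, err)
}
```

In an HTTP handler both failures would map to 400; the 409 on re-add is a store-level check and is not modeled here.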

pkg/client gains ListUsers/AddUser/RevokeUser (flat UserInfo type) signed
via doJSON, so the panel plugs in directly.

Tests: non-admin -> 403 on all three endpoints; admin add->list->revoke
roundtrip; validation (400 hex, 400 role, 409 re-add, row untouched); plus
a client test against an embedded membershipd under enforce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:31:57 +02:00
egutierrez e1a7402ff1 chore: bump unibus to 0.9.0 (live user-add + clientcheck)
New capability membershipd user add --store kv against a live cluster plus
cmd/clientcheck end-to-end verification (issue 0011 gaps, report 0012). Adds
the capability growth log entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez ce72131ddf docs(cluster): correct runbook + wire --internal-id-file into deploy
Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks — a
  lone node of a 3-node cluster has no meta-group quorum and never serves
  healthz until a second node joins. Document a quorum-forming start and that
  a node never self-serves.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
  node) and the cold start only converges with the three cold-start fixes;
  go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
  stop-seed-restart, with its security model and idempotency/HA/no-delete
  semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).

Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez 3aa5a2c9a9 feat(clientcheck): end-to-end client verification (E2E room + failover)
The 0011 chaos test validated only the control plane (healthz + leader
failover + KV readable with 2/3); it never connected an authenticated bus
client to the data plane. cmd/clientcheck is a reusable verification tool: it
connects with a real identity (nkey + TLS on both planes, multi-node seed
lists), creates an ephemeral E2E room (encrypted + signed, no durable stream),
and either publishes N messages and asserts all come back decrypted (golden)
or publishes a counter for a duration while logging the attached node (loop),
so stopping a node mid-run shows the client fail over to a survivor and keep
receiving with quorum 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez 02c2004ebd feat(membershipd): user add/list/revoke --store kv against a live cluster
Closes the most valuable 0011 deploy gap: adding users to the running
cluster's replicated allowlist with no stop-seed-restart. Under enforce the
per-subject ACL confines every bus user to its own rooms, so no ordinary
identity may write the control-plane KV buckets; the only identity the
authenticator grants full JetStream permissions is membershipd's internal
service identity.

- main.go: --internal-id-file persists that identity (load-or-create, 0600)
  instead of a fresh ephemeral key, so the same nkey is available out of
  process. Empty keeps the ephemeral default (single-node/dev unchanged).
- users_kv.go: connectKVStore loads the persisted identity, presents its
  nkey (recognized as internal -> full perms), opens the KV store and
  writes. Defaults assume an on-node loopback invocation; a remote target
  without --ca is refused (allowlist must not travel cleartext, audit N6).
  Prints KV_UNIBUS_users replication (followers_current) after a write.
- users_cli.go: --store kv on add/list/revoke. Re-adding a key is an explicit
  ErrUserExists (no silent overwrite / role flip); revoke is a status flip.
- pkg/client: LoadIdentity (load-only) extracted from LoadOrCreateIdentity,
  preserving its "corrupt file is an error, not silently regenerated" guard.
- kv_useradd_test.go: golden write under enforce, idempotency, unreachable
  endpoint, and remote-without-CA refusal against an embedded node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:38 +02:00
egutierrez ff580ac031 Merge quick/cluster-coldstart-fixes: 3-node cluster cold-start fixes + real topology 2026-06-07 18:56:28 +02:00
egutierrez 9fbff79df4 chore(deploy): fill cluster nodes.env with the real 3-node topology
Set magnus's public IP (135.125.201.30) and switch ROUTE_NETWORK to "public":
the three nodes have no WireGuard mesh (homer/datardos do not even have wg
installed), so server-to-server routes go over the public IPs, still protected
by the separate cluster route CA (mutual TLS). KV_REPLICAS is raised to 3 now
that the cluster runs at R3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:56:28 +02:00
egutierrez 33746d9962 fix(cluster): make the JetStream control-plane survive a cold multi-node start
Bringing up the 3-node cluster from clean stores never converged: every node
looped on `open KV bucket "UNIBUS_rooms" (replicas=1): context deadline exceeded`.
Three independent defects in the clustered bootstrap path, none of which surface
on a single node (where JetStream is ready instantly), caused it:

1. embeddednats: route connection pooling (nats-server 2.10 default pool of 3)
   churned with "duplicate route"/"client closed" reconnects on the small cluster,
   interrupting the meta-group RAFT heartbeats and forcing perpetual leader
   re-elections. Set Cluster.PoolSize = -1 (single route per peer).

2. embeddednats: the cluster nodes are Docker hosts, so NATS advertised the docker
   bridge IPs (172.x / 10.0.x) to peers, which then tried to dial those private,
   mutually-unreachable addresses. Set Cluster.NoAdvertise = true so only the
   explicit public-IP routes are used. Also added a UNIBUS_NATS_DEBUG env toggle
   (off by default) that enables the embedded server's logger and loopback
   monitoring port for debugging the route/meta layer.

3. membership.OpenJetStream: a KV op is a NATS request/reply; on a cold cluster the
   op was published once, before the node had contact with the meta leader, so the
   request was dropped and the single long-context call just blocked until timeout.
   Retry each bucket op with short per-attempt contexts until it succeeds or an
   overall bootstrap budget (120s) is exhausted, so it lands once the meta settles.

With these the cluster forms cleanly, creates the KV buckets, scales R1->R3 in
place, and survives loss of one node (quorum 2/3). Verified on magnus+homer+datardos.
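
The retry in fix 3 could look roughly like the sketch below: a fresh short context per attempt, bounded by an overall budget. retryOp and flakyOp are hypothetical names, and the durations in main are shrunk for the demo; the real budget per the message is 120s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryOp calls op with a fresh short-lived context per attempt until it
// succeeds or the overall budget is spent.
func retryOp(budget, perAttempt, pause time.Duration, op func(ctx context.Context) error) error {
	overall, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	for {
		attemptCtx, cancelAttempt := context.WithTimeout(overall, perAttempt)
		err := op(attemptCtx)
		cancelAttempt()
		if err == nil {
			return nil
		}
		select {
		case <-overall.Done():
			return fmt.Errorf("bootstrap budget exhausted: %w", err)
		case <-time.After(pause):
			// try again; the meta group may have settled by now
		}
	}
}

// flakyOp returns an op that fails its first n calls, standing in for a KV
// bucket op that is dropped until the meta leader is reachable.
func flakyOp(failures int) (func(context.Context) error, *int) {
	calls := 0
	return func(_ context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("no meta leader yet")
		}
		return nil
	}, &calls
}

func main() {
	op, calls := flakyOp(3)
	err := retryOp(2*time.Second, 100*time.Millisecond, 10*time.Millisecond, op)
	fmt.Println(err, *calls)
}
```

A single long-context call blocks on the first dropped request, whereas each short attempt re-publishes it, which is why the op lands once the meta group settles.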

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:56:28 +02:00
agent caf005f04b feat(web): frontend v1: login (handle+password), sidebar rooms+search, Element-style chat
React 19 + Vite + Mantine v9 SPA in dark mode (indigo accent), with mock data to
iterate on the design before wiring the gateway. Login with identity + password
(the password will unlock the encrypted Ed25519 identity on the device).
Sidebar: user avatar, search box (rooms/users/messages) and a room list with an
E2E padlock / cleartext hash icon / unread badges. Element-style chat panel
(avatar+name+time+text) with an interactive composer.
2026-06-07 17:57:50 +02:00
agent 9787c218ac chore: remove experimental frontends (web, android, playground, mobile)
Cleanup of the trial frontends (React SPA, Kotlin app, gateway playground,
gomobile binding) after the exploration phase. The bus (cmd/membershipd + pkg/*)
stays intact and green. We are starting a new web frontend from scratch, built
incrementally. Everything deleted remains in the git history in case something
needs to be recovered.
2026-06-07 17:38:07 +02:00
egutierrez 926b8e96af chore(0006): bump unibus to 0.8.0, close issue 0006 (cluster hardening + wiring)
All seven phases (0006a–0006g) merged: blockers N3 (replicated nonce) and N2
($JS.API.> KV leak) closed, decentralized KV store wired (--store kv), homogeneous
cluster posture enforced (N1), RefreshSession in all clients (N4), the lows
(secret out of argv, migrate guard, R1/CA docs), and the 3-node deploy material.

Full suite + every audit-0008 attack regression green; govulncheck 0 reachable.
See report 0009.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:33:03 +02:00
egutierrez ae39e35fb4 Merge issue/0006g-deploy: cluster deploy material (magnus+homer+datardos, R3 HA) 2026-06-07 17:31:13 +02:00
egutierrez 48a3d6be33 docs(0006g): cluster deploy material for magnus+homer+datardos (R3 HA)
Parameterized, NO-VPS-touched material to bring up unibus as a 3-node cluster.
The authoring agent ran none of it on a host; every remote-changing step is
marked HUMAN and deploy-cluster.sh defaults to a dry run.

deploy/cluster/:
- nodes.env — topology (cluster name, ports, per-node rows). Public IPs known
  (homer 141.94.69.66, datardos 51.91.100.142) pre-filled; magnus public IP and
  all WireGuard IPs are <PLACEHOLDER> for the human; scripts refuse to run while
  any remain.
- generate-cluster-certs.sh — mints a SEPARATE cluster route CA + a route cert per
  node (server+clientAuth, mutual routes) and a data-plane server cert per node
  signed by the reused client CA (../tls/ca.*); SAN = public + WG + hostname.
- membershipd-cluster.service — one unit, parameterized per node via
  /opt/unibus/cluster.env: enforce + per-subject ACL + TLS + --store kv,
  --cluster-pass-file (secret out of argv), Restart=always.
- deploy-cluster.sh — cross-build linux/amd64, generate each node's cluster.env
  (routes to the other two on the WG mesh, no userinfo), rsync + install (only
  with --yes); staggered start is manual.
- README.md — runbook: prerequisites, loopback bootstrap to seed the first admin
  into the KV (works around the user-CLI/KV chicken-and-egg), staggered bring-up,
  verify posture+quorum, scale R1->R3 in place, and the chaos test (left to 0003f
  on the real VPS).
- .gitignore — out/, build/, secrets/, *.key never committed.

bash -n passes on both scripts; go build/test unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:31:13 +02:00
egutierrez 24ff45ca7e Merge issue/0006f-lows: cluster secret out of argv + migrate guard + docs (audit 0008 lows) 2026-06-07 17:24:46 +02:00
egutierrez b8201a82cd fix(0006f): cluster secret out of argv, migrate-to-kv TLS guard, R1/CA docs (audit 0008 lows)
Low-severity cluster hardening from audit 0008:

- Route secret out of argv (N1-low): --cluster-pass and a nats://user:pass@host in
  --routes are visible in ps/journald. New --cluster-pass-file and the
  UNIBUS_CLUSTER_PASS env var (precedence file > env > flag); the resolved secret
  guards the route layer and is injected into bare --routes entries
  (injectRouteCreds), so peers can be listed as nats://host:6250 with no secret in
  argv. The legacy --cluster-pass stays for dev/compat.
- migrate-to-kv confidentiality (N6): refuse a remote --nats-url without --ca (the
  allowlist would travel cleartext); loopback targets are exempt (isLoopbackURL).
- Docs (N1 route CA, N3 DoS): deploy/README gains a Clustering section — use a
  SEPARATE cluster CA for routes (not the client CA), keep the secret out of argv,
  run migrate-to-kv loopback/TLS only, and R1 is a SPOF of auth (not HA); R3
  quorum is real HA. The generated cert material lives in deploy/cluster/ (0006g).
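
A sketch of the secret resolution and route injection described above, under stated assumptions. The function names match the tests listed below, but the bodies and signatures are illustrative, and the route user name passed in main is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// resolveClusterPass applies the precedence file > env > flag. A configured
// but unreadable file is an error rather than a silent fallback.
func resolveClusterPass(passFile, envVal, flagVal string) (string, error) {
	if passFile != "" {
		b, err := os.ReadFile(passFile)
		if err != nil {
			return "", fmt.Errorf("read cluster pass file: %w", err)
		}
		return strings.TrimSpace(string(b)), nil
	}
	if envVal != "" {
		return envVal, nil
	}
	return flagVal, nil
}

// injectRouteCreds adds user:pass only to routes that carry no userinfo, so
// explicit per-route overrides are preserved.
func injectRouteCreds(routes []string, user, pass string) ([]string, error) {
	out := make([]string, 0, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r)
		if err != nil {
			return nil, fmt.Errorf("parse route %q: %w", r, err)
		}
		if u.User == nil && pass != "" {
			u.User = url.UserPassword(user, pass)
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	pass, err := resolveClusterPass("", os.Getenv("UNIBUS_CLUSTER_PASS"), "dev-flag-secret")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	routes, err := injectRouteCreds(
		[]string{"nats://host:6250", "nats://a:b@other:6250"}, "route-user", pass)
	fmt.Println(routes, err)
}
```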

Tests:
- TestResolveClusterPass (file > env > flag precedence; missing file errors),
- TestInjectRouteCreds (injects only into userinfo-less routes; preserves overrides),
- TestIsLoopbackURL (loopback vs remote vs malformed).

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:24:46 +02:00
egutierrez 3a33656cac Merge issue/0006e-refresh: RefreshSession in all clients (audit 0008 N4) 2026-06-07 17:21:14 +02:00
egutierrez 2f5b372a80 fix(0006e): call RefreshSession after membership changes in all clients (audit 0008 N4)
A secured bus freezes per-subject permissions at connect time, so a peer that
creates or joins a room after connecting cannot pub/sub on it until it reconnects
(RefreshSession). No client called it, so under enforce+ACL the demos failed
closed — pushing the operator to disable the ACL (a security regression at the
operator's discretion).

Wire the membership-change contract into every client:
- cmd/worker: RefreshSession after CreateRoom, before publishing.
- cmd/chat (simple): RefreshSession after CreateRoom+Join, before Subscribe.
- cmd/chat (encrypted demo): A refreshes after CreateRoom; B refreshes after the
  invite+join, both before pub/sub.
- local_files/bridge (gateway): RefreshSession after CreateRoom+Join, before Subscribe.
- mobile: new Session.RefreshSession wrapper + the contract documented for callers.

Contract (documented on the wrappers): after ANY membership change, call
RefreshSession BEFORE pub/sub on the new room (it drops active subs, so it must
precede Subscribe). On an unsecured/dev bus it is a harmless reconnect.

Test:
- TestClientCreateRoomRefreshPublishFlow: end-to-end under enforce+ACL, a peer
  creates a room, refreshes, invites a second peer who joins+refreshes+subscribes,
  and the publish is received — no manual intervention, the ACL stays on.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:21:14 +02:00
egutierrez 32bec75665 Merge issue/0006d-posture: homogeneous cluster posture + /healthz posture (audit 0008 N1) 2026-06-07 17:17:37 +02:00
egutierrez 9b96537aa6 fix(0006d): enforce homogeneous cluster posture + publish posture on /healthz (audit 0008 N1)
A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.

- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
  unless --bus-auth enforce, regardless of bind (a clustered node is a production
  node; there is no safe dev cluster without auth). This binary therefore cannot
  BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non
  secret operational metadata, probe stays unauthenticated) so a monitor or peer
  can detect a cluster member not running enforce+ACL+TLS — covering a peer that
  runs a tampered/old binary outside this node's control.

Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
  node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
  the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:17:37 +02:00
egutierrez 18ee7c469b Merge issue/0006c-kv-store: wire decentralized control-plane KV store (--store kv) 2026-06-07 17:14:20 +02:00
egutierrez e9ad719424 feat(0006c): wire the decentralized control-plane KV store (--store kv)
0003 built the JetStream KV store (jetstreamStore) but the binary never selected
it: membership.Open (SQLite) was hardcoded and OpenJetStream was only reached by
migrate-to-kv. This completes the wiring so a node actually serves its control
plane from the replicated KV.

- New flag --store kv|sqlite (default sqlite). kv opens the JetStream KV control
  plane over the privileged internal connection; sqlite is the unchanged baseline
  (branch-by-abstraction: the full suite's SQLite paths are untouched).
- Bootstrap cycle resolved with storeHolder: the authenticator consults the holder
  (fail-closed until set), so it can be built before the KV store exists. The KV
  store opens after NATS is up and is published into the holder. The only client
  that can connect in that window is the internal identity, which bypasses the
  store by key. In SQLite mode the store is set before StartServer, so the window
  does not exist.
- needJS now covers --store kv as well as --cluster-name; the JetStream client is
  shared by the KV store and the replicated nonce bucket.
- feature_flags.json: decentralized wiring documented as complete, realized via
  --store kv (opt-in per deploy; default stays sqlite).

Fail-closed preserved: jetstreamStore.IsAuthorized already denies on any backend
error; the holder denies while unset.
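
The holder idea can be sketched as below. This is an assumption-laden illustration: the authStore interface and mapStore type are hypothetical, and the real storeHolder may be built differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// authStore is a hypothetical minimal view of the membership store.
type authStore interface {
	IsAuthorized(pubKey string) bool
}

// storeHolder can be handed to the authenticator before any store exists.
// Until Set is called it denies everything (fail-closed).
type storeHolder struct {
	p atomic.Pointer[authStore]
}

// Set publishes the store once it has been opened.
func (h *storeHolder) Set(s authStore) { h.p.Store(&s) }

// IsAuthorized denies while the holder is empty, then delegates.
func (h *storeHolder) IsAuthorized(pubKey string) bool {
	s := h.p.Load()
	if s == nil {
		return false
	}
	return (*s).IsAuthorized(pubKey)
}

// mapStore is a toy store used only for this demo.
type mapStore map[string]bool

func (m mapStore) IsAuthorized(k string) bool { return m[k] }

func main() {
	var h storeHolder
	fmt.Println(h.IsAuthorized("alice")) // store not opened yet
	h.Set(mapStore{"alice": true})
	fmt.Println(h.IsAuthorized("alice"), h.IsAuthorized("eve"))
}
```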

Tests:
- TestStoreHolderFailClosed: empty holder denies; serves after set.
- TestKVStoreBootstrapUnderEnforce: end-to-end decentralized boot — KV-seeded user
  authenticates over nkey under enforce; outsider denied.
- TestKVStoreDecentralizedConsistency: a room/user created on one node's KV store
  is visible to another's (ends the per-node SQLite divergence, audit 0008 N5).

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:14:20 +02:00
egutierrez d1e1a478f8 Merge issue/0006b-kv-acl: scope JetStream ACL per-room (audit 0008 N2) 2026-06-07 17:08:54 +02:00
egutierrez cacf608fde fix(0006b): scope JetStream ACL per-room, close $JS.API.> KV leak (audit 0008 N2)
The client-infra grant was {"_INBOX.>", "$JS.API.>"}. The broad "$JS.API.>" let
any registered peer drive the whole JetStream API and read the control-plane KV
buckets (KV_UNIBUS_users/rooms/members/room_keys) and the object store directly
over NATS, bypassing the HTTP authorization (requireMember + own-endpoint
checks): a full leak of the allowlist, room graph and sealed-key metadata once the
decentralized control plane is active.

Fix: replace the broad grant with a CLOSED, per-room allow set.
- clientInfraSubjects shrinks to {"_INBOX.>", "$JS.API.INFO"} ($JS.API.INFO is
  account counters only — no room/user/key contents).
- SubjectACLFor now grants, per room the peer belongs to, the room subject plus
  the minimal JetStream API subjects of THAT room's stream (jsSubjectsFor:
  STREAM.*, CONSUMER.*, $JS.ACK scoped to UNIBUS_<roomID>).
- Because KV_UNIBUS_* and OBJ_UNIBUS_* are never a room stream, they fall outside
  the closed allow set and are denied by default. Clients reach blobs over the
  HTTP control plane, not the NATS object store, so OBJ needs no client grant.

roomStreamName mirrors pkg/client.streamName so the authorizer and the producer
never drift.

Tests:
- TestAttack0008_N2: eve (registered, member of no room) cannot bind the KV users
  bucket nor subscribe $KV.UNIBUS_users.> (permissions violation); golden: the
  room owner can still drive her OWN room stream's JetStream API; edge: eve cannot
  reach a foreign room's stream.
- TestReaudit_H4 residual note updated: the $JS.API.> leak it deferred is closed.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:08:54 +02:00
egutierrez a9c245d468 Merge issue/0006a-replicated-nonce: wire replicated nonce store (audit 0008 N3) 2026-06-07 17:02:19 +02:00
egutierrez 8b6a01d280 fix(0006a): wire replicated nonce store on clustered nodes (audit 0008 N3)
membershipd never called Server.UseReplicatedNonces, so every node kept a
per-process anti-replay cache and a signed request accepted on node A could be
replayed to node B (200+200). This wires the shared JetStream KV nonce bucket on
any clustered node, closing the cross-node replay hole.
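
A toy model of the replay hole and its fix, with an in-memory map standing in for the JetStream KV nonce bucket. All type and function names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// nonceStore records each nonce once; FirstUse reports whether this is the
// first time the nonce has been seen.
type nonceStore interface {
	FirstUse(nonce string) bool
}

// memNonces is an in-memory store. Shared between nodes it plays the role of
// the replicated bucket; one per node it models the old per-process cache.
type memNonces struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newMemNonces() *memNonces { return &memNonces{seen: make(map[string]struct{})} }

func (m *memNonces) FirstUse(nonce string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, dup := m.seen[nonce]; dup {
		return false
	}
	m.seen[nonce] = struct{}{}
	return true
}

// node accepts a signed request only when its nonce has not been used.
type node struct{ nonces nonceStore }

func (n node) handle(nonce string) int {
	if n.nonces.FirstUse(nonce) {
		return 200
	}
	return 401
}

func main() {
	// Per-process caches: the replay to node B is accepted.
	a, b := node{newMemNonces()}, node{newMemNonces()}
	fmt.Println(a.handle("n1"), b.handle("n1"))

	// Shared store: the same replay is rejected.
	shared := newMemNonces()
	a, b = node{shared}, node{shared}
	fmt.Println(a.handle("n1"), b.handle("n1"))
}
```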

Bootstrap: under enforce the service needs JetStream on its own embedded server,
but the data plane only accepts allowlisted clients. Resolved with an ephemeral
internal service identity the authenticator recognizes and grants full
permissions (NewNkeyAuthenticatorACLInternal), connected over the in-process
transport (no TLS/CA needed for the self-connection).

Hard rule: --cluster-name != "" means the replicated nonce bucket is mandatory;
if it cannot be created the node refuses to start (wireReplicatedNonces returns a
fatal error) rather than run insecurely. Standalone nodes keep the in-memory
cache unchanged (branch-by-abstraction: no JetStream dependency added).

Changes:
- busauth: NewNkeyAuthenticatorACLInternal + fullPermissions for the internal id.
- cmd/membershipd: connectInternalJS (in-process, privileged) / connectExternalJS;
  wireReplicatedNonces helper; main wires it when clustered; --kv-replicas flag.

Tests (regression of audit 0008 N3):
- TestAttack0008_N3: 2 clustered nodes share the bucket, cross-node replay -> 401.
- TestAttack0008_N3_StandaloneKeepsLocalCache: standalone needs no JetStream,
  same-node replay still 401.
- TestAttack0008_N3_ClusteredRequiresJetStream: clustered + no JetStream -> fatal.
- TestInternalConnPrivilegedUnderEnforce / ...OutsiderRejected: the privileged
  self-connection works under enforce and no other identity can claim it.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:02:19 +02:00
agent 5df99fa4c4 docs(issue): 0006 complete+harden cluster: KV wiring + N1-N6 from audit 0008 + deploy material for magnus/homer/datardos 2026-06-07 16:48:07 +02:00
egutierrez df3b62a601 Merge quick/0005-bump-close: unibus 0.7.0 + close issue 0005 2026-06-07 16:17:41 +02:00
egutierrez 6976537842 chore(0005): bump unibus to 0.7.0, close issue 0005 (hardening 2)
Hardening 2 (issue 0005, phases 0005a-0005e) closes the new findings of the
red-team re-audit (report 0006): nats-server + toolchain bump (16 CVEs -> 0
reachable), dropping unsigned frames in SignMsgs rooms, a global in-flight byte
limiter against the concurrency DoS, mandatory TLS on a public bind, and wiring
of the per-subject ACL that closes the wildcard metadata leak. Per-phase detail
is in the capability growth log in app.md and in report 0007.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:17:41 +02:00
egutierrez a4bbe8209b Merge issue/0005e-acl-wire: wire per-subject ACL into membershipd (audit H4) 2026-06-07 16:15:52 +02:00
egutierrez 87ef52cc80 fix(0005e): wire per-subject ACL into membershipd (close H4 wildcard metadata leak)
The per-subject data-plane ACL existed since 0003e (membership.SubjectACLFor +
busauth.NewNkeyAuthenticatorACL, unit-tested in TestSubjectACLIsolation) but the
binary never used it: cmd/membershipd installed the plain NewNkeyAuthenticator, so
in production a registered NON-member could open a raw NATS connection,
Subscribe(">"), and harvest every room's subject plus JetStream stream/advisory
activity (payload stayed E2E ciphertext, metadata leaked) — the re-audit's H4
vector (report 0006).

Fix:
- New busauth.PermissionsFromSubjects adapts a subject-deriving function into the
  PermissionsFunc the ACL authenticator expects (subjects granted as both the
  publish and subscribe allow set; a derivation error fails closed). It lives in
  busauth so membership stays free of the nats-server dependency.
- cmd/membershipd, under enforce, now installs
  NewNkeyAuthenticatorACL(store.IsAuthorized,
    PermissionsFromSubjects(membership.SubjectACLFor(store)))
  so every connection is confined to the subjects of the rooms it belongs to plus
  the client-infra subjects.
- pkg/membership/acl_test.go's helper now delegates to the production wiring
  (PermissionsFromSubjects) instead of a test-only reimplementation, so the tests
  exercise the real path.

Verification (pkg/membership/acl_test.go):
- TestReaudit_H4_WildcardMetadataLeak: a non-member's Subscribe(">") and any
  foreign-subject subscribe raise permission violations; the member still pub/subs
  her own room and the non-member captures nothing. With the plain authenticator
  (the pre-0005e wiring) the test fails ("wildcard metadata leak still open"),
  confirming the wiring is what closes it.
- TestSubjectACLIsolation / TestRefreshSessionGainsNewRoom still green.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  green.

Residual (documented): the client-infra grant includes "$JS.API.>", shared by all
peers so per-connection JetStream works; a peer that subscribes specifically to
"$JS.API.>" can still observe stream-management requests whose subjects embed the
room-derived stream name. Fully closing that needs NATS accounts/permissions per
identity (deferred to the 0003 decentralization line). Operational note: NATS
freezes permissions at connect time, so clients must client.RefreshSession after a
membership change to gain a new room's subject; cmd/chat and cmd/worker do not yet
call it, a functional gap to close before an enforce+ACL deployment.

Refs: report 0006 H4, issue 0005e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:15:52 +02:00
egutierrez a2ec78c81d Merge issue/0005d-tls-guard: require TLS on public bind (audit N4) 2026-06-07 16:11:45 +02:00
egutierrez d01da9d396 fix(0005d): require TLS on a public bind (close N4 plaintext control plane)
The H2 guard refused "public bind without enforce" and "TLS flags without
enforce", but it still ALLOWED a public bind with enforce and no --tls-cert: the
control plane then served metadata (subjects, pubkeys, sealed keys, the social
graph) over plaintext HTTP publicly, so audit H5 reappeared as the N4 gap (TLS
was a capability, not a requirement; report 0006).

Fix: validateBootConfig now also refuses a non-loopback --bind unless both
--tls-cert and --tls-key are set. Public deployments must serve HTTPS; loopback
dev is unaffected (no TLS still allowed there).
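
The combined policy could be sketched like this. The argument order follows the validateBootConfig("0.0.0.0", enforce, "", "") call quoted below; the body, the error texts and the isLoopbackBind helper are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isLoopbackBind reports whether the bind host is loopback-only
// (hypothetical helper).
func isLoopbackBind(bind string) bool {
	if bind == "localhost" {
		return true
	}
	ip := net.ParseIP(bind)
	return ip != nil && ip.IsLoopback()
}

// validateBootConfig sketches the three refusals: public bind without
// enforce, TLS flags without enforce, and public bind without full TLS.
func validateBootConfig(bind, authMode, tlsCert, tlsKey string) error {
	public := !isLoopbackBind(bind)
	anyTLS := tlsCert != "" || tlsKey != ""
	if public && authMode != "enforce" {
		return errors.New("public bind requires --bus-auth enforce")
	}
	if anyTLS && authMode != "enforce" {
		return errors.New("TLS flags require --bus-auth enforce")
	}
	if public && (tlsCert == "" || tlsKey == "") {
		return errors.New("public bind requires both --tls-cert and --tls-key")
	}
	return nil
}

func main() {
	fmt.Println(validateBootConfig("0.0.0.0", "enforce", "", ""))
	fmt.Println(validateBootConfig("0.0.0.0", "enforce", "cert.pem", "key.pem"))
	fmt.Println(validateBootConfig("127.0.0.1", "enforce", "", ""))
}
```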

Verification (cmd/membershipd/config_test.go):
- TestGap_PublicEnforceNoTLS: validateBootConfig("0.0.0.0", enforce, "", "")
  now returns an error mentioning --tls-cert (golden public+enforce+TLS allowed;
  edge loopback-without-TLS still allowed).
- TestBootConfigPolicy table updated: public+enforce+notls / +certonly / +keyonly
  and lan-ip+enforce+notls are now refused; public+enforce+tls and
  loopback+enforce+tls allowed.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  green.

Refs: report 0006 N4, issue 0005d.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:11:45 +02:00
egutierrez db8618ddc3 Merge issue/0005c-inflight: global in-flight byte limiter bounds aggregate memory (audit N2) 2026-06-07 16:09:58 +02:00
egutierrez e7d59fd01d fix(0005c): bound aggregate buffered memory with a global in-flight byte limiter
The H1 fix bounds each request (1 MiB control / 16 MiB blob) and the per-IP rate
limiter throttles a single source, but neither bounds the AGGREGATE memory across
concurrent requests. The re-audit (report 0006, N2) drove RSS to ~1.42 GB with 40
concurrent 16 MiB uploads, and noted that a multi-IP (botnet) flood scales without
a ceiling because the rate limit is per-IP.

Fix: a global, non-blocking, byte-counting limiter (pkg/membership/inflight.go).
ServeHTTP reserves a POST's worst-case buffered size (its route ceiling) from the
limiter before reading the body, and releases it when the request finishes. When
the global cap (maxInflightBytes = 128 MiB) is reached, further POSTs are shed
with 503 (backpressure) rather than parking goroutines, so total bytes buffered
in flight stays bounded regardless of connection count or source-IP spread. GETs
carry no body and do not consume the budget.
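
A minimal sketch of a non-blocking byte-counting limiter of this kind. It is not the code in pkg/membership/inflight.go; the type and method names are hypothetical. The demo reuses the 48 MiB cap and 16 MiB reservations from the test described below, but sequentially and without releases.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// inflightLimiter bounds the total bytes reserved at any moment.
// A capBytes of 0 or less disables it.
type inflightLimiter struct {
	capBytes int64
	inFlight atomic.Int64
}

func newInflightLimiter(capBytes int64) *inflightLimiter {
	return &inflightLimiter{capBytes: capBytes}
}

// tryReserve never blocks: it either reserves n bytes or reports false so the
// caller can shed the request with 503.
func (l *inflightLimiter) tryReserve(n int64) bool {
	if l.capBytes <= 0 {
		return true
	}
	for {
		cur := l.inFlight.Load()
		if cur+n > l.capBytes {
			return false
		}
		if l.inFlight.CompareAndSwap(cur, cur+n) {
			return true
		}
	}
}

// release returns n bytes to the budget and clamps at zero on over-release.
func (l *inflightLimiter) release(n int64) {
	if l.capBytes <= 0 {
		return
	}
	for {
		cur := l.inFlight.Load()
		next := cur - n
		if next < 0 {
			next = 0
		}
		if l.inFlight.CompareAndSwap(cur, next) {
			return
		}
	}
}

func main() {
	l := newInflightLimiter(48 << 20)
	accepted, shed := 0, 0
	for i := 0; i < 40; i++ {
		if l.tryReserve(16 << 20) {
			accepted++
		} else {
			shed++
		}
	}
	fmt.Println(accepted, shed)
}
```

In a handler the reservation would be taken before reading the body and released with defer once the request finishes.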

The limiter is implemented inside unibus (not delegated to the fn-registry, where
a generic concurrency primitive would normally live) because functions/core pulls
transitive deps requiring CGO (mattn/go-sqlite3) and external modules that are
incompatible with unibus's CGO_ENABLED=0 build, and because this work is scoped
to the unibus sub-repo. The type/method comments document this.

Verification:
- pkg/membership/inflight_test.go: TestInflightLimiter{Basics,Disabled,Concurrent}
  cover golden/edge/error/disabled/over-release and a -race concurrency invariant
  (inFlight returns to 0, never exceeds cap).
- pkg/membership/dos_concurrency_test.go: TestReaudit_DoSConcurrency fires 40
  concurrent 16 MiB uploads from distinct IPs (the multi-IP shape) against a 48 MiB
  test cap -> 200=3 503=37, RSS delta ~93 MiB (bound 256 MiB), inFlight()==0, and a
  fresh upload still 200. With the limiter disabled the test fails (200=40 503=0),
  confirming it is a real regression guard.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  green;
  CGO_ENABLED=1 go test -race ./pkg/membership/ green.

Residual (documented): under enforce the body is buffered twice (auth verify +
handler), so real RSS is ~2x the reserved bytes; closing that fully means
streaming blobs to disk (overlaps H9 / issue 0002).

Refs: report 0006 N2, issue 0005c.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:09:58 +02:00
egutierrez 0f79708338 Merge issue/0005b-sig-nil: drop unsigned frames in SignMsgs rooms (audit N3) 2026-06-07 15:58:10 +02:00
egutierrez ef3af6dfd1 fix(0005b): drop unsigned frames in SignMsgs rooms (close sig-nil spoof)
client.processFrame verified a frame's signature only when one was present
(`info.Policy.SignMsgs && f.Sig != nil`). In a room whose policy REQUIRES
per-message signatures, an attacker with data-plane access could publish a raw
frame with Sig==nil and a forged Sender, and the receiver accepted it as
authentic because the verification block was skipped (audit N3, report 0006).
On a signed-but-cleartext room any peer that knows the subject could thus
impersonate any sender.

Fix: in a SignMsgs room a missing signature is itself a rejection. processFrame
now drops any frame with Sig==nil before attempting verification:

    if info.Policy.SignMsgs {
        if f.Sig == nil { return }   // signature required but absent: drop
        // verify ...
    }

Non-signed rooms (ModeNATS) are unaffected: unsigned frames there are still
delivered, so the plain-NATS path is unchanged.

Verification (pkg/client/sig_nil_spoof_test.go, TestReaudit_SigNilSpoof):
- golden: a properly signed frame from a member is delivered.
- error : an unsigned frame with a forged Sender in a SignMsgs room is dropped
  (the test fails with "SIG-NIL SPOOF: receiver accepted ..." when the fix is
  reverted, confirming it is a real regression guard).
- edge  : a non-signed room still delivers an unsigned frame.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  green.

Refs: report 0006 N3, issue 0005b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 15:58:10 +02:00
egutierrez 88b47912bd Merge issue/0005a-cve-bump: bump nats-server to v2.11.15 + go1.26.4 (16 CVEs -> 0 reachable) 2026-06-07 15:55:32 +02:00
egutierrez a3ac58fb70 fix(0005a): bump nats-server v2.10.22->v2.11.15 + toolchain go1.26.4 (close 16 CVEs)
govulncheck reported 16 reachable vulnerabilities (re-audit finding N1, report 0006):
14 in github.com/nats-io/nats-server/v2@v2.10.22 -- the embedded NATS server, which
is exposed to the internet in the chosen deployment -- and 2 in the Go standard
library (GO-2026-5039 net/textproto, GO-2026-5037 crypto/x509).

Changes:
- go get github.com/nats-io/nats-server/v2@v2.11.15 (covers all 14 server CVEs;
  pulls nats.go v1.49.0, nkeys v0.4.15, jwt v2.8.1, klauspost/compress v1.18.4
  and friends transitively).
- go directive 1.25.0 -> 1.26.4 so the toolchain ships the two stdlib fixes.

This is a go.mod/go.sum change justified purely by CVE remediation; it is the
explicit exception to the "do not touch deps" rule for a CVE bump.

Verification:
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./...  -> green,
  including the 0003 multi-node cluster/JetStream e2e in pkg/embeddednats, so the
  server bump did not break the cluster or the durable plane.
- govulncheck ./...  -> "No vulnerabilities found" (0 reachable; the 13 that remain
  are in required-but-not-called modules).

Refs: report 0006 N1, issue 0005a.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 15:55:32 +02:00
agent fb0291ad8a docs(issue): 0005 hardening 2: CVEs, sig-nil spoof, DoS concurrency, forced TLS (re-audit 0006) 2026-06-07 15:48:49 +02:00
agent d821bc1794 chore(0003): bump unibus to 0.6.0 — decentralization / HA (0003a-0003e)
Cluster NATS routes (auth + mutual TLS), Store/blobstore interfaces with
replicated JetStream KV and Object Store backends, idempotent
migrate-to-kv with backup, client failover over seed/control-plane lists,
replicated nonce store (closes the multi-node replay hole), and the
per-subject membership ACL (audit H4 residual). All behind the
`decentralized` flag (off); single-node SQLite+disk behavior unchanged.
The multi-node deploy (0003f) is the human's; runbook in report 0006.
2026-06-07 15:31:14 +02:00
egutierrez da420513b6 Merge issue/0003e-client-failover: client failover + replicated nonce store + subject ACL (H4) 2026-06-07 15:27:45 +02:00
agent 96abb75a2e feat(0003e/3): per-subject data-plane ACL from room membership (audit H4)
Closes the residual the 0004 hardening deferred: the NATS authenticator
can now confine a registered peer to the subjects of the rooms it
belongs to, instead of letting any registered identity sub/pub on any
subject. The dynamic-membership reconnection model the audit named is
provided by client.RefreshSession.

pkg/busauth:
- verifyNkey factors out the shared nkey verification.
- NewNkeyAuthenticatorACL + PermissionsFunc: an authenticator that, after
  authorizing, derives and RegisterUser()s per-subject permissions. A
  derivation error denies the connection (fail closed).

pkg/membership:
- SubjectACLFor(store) maps a signing pubkey to the subjects it may use:
  the subject of every room it belongs to, plus the client infrastructure
  subjects (_INBOX.>, $JS.API.> for request/reply and the persisted plane).
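
As a rough illustration of the mapping described above, the sketch below derives an allowlist from room membership and checks subjects against it. `subjectsFor`, `matches`, `permitted` and the `roomsOf` map are hypothetical names; the real SubjectACLFor reads from the store, and the NATS server enforces the resulting permissions itself rather than through a local matcher like this one.

```go
package main

import (
	"fmt"
	"strings"
)

// infraSubjects are the client infrastructure subjects named in the commit.
var infraSubjects = []string{"_INBOX.>", "$JS.API.>"}

// subjectsFor is a hypothetical simplification: roomsOf plays the role of the
// store, mapping a signing pubkey (hex) to the subjects of its rooms.
func subjectsFor(roomsOf map[string][]string, signPub string) []string {
	allowed := append([]string{}, infraSubjects...)
	return append(allowed, roomsOf[signPub]...)
}

// matches implements NATS-style wildcards on dot-separated tokens:
// "*" matches exactly one token, ">" matches one or more trailing tokens.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return len(s) > i
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

// permitted reports whether any allowed pattern covers the subject.
func permitted(allowed []string, subject string) bool {
	for _, p := range allowed {
		if matches(p, subject) {
			return true
		}
	}
	return false
}

func main() {
	roomsOf := map[string][]string{"alice-key": {"room.A"}, "bob-key": {"room.B"}}
	alice := subjectsFor(roomsOf, "alice-key")
	fmt.Println(permitted(alice, "room.A"))       // own room
	fmt.Println(permitted(alice, "room.B"))       // someone else's room
	fmt.Println(permitted(alice, "_INBOX.reply")) // request/reply infrastructure
}
```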

pkg/client:
- RefreshSession() rebuilds the data-plane connection so the authenticator
  re-derives permissions after a membership change (NATS freezes
  permissions at connect time). It retains the seeds/options to reconnect;
  active subscriptions are dropped and must be re-made (documented).

Tests (DoD: isolation + refresh):
- TestSubjectACLIsolation: alice (member of room.A) may sub/pub room.A but
  is DENIED sub and pub on room.B (permissions violation), and never reads
  bob's room.B traffic; bob never receives alice's cross-room publish.
- TestRefreshSessionGainsNewRoom: alice has no permission for room B until
  she is added and calls RefreshSession; the reconnect grants the subject
  and she then receives room B traffic.

Scope note: the per-subject ACL authenticator is opt-in (NewServer/
membershipd keep the open authenticator by default) and is wired in with
the decentralized boot path; auto-RefreshSession on every membership
change (fully transparent) remains for 0003f. Master behavior unchanged.
2026-06-07 15:27:45 +02:00
agent 37c778ca9a feat(0003e/2): replicated anti-replay nonce store on JetStream KV
The per-process nonce cache breaks anti-replay under multi-node failover
(audit 0004): a request captured on one node can be replayed to a
DIFFERENT node whose local cache never saw the nonce, and is accepted.
This makes the nonce state shared so a replay is rejected cluster-wide.

pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed
  memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a
  shared bucket: first sight wins (accept), any later sight on any node
  rejects (replay). A backend error fails CLOSED (reject), so a KV outage
  never silently disables anti-replay. The bucket carries a TTL =
  nonceTTL (2*clockSkew) so a key expires exactly when its replay window
  closes; raw base64 nonces are mapped to KV-safe keys via sha256-hex.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node;
  every node in a cluster calls it. NewServer still defaults to the
  in-memory cache (master behavior unchanged).
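
The claim-by-create rule can be sketched without a NATS dependency. In the sketch below, `creator` is a hypothetical one-method slice of a KV bucket and `memBucket` simulates the shared bucket in memory; only the behavior (first sight wins, later sights reject, backend errors reject) and the sha256-hex key mapping come from the description above. TTL expiry is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// creator is a hypothetical slice of a KV bucket: Create must fail when the
// key already exists, which is what makes the claim atomic.
type creator interface {
	Create(key string, value []byte) error
}

// memBucket simulates one shared bucket so the sketch runs without NATS.
type memBucket struct {
	mu   sync.Mutex
	keys map[string]struct{}
	down bool // simulates a backend outage
}

func (b *memBucket) Create(key string, _ []byte) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.down {
		return errors.New("backend unavailable")
	}
	if _, ok := b.keys[key]; ok {
		return errors.New("key exists")
	}
	b.keys[key] = struct{}{}
	return nil
}

// nonceKey maps a raw base64 nonce to a KV-safe key via sha256-hex.
func nonceKey(nonce string) string {
	sum := sha256.Sum256([]byte(nonce))
	return hex.EncodeToString(sum[:])
}

// claim returns true only on the first sighting of the nonce. Any error,
// including a backend outage, rejects the request (fail closed).
func claim(kv creator, nonce string) bool {
	return kv.Create(nonceKey(nonce), nil) == nil
}

func main() {
	shared := &memBucket{keys: map[string]struct{}{}}
	nodeA, nodeB := creator(shared), creator(shared) // two nodes, one bucket

	fmt.Println("node A first sight:", claim(nodeA, "bm9uY2U="))
	fmt.Println("node B replay:     ", claim(nodeB, "bm9uY2U="))

	shared.down = true
	fmt.Println("during outage:     ", claim(nodeA, "ZnJlc2g="))
}
```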

Test (DoD error path — the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share
  one KV bucket; a request accepted (200) on node A, replayed with the
  same ts+nonce to node B, is rejected (401) — and replaying to A again
  is rejected too.
2026-06-07 15:21:45 +02:00
agent c6ad63059f feat(0003e/1): client failover over a list of seeds and control planes
The client (issue 0003e, part 1) accepts a LIST of NATS seeds and a LIST
of control-plane URLs so a node loss is transparent.

pkg/client:
- Options.NatsServers: extra NATS seeds beyond the primary. The client
  connects to the joined seed list with MaxReconnects(-1) +
  RetryOnFailedConnect, so nats.go fails over to a surviving node when the
  node a client is attached to dies, and can rejoin a node once it comes
  back.
- Options.CtrlURLs: extra control-plane endpoints. doJSON/putBlob/getBlob
  now try each endpoint in order, falling over on a transport error to the
  next (an HTTP response from any node is authoritative — every node
  serves the same state under the KV store). newSignedRequest becomes
  newSignedRequestTo(base, ...); each failover attempt mints a fresh nonce
  (the signature covers method+path+ts+nonce+body, not the host), so a
  retried request is never seen as a replay.
- ConnectedServer()/IsConnected(): observability for which node the data
  plane is attached to, for ops and failover tests.
- New/Connect/NewWithOptions keep their signatures (a single URL = a
  one-element list), so worker/chat/mobile/playground are unchanged.

Test (DoD edge — the issue's "kill node A" case):
- TestClientFailoverAcrossNodes: client A is given two clustered nodes as
  seeds, subscribes, and receives a cross-node message; the node A is attached
  to is KILLED; A reconnects to the survivor and still receives messages, so
  the session stays intact.
2026-06-07 15:18:18 +02:00
egutierrez 649dc9e244 Merge issue/0003d-objectstore: replicated blobs on NATS Object Store 2026-06-07 15:12:45 +02:00
agent d6e668b984 feat(0003d): replicated blob store on NATS Object Store
Branch-by-abstraction for the blob store (issue 0003d): media ciphertext
can live in a replicated JetStream Object Store instead of local disk, so
a blob uploaded to one node survives a node loss and is reachable from
any node.

pkg/blobstore:
- Store is now an interface (Put/Get/Has). The filesystem backend is
  renamed diskStore and stays the default: New(dir) returns it.
- objectStore (new) implements Store over a NATS Object Store bucket with
  a configurable replication factor (R1..R5), matching the KV store's
  R1->R3 rollout. Content-addressing (sha256-hex) is identical, so the
  wire contract is unchanged.
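
Content addressing is what keeps blob references portable between backends. A minimal sketch follows; the method signatures on `Store` are assumptions (the commit names only Put/Get/Has), and `memStore` is a hypothetical in-memory backend used purely for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// Store mirrors the Put/Get/Has shape named in the commit; the exact
// signatures here are assumptions.
type Store interface {
	Put(data []byte) (hash string, err error)
	Get(hash string) ([]byte, error)
	Has(hash string) (bool, error)
}

var errMissing = errors.New("blob not found")

// address derives the content address: the sha256 of the bytes, hex-encoded.
func address(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// memStore is a hypothetical in-memory backend keyed by content address.
type memStore struct{ blobs map[string][]byte }

func (m *memStore) Put(data []byte) (string, error) {
	h := address(data)
	m.blobs[h] = append([]byte(nil), data...) // identical bytes overwrite themselves
	return h, nil
}

func (m *memStore) Get(hash string) ([]byte, error) {
	b, ok := m.blobs[hash]
	if !ok {
		return nil, errMissing
	}
	return b, nil
}

func (m *memStore) Has(hash string) (bool, error) {
	_, ok := m.blobs[hash]
	return ok, nil
}

func main() {
	var s Store = &memStore{blobs: map[string][]byte{}}
	h1, _ := s.Put([]byte("ciphertext"))
	h2, _ := s.Put([]byte("ciphertext")) // same bytes, same address
	ok, _ := s.Has(h1)
	fmt.Println(h1 == h2, len(h1), ok)
}
```

Any backend that computes the address the same way returns the same reference for the same bytes, which is the property TestObjectStoreAddressMatchesDisk asserts for the real backends.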

pkg/membership:
- Server.blobs and NewServer take the blobstore.Store interface instead
  of the concrete type; no behavior change with the disk default.

Tests (DoD: golden + edge + contract):
- TestObjectStoreRoundTrip: put/get/has + content-addressed dedup.
- TestObjectStoreMissing: unknown hash is absent and unreadable.
- TestObjectStoreAddressMatchesDisk: the Object Store and disk backends
  address identical bytes to the IDENTICAL hash (portable blob refs).

Like the KV store (0003b), wiring membershipd to select the Object Store
is deferred to the decentralized boot path (flag off); disk stays default.
2026-06-07 15:12:45 +02:00
egutierrez 94e7ced1ef Merge issue/0003c-migrate-kv: idempotent SQLite->KV migration + backup 2026-06-07 15:09:56 +02:00
agent 9013ea5e33 feat(0003c): membershipd migrate-to-kv (idempotent SQLite -> JetStream KV)
The one-time data move decentralization needs (issue 0003c): copy the
entire control-plane state from the local SQLite database into the
replicated JetStream KV buckets, with a backup taken first.

pkg/membership:
- Snapshot / SealedKeyRecord: a backend-agnostic dump of the whole
  control plane (rooms with their real epoch, members, every sealed-key
  row across epochs, users with status).
- (*sqliteStore).ExportSnapshot and (*jetstreamStore).ExportSnapshot read
  a full Snapshot from each backend; (*jetstreamStore).importSnapshot
  writes one with raw Puts (preserving epoch/status, not resetting to
  defaults) so the migration is faithful and idempotent (every write is
  an overwrite, so re-running converges).
- MigrateSQLiteToKV orchestrates export -> import; BackupSQLite makes a
  consistent copy via SQLite's VACUUM INTO before any migration.

cmd/membershipd:
- `membershipd migrate-to-kv --db <path> --nats-url <url> [--replicas N]
  [--ca <cert>] [--no-backup]` backs up the SQLite file, connects to the
  cluster's NATS, and migrates. Dispatched on the host like `user`.

Tests (DoD: golden + edge + parity):
- TestMigrateSQLiteToKVParity: seed a representative SQLite (two rooms,
  one rekeyed to epoch 2, members, a revoked user); after migration the
  KV ExportSnapshot equals the SQLite ExportSnapshot.
- TestMigrateSQLiteToKVIdempotent: running the migration twice yields the
  same KV state.
- TestBackupSQLiteCreatesConsistentCopy: the backup reopens with
  identical data.
Plus a binary smoke (seed user -> run server -> migrate-to-kv -> re-run):
backup written, 1 user migrated, second run identical.
2026-06-07 15:09:56 +02:00
egutierrez b8c9b2b652 Merge issue/0003b-jetstream-store: Store interface + JetStream KV backend (fail-closed) 2026-06-07 15:04:52 +02:00
agent 6b3ace1d39 feat(0003b): membership.Store interface + JetStream KV implementation
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.

pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
  Close). The existing SQLite implementation is renamed sqliteStore and
  stays the default: Open(path) still returns it. openSQLite keeps the
  concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
  backends return it (the SQLite store maps sql.ErrNoRows to it). The
  control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
  server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
  rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
  room_keys, users. Replication factor is configurable (R1..R5) for the
  R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
  HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
  never admits), per the audit's requirement for the decentralized store.

dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
  jetstreamStore behind it. The membershipd boot wiring that selects the
  KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
  authenticator<->store bootstrap is part of the session/deploy redesign);
  OFF keeps the single-node SQLite control plane unchanged.

Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
  round-trip through every room/member/key method, including latest-epoch
  resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
  with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
  IsAuthorized and HasAdmin both DENY within the bounded timeout.

The full existing suite stays green: sqliteStore is unchanged behavior.
2026-06-07 15:04:52 +02:00
egutierrez 3230b31ade Merge issue/0003a-cluster: NATS cluster routes (auth + mutual TLS) 2026-06-07 14:54:53 +02:00
agent c90f145a05 feat(0003a): NATS cluster routes with shared-secret auth + mutual route TLS
Add high-availability cluster support to the embedded NATS server
(issue 0003a, first phase of decentralization).

pkg/embeddednats:
- ServerConfig gains ServerName (unique per node, required by JetStream
  RAFT) and an optional *ClusterConfig (cluster name, route host/port,
  peer route URLs, shared-secret Username/Password, and a mutual-TLS
  *tls.Config). applyClusterOpts maps it onto server.Options.Cluster +
  Routes. Nil Cluster keeps the legacy standalone server.

pkg/busauth:
- RouteTLSConfig builds the route layer's mutual-TLS config: the node
  presents its CA-signed certificate AND verifies the peer's certificate
  against the bus CA (RequireAndVerifyClientCert), reusing the issue-0001
  CA. Routes authenticate NODES, never the client nkey authenticator.

cmd/membershipd:
- Cluster flags (--cluster-name/--server-name/--cluster-port/--routes/
  --cluster-user/--cluster-pass/--route-tls-cert/-key/-ca) wire a node
  into the cluster. validateClusterConfig refuses a public cluster
  without a route secret and complete mutual route TLS, and rejects
  partial route-TLS flags (all-or-nothing). splitRoutes parses the CSV.
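
The all-or-nothing guard and the CSV parsing can be sketched as below. `routeTLS`, `checkRouteTLS` and this `splitRoutes` body are hypothetical; whether the real splitRoutes trims whitespace or drops empty entries is an assumption, and the public-cluster secret check is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitRoutes parses a comma-separated route list. Trimming whitespace and
// skipping empty entries are assumptions of this sketch.
func splitRoutes(csv string) []string {
	var out []string
	for _, part := range strings.Split(csv, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// routeTLS holds the three route-TLS flag values.
type routeTLS struct{ cert, key, ca string }

// checkRouteTLS accepts either no route-TLS flags or all three of them.
func checkRouteTLS(t routeTLS) error {
	set := 0
	for _, v := range []string{t.cert, t.key, t.ca} {
		if v != "" {
			set++
		}
	}
	if set != 0 && set != 3 {
		return errors.New("route TLS is all-or-nothing: set cert, key and ca together")
	}
	return nil
}

func main() {
	fmt.Println(splitRoutes("nats://n1:6222, nats://n2:6222,"))
	fmt.Println(checkRouteTLS(routeTLS{cert: "node.crt"}))
	fmt.Println(checkRouteTLS(routeTLS{cert: "node.crt", key: "node.key", ca: "ca.crt"}))
}
```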

Tests (DoD: golden + 2 edge + error path):
- TestClusterForwardsAcrossNodes: 2-node cluster forwards a client
  subject from one node to a subscriber on the other.
- TestClusterThreeNodesForward: 3-node (HA shape) cross-node forwarding.
- TestClusterMutualTLSForwards: forwarding over mutual-TLS routes.
- TestClusterRejectsBadRouteAuth: wrong cluster password -> no route.
- TestClusterRejectsUnsignedNode: cert not signed by the bus CA -> no route.
- TestClusterConfigPolicy / TestSplitRoutes: boot-guard + CSV parsing.

Master stays green: standalone (no --cluster-name) is unchanged.
2026-06-07 14:54:53 +02:00
108 changed files with 8578 additions and 4151 deletions
-12
@@ -1,12 +0,0 @@
.gradle/
build/
local.properties
*.iml
.idea/
captures/
.cxx/
# The gomobile binding is a build artifact (~24 MB). Regenerate it from ../mobile
# with `gomobile bind` (see README.md); it is not versioned.
app/libs/*.aar
app/libs/*.jar
-83
@@ -1,83 +0,0 @@
# unibus · Android app
Native mobile client for unibus. The app does not talk to a gateway: it embeds a
**real peer** of the bus through the gomobile binding `mobile/unibus.go`, so
end-to-end encryption runs **on the device**. Each phone is a first-class peer
of the bus, just like any Go peer.
## Architecture
```
Kotlin/Compose UI ──> BusViewModel ──> com.unibus.core.mobile.Session (.aar)
│ (NATS data plane + E2E crypto, in Go)
membershipd (control plane HTTP :8470)
NATS (data plane :4250)
```
- `BusViewModel` translates UI intents into calls to the binding. Network calls
(`newSession`, `createRoom`, `join`, `publish`) run on `Dispatchers.IO`.
- Incoming frames arrive through `FrameListener.onFrame` on a NATS goroutine
(JNI thread); they are published to a `StateFlow` (thread-safe) that Compose
collects on the main thread.
## Requirements
- Android SDK (compileSdk 34), NDK (to regenerate the `.aar`), JDK 17.
- The `app/libs/unibus.aar` binding (not versioned: it is a ~24 MB artifact).
## 1. Generate the binding (.aar)
From the root of the app repo (`projects/message_bus/apps/unibus`):
```bash
export ANDROID_HOME=$HOME/android-sdk
export ANDROID_NDK_HOME=$HOME/android-sdk/ndk/26.3.11579264
mkdir -p android/app/libs
gomobile bind -target=android -androidapi 21 -javapkg com.unibus.core \
-o android/app/libs/unibus.aar ./mobile
```
This produces `unibus.aar` with the static class `com.unibus.core.mobile.Mobile`
(`generateIdentity`, `newSession`) and the `Session` and `FrameListener` types.
## 2. Build the APK
```bash
cd android
export JAVA_HOME=$HOME/android-sdk/jdk-17/jdk-17.0.19+10
export ANDROID_HOME=$HOME/android-sdk
./gradlew assembleDebug
# APK: app/build/outputs/apk/debug/app-debug.apk
```
`local.properties` points to `sdk.dir`; adjust it if your SDK lives at another path.
## 3. Start the bus and test on the emulator
```bash
# 1. On the PC: control plane + embedded NATS (HTTP :8470, NATS :4250)
cd projects/message_bus/apps/unibus && go run ./cmd/membershipd
# 2. Pixel_API34 emulator
$ANDROID_HOME/emulator/emulator -avd Pixel_API34 &
# 3. Install + launch
adb install -r app/build/outputs/apk/debug/app-debug.apk
adb shell am start -n com.unibus.app/.MainActivity
```
On the connection screen, the PC host as seen from the emulator is `10.0.2.2`:
- **Host (control plane):** `http://10.0.2.2:8470`
- **NATS (data plane):** `nats://10.0.2.2:4250`
For a physical phone on the same LAN, use the PC's LAN IP instead of
`10.0.2.2`.
## Notes
- The peer identity is stored in `filesDir/peer.id` (Ed25519 + X25519 private
keys). It is neither synced nor backed up.
- A room created with the "cifrar (E2E)" (encrypt) switch on uses the Matrix
policy (encrypted, persisted, signed); in normal mode it uses cleartext NATS.
-66
@@ -1,66 +0,0 @@
plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("org.jetbrains.kotlin.plugin.compose")
}

android {
    namespace = "com.unibus.app"
    compileSdk = 34

    defaultConfig {
        applicationId = "com.unibus.app"
        minSdk = 21
        targetSdk = 34
        versionCode = 1
        versionName = "0.1.0"
    }

    buildFeatures {
        compose = true
    }

    compileOptions {
        sourceCompatibility = JavaVersion.VERSION_17
        targetCompatibility = JavaVersion.VERSION_17
    }

    kotlinOptions {
        jvmTarget = "17"
    }

    buildTypes {
        getByName("release") {
            isMinifyEnabled = false
            proguardFiles(
                getDefaultProguardFile("proguard-android-optimize.txt"),
                "proguard-rules.pro",
            )
        }
    }

    packaging {
        resources {
            excludes += "/META-INF/{AL2.0,LGPL2.1}"
        }
    }
}

dependencies {
    // The unibus gomobile binding: a real bus peer that does NATS + E2E crypto
    // on the device. All protocol logic lives here, shared with every other peer.
    implementation(files("libs/unibus.aar"))

    val composeBom = platform("androidx.compose:compose-bom:2024.09.03")
    implementation(composeBom)
    implementation("androidx.compose.ui:ui")
    implementation("androidx.compose.ui:ui-graphics")
    implementation("androidx.compose.ui:ui-tooling-preview")
    implementation("androidx.compose.material3:material3")
    implementation("androidx.compose.material:material-icons-extended")
    implementation("androidx.activity:activity-compose:1.9.2")
    implementation("androidx.lifecycle:lifecycle-viewmodel-compose:2.8.6")
    implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.8.6")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
    debugImplementation("androidx.compose.ui:ui-tooling")
}
-4
@@ -1,4 +0,0 @@
# gomobile generates JNI-bound classes under com.unibus.core.mobile and go.*.
# They are reached from native code, so keep them intact even when minifying.
-keep class com.unibus.core.mobile.** { *; }
-keep class go.** { *; }
-25
@@ -1,25 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <application
        android:allowBackup="true"
        android:label="@string/app_name"
        android:supportsRtl="true"
        android:usesCleartextTraffic="true"
        android:theme="@style/Theme.Unibus">
        <activity
            android:name=".MainActivity"
            android:exported="true"
            android:label="@string/app_name"
            android:windowSoftInputMode="adjustResize">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
@@ -1,162 +0,0 @@
package com.unibus.app
import android.app.Application
import androidx.lifecycle.AndroidViewModel
import androidx.lifecycle.viewModelScope
import com.unibus.core.mobile.FrameListener
import com.unibus.core.mobile.Mobile
import com.unibus.core.mobile.Session
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.flow.update
import kotlinx.coroutines.launch
import java.io.File

/** One chat message shown in the UI. */
data class ChatMessage(
    val sender: String,
    val text: String,
    val mine: Boolean,
    val ts: Long,
)

/** The whole observable UI state of the app. */
data class BusState(
    val connecting: Boolean = false,
    val connected: Boolean = false,
    val endpointId: String = "",
    val roomId: String = "",
    val roomSubject: String = "",
    val status: String = "",
    val error: String? = null,
    val messages: List<ChatMessage> = emptyList(),
)

/**
 * BusViewModel drives a real unibus peer on the device through the gomobile
 * binding. The binding performs NATS transport and end-to-end crypto natively;
 * this class only translates UI intents into binding calls and exposes the
 * incoming frames as observable state.
 *
 * Threading: every binding call that touches the network (newSession, createRoom,
 * join, publish) runs off the main thread on Dispatchers.IO to avoid
 * NetworkOnMainThreadException. Incoming frames arrive on a JNI-attached NATS
 * goroutine via [onFrame]; we only append to a thread-safe StateFlow there, and
 * Compose collects that flow on the main thread.
 */
class BusViewModel(app: Application) : AndroidViewModel(app), FrameListener {
    private val _state = MutableStateFlow(BusState())
    val state: StateFlow<BusState> = _state.asStateFlow()

    private var session: Session? = null
    private var myEndpoint: String = ""

    private val idPath: String
        get() = File(getApplication<Application>().filesDir, "peer.id").absolutePath

    override fun onFrame(roomID: String, sender: String, msgID: String, text: String) {
        _state.update {
            it.copy(
                messages = it.messages + ChatMessage(
                    sender = sender,
                    text = text,
                    mine = sender == myEndpoint,
                    ts = System.currentTimeMillis(),
                ),
            )
        }
    }

    fun connect(host: String, nats: String, peerName: String) {
        if (_state.value.connecting) return
        _state.update { it.copy(connecting = true, error = null, status = "Conectando…") }
        viewModelScope.launch(Dispatchers.IO) {
            try {
                val s = Mobile.newSession(idPath, nats.trim(), host.trim())
                session = s
                myEndpoint = s.endpointID()
                _state.update {
                    it.copy(
                        connecting = false,
                        connected = true,
                        endpointId = myEndpoint,
                        status = "Conectado como $peerName",
                    )
                }
            } catch (e: Exception) {
                _state.update {
                    it.copy(connecting = false, connected = false, error = e.message ?: "error desconocido")
                }
            }
        }
    }

    fun createRoom(subject: String, encrypted: Boolean) {
        val s = session ?: return
        viewModelScope.launch(Dispatchers.IO) {
            try {
                val mode = if (encrypted) "matrix" else "nats"
                val roomId = s.createRoom(subject.trim(), mode)
                s.subscribe(roomId, this@BusViewModel)
                _state.update {
                    it.copy(
                        roomId = roomId,
                        roomSubject = subject.trim(),
                        messages = emptyList(),
                        status = "Room creada",
                    )
                }
            } catch (e: Exception) {
                _state.update { it.copy(error = e.message ?: "error al crear room") }
            }
        }
    }

    fun joinRoom(roomId: String) {
        val s = session ?: return
        viewModelScope.launch(Dispatchers.IO) {
            try {
                val rid = roomId.trim()
                s.join(rid)
                s.subscribe(rid, this@BusViewModel)
                _state.update {
                    it.copy(roomId = rid, roomSubject = "(unida)", messages = emptyList(), status = "Unido a la room")
                }
            } catch (e: Exception) {
                _state.update { it.copy(error = e.message ?: "error al unirse") }
            }
        }
    }

    fun publish(text: String) {
        val s = session ?: return
        val room = _state.value.roomId
        if (room.isEmpty() || text.isBlank()) return
        viewModelScope.launch(Dispatchers.IO) {
            try {
                s.publish(room, text)
            } catch (e: Exception) {
                _state.update { it.copy(error = e.message ?: "error al publicar") }
            }
        }
    }

    /** card returns this peer's shareable public identity (no secret). */
    fun card(): String = try {
        session?.card() ?: ""
    } catch (_: Exception) {
        ""
    }

    fun clearError() = _state.update { it.copy(error = null) }

    override fun onCleared() {
        try {
            session?.close()
        } catch (_: Exception) {
        }
        session = null
    }
}
@@ -1,307 +0,0 @@
package com.unibus.app
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.viewModels
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Box
import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.Row
import androidx.compose.foundation.layout.Spacer
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.fillMaxWidth
import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.layout.width
import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.itemsIndexed
import androidx.compose.foundation.lazy.rememberLazyListState
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.automirrored.filled.Send
import androidx.compose.material.icons.filled.Add
import androidx.compose.material.icons.filled.Lock
import androidx.compose.material3.Button
import androidx.compose.material3.Card
import androidx.compose.material3.CircularProgressIndicator
import androidx.compose.material3.ExperimentalMaterial3Api
import androidx.compose.material3.Icon
import androidx.compose.material3.IconButton
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.OutlinedButton
import androidx.compose.material3.OutlinedTextField
import androidx.compose.material3.Scaffold
import androidx.compose.material3.Surface
import androidx.compose.material3.Switch
import androidx.compose.material3.Text
import androidx.compose.material3.TopAppBar
import androidx.compose.material3.darkColorScheme
import androidx.compose.runtime.Composable
import androidx.compose.runtime.LaunchedEffect
import androidx.compose.runtime.collectAsState
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableStateOf
import androidx.compose.runtime.remember
import androidx.compose.runtime.saveable.rememberSaveable
import androidx.compose.runtime.setValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.text.style.TextOverflow
import androidx.compose.ui.unit.dp
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale

class MainActivity : ComponentActivity() {
    private val vm: BusViewModel by viewModels()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            MaterialTheme(colorScheme = darkColorScheme()) {
                Surface(modifier = Modifier.fillMaxSize()) {
                    UnibusApp(vm)
                }
            }
        }
    }
}

@Composable
fun UnibusApp(vm: BusViewModel) {
    val state by vm.state.collectAsState()
    if (!state.connected) {
        ConnectScreen(
            connecting = state.connecting,
            error = state.error,
            onConnect = { host, nats, name -> vm.connect(host, nats, name) },
        )
    } else {
        ChatScreen(state = state, vm = vm)
    }
}

@Composable
fun ConnectScreen(
    connecting: Boolean,
    error: String?,
    onConnect: (String, String, String) -> Unit,
) {
    var host by rememberSaveable { mutableStateOf("http://10.0.2.2:8470") }
    var nats by rememberSaveable { mutableStateOf("nats://10.0.2.2:4250") }
    var name by rememberSaveable { mutableStateOf("android") }

    Column(
        modifier = Modifier
            .fillMaxSize()
            .padding(24.dp),
        verticalArrangement = Arrangement.Center,
    ) {
        Text("unibus", style = MaterialTheme.typography.headlineMedium)
        Text(
            "chat cifrado extremo a extremo sobre NATS",
            style = MaterialTheme.typography.bodyMedium,
            color = MaterialTheme.colorScheme.onSurfaceVariant,
        )
        Spacer(Modifier.height(24.dp))
        OutlinedTextField(
            value = host,
            onValueChange = { host = it },
            label = { Text("Host (control plane)") },
            singleLine = true,
            modifier = Modifier.fillMaxWidth(),
        )
        Spacer(Modifier.height(12.dp))
        OutlinedTextField(
            value = nats,
            onValueChange = { nats = it },
            label = { Text("NATS (data plane)") },
            singleLine = true,
            modifier = Modifier.fillMaxWidth(),
        )
        Spacer(Modifier.height(12.dp))
        OutlinedTextField(
            value = name,
            onValueChange = { name = it },
            label = { Text("Identidad") },
            singleLine = true,
            modifier = Modifier.fillMaxWidth(),
        )
        if (error != null) {
            Spacer(Modifier.height(12.dp))
            Text(error, color = MaterialTheme.colorScheme.error)
        }
        Spacer(Modifier.height(24.dp))
        Button(
            onClick = { onConnect(host, nats, name) },
            enabled = !connecting,
            modifier = Modifier.fillMaxWidth(),
        ) {
            if (connecting) {
                CircularProgressIndicator(modifier = Modifier.height(18.dp).width(18.dp), strokeWidth = 2.dp)
                Spacer(Modifier.width(8.dp))
            }
            Text(if (connecting) "Conectando…" else "Conectar")
        }
    }
}

@OptIn(ExperimentalMaterial3Api::class)
@Composable
fun ChatScreen(state: BusState, vm: BusViewModel) {
    var subject by rememberSaveable { mutableStateOf("room.general") }
    var encrypt by rememberSaveable { mutableStateOf(false) }
    var joinId by rememberSaveable { mutableStateOf("") }
    var draft by rememberSaveable { mutableStateOf("") }
    val listState = rememberLazyListState()

    LaunchedEffect(state.messages.size) {
        if (state.messages.isNotEmpty()) listState.animateScrollToItem(state.messages.size - 1)
    }

    Scaffold(
        topBar = {
            TopAppBar(
                title = {
                    Column {
                        Text("unibus", style = MaterialTheme.typography.titleMedium)
                        Text(
                            state.status.ifEmpty { state.endpointId.take(12) + "" },
                            style = MaterialTheme.typography.bodySmall,
                            maxLines = 1,
                            overflow = TextOverflow.Ellipsis,
                        )
                    }
                },
            )
        },
    ) { inner ->
        Column(
            modifier = Modifier
                .fillMaxSize()
                .padding(inner)
                .padding(horizontal = 12.dp),
        ) {
            // Room controls.
            Card(modifier = Modifier.fillMaxWidth().padding(vertical = 8.dp)) {
                Column(Modifier.padding(12.dp)) {
                    Row(verticalAlignment = Alignment.CenterVertically) {
                        OutlinedTextField(
                            value = subject,
                            onValueChange = { subject = it },
                            label = { Text("subject") },
                            singleLine = true,
                            modifier = Modifier.weight(1f),
                        )
                        Spacer(Modifier.width(8.dp))
                        Button(onClick = { vm.createRoom(subject, encrypt) }) {
                            Icon(Icons.Filled.Add, contentDescription = "crear")
                        }
                    }
                    Row(verticalAlignment = Alignment.CenterVertically) {
                        Switch(checked = encrypt, onCheckedChange = { encrypt = it })
                        Spacer(Modifier.width(8.dp))
                        Icon(Icons.Filled.Lock, contentDescription = null, modifier = Modifier.height(16.dp))
                        Text("cifrar (E2E)", style = MaterialTheme.typography.bodySmall)
                    }
                    Spacer(Modifier.height(4.dp))
                    Row(verticalAlignment = Alignment.CenterVertically) {
                        OutlinedTextField(
                            value = joinId,
                            onValueChange = { joinId = it },
                            label = { Text("unirse por room id") },
                            singleLine = true,
                            modifier = Modifier.weight(1f),
                        )
                        Spacer(Modifier.width(8.dp))
                        OutlinedButton(onClick = { if (joinId.isNotBlank()) vm.joinRoom(joinId) }) {
                            Text("Unir")
                        }
                    }
                    if (state.roomId.isNotEmpty()) {
                        Spacer(Modifier.height(4.dp))
                        Text(
                            "room: ${state.roomSubject} · ${state.roomId}",
                            style = MaterialTheme.typography.bodySmall,
                            color = MaterialTheme.colorScheme.onSurfaceVariant,
                            maxLines = 1,
                            overflow = TextOverflow.Ellipsis,
                        )
                    }
                }
            }
            if (state.error != null) {
                Text(
                    state.error,
                    color = MaterialTheme.colorScheme.error,
                    modifier = Modifier.fillMaxWidth().padding(vertical = 4.dp),
                )
            }
            // Messages.
            LazyColumn(
                state = listState,
                modifier = Modifier.weight(1f).fillMaxWidth(),
                verticalArrangement = Arrangement.spacedBy(6.dp),
            ) {
                itemsIndexed(state.messages, key = { i, m -> "${m.ts}-$i" }) { _, m ->
                    MessageBubble(m)
                }
            }
            // Composer.
            Row(
                modifier = Modifier.fillMaxWidth().padding(vertical = 8.dp),
                verticalAlignment = Alignment.CenterVertically,
            ) {
                OutlinedTextField(
                    value = draft,
                    onValueChange = { draft = it },
                    placeholder = { Text("Mensaje…") },
                    singleLine = true,
                    enabled = state.roomId.isNotEmpty(),
                    modifier = Modifier.weight(1f),
                )
                Spacer(Modifier.width(8.dp))
                IconButton(
                    onClick = {
                        vm.publish(draft)
                        draft = ""
                    },
                    enabled = state.roomId.isNotEmpty() && draft.isNotBlank(),
                ) {
                    Icon(Icons.AutoMirrored.Filled.Send, contentDescription = "enviar")
                }
            }
        }
    }
}
private val timeFmt = SimpleDateFormat("HH:mm:ss", Locale.getDefault())
@Composable
fun MessageBubble(m: ChatMessage) {
val align = if (m.mine) Alignment.End else Alignment.Start
Column(modifier = Modifier.fillMaxWidth(), horizontalAlignment = align) {
Card(
modifier = Modifier.fillMaxWidth(0.8f),
) {
Column(Modifier.padding(8.dp)) {
if (!m.mine) {
Text(
m.sender.take(12) + "…",
style = MaterialTheme.typography.labelSmall,
color = MaterialTheme.colorScheme.primary,
)
}
Text(m.text, style = MaterialTheme.typography.bodyMedium)
Text(
timeFmt.format(Date(m.ts)),
style = MaterialTheme.typography.labelSmall,
color = MaterialTheme.colorScheme.onSurfaceVariant,
)
}
}
}
}
@@ -1,4 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
<string name="app_name">unibus</string>
</resources>
@@ -1,6 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
<!-- A minimal Material3 base theme; the real UI styling is driven by Compose
Material3 (MaterialTheme) at runtime. -->
<style name="Theme.Unibus" parent="android:Theme.Material.NoActionBar" />
</resources>
-8
@@ -1,8 +0,0 @@
// Top-level build file. Plugin versions are declared here and applied in the
// module build scripts. AGP 8.5 + Kotlin 2.0 (with the dedicated Compose
// compiler plugin) target the locally installed SDK (compileSdk 34).
plugins {
id("com.android.application") version "8.5.2" apply false
id("org.jetbrains.kotlin.android") version "2.0.21" apply false
id("org.jetbrains.kotlin.plugin.compose") version "2.0.21" apply false
}
-5
@@ -1,5 +0,0 @@
org.gradle.jvmargs=-Xmx2048m -Dfile.encoding=UTF-8
org.gradle.caching=true
android.useAndroidX=true
android.nonTransitiveRClass=true
kotlin.code.style=official
Binary file not shown.
-7
@@ -1,7 +0,0 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-8.9-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
-252
@@ -1,252 +0,0 @@
#!/bin/sh
#
# Copyright © 2015-2021 the original authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# SPDX-License-Identifier: Apache-2.0
#
##############################################################################
#
# Gradle start up script for POSIX generated by Gradle.
#
# Important for running:
#
# (1) You need a POSIX-compliant shell to run this script. If your /bin/sh is
# noncompliant, but you have some other compliant shell such as ksh or
# bash, then to run this script, type that shell name before the whole
# command line, like:
#
# ksh Gradle
#
# Busybox and similar reduced shells will NOT work, because this script
# requires all of these POSIX shell features:
# * functions;
# * expansions «$var», «${var}», «${var:-default}», «${var+SET}»,
# «${var#prefix}», «${var%suffix}», and «$( cmd )»;
# * compound commands having a testable exit status, especially «case»;
# * various built-in commands including «command», «set», and «ulimit».
#
# Important for patching:
#
# (2) This script targets any POSIX shell, so it avoids extensions provided
# by Bash, Ksh, etc; in particular arrays are avoided.
#
# The "traditional" practice of packing multiple parameters into a
# space-separated string is a well documented source of bugs and security
# problems, so this is (mostly) avoided, by progressively accumulating
# options in "$@", and eventually passing that to Java.
#
# Where the inherited environment variables (DEFAULT_JVM_OPTS, JAVA_OPTS,
# and GRADLE_OPTS) rely on word-splitting, this is performed explicitly;
# see the in-line comments for details.
#
# There are tweaks for specific operating systems such as AIX, CygWin,
# Darwin, MinGW, and NonStop.
#
# (3) This script is generated from the Groovy template
# https://github.com/gradle/gradle/blob/HEAD/platforms/jvm/plugins-application/src/main/resources/org/gradle/api/internal/plugins/unixStartScript.txt
# within the Gradle project.
#
# You can find Gradle at https://github.com/gradle/gradle/.
#
##############################################################################
# Attempt to set APP_HOME
# Resolve links: $0 may be a link
app_path=$0
# Need this for daisy-chained symlinks.
while
APP_HOME=${app_path%"${app_path##*/}"} # leaves a trailing /; empty if no leading path
[ -h "$app_path" ]
do
ls=$( ls -ld "$app_path" )
link=${ls#*' -> '}
case $link in #(
/*) app_path=$link ;; #(
*) app_path=$APP_HOME$link ;;
esac
done
# This is normally unused
# shellcheck disable=SC2034
APP_BASE_NAME=${0##*/}
# Discard cd standard output in case $CDPATH is set (https://github.com/gradle/gradle/issues/25036)
APP_HOME=$( cd -P "${APP_HOME:-./}" > /dev/null && printf '%s\n' "$PWD" ) || exit
# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD=maximum
warn () {
echo "$*"
} >&2
die () {
echo
echo "$*"
echo
exit 1
} >&2
# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "$( uname )" in #(
CYGWIN* ) cygwin=true ;; #(
Darwin* ) darwin=true ;; #(
MSYS* | MINGW* ) msys=true ;; #(
NONSTOP* ) nonstop=true ;;
esac
CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD=$JAVA_HOME/jre/sh/java
else
JAVACMD=$JAVA_HOME/bin/java
fi
if [ ! -x "$JAVACMD" ] ; then
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
else
JAVACMD=java
if ! command -v java >/dev/null 2>&1
then
die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
fi
# Increase the maximum file descriptors if we can.
if ! "$cygwin" && ! "$darwin" && ! "$nonstop" ; then
case $MAX_FD in #(
max*)
# In POSIX sh, ulimit -H is undefined. That's why the result is checked to see if it worked.
# shellcheck disable=SC2039,SC3045
MAX_FD=$( ulimit -H -n ) ||
warn "Could not query maximum file descriptor limit"
esac
case $MAX_FD in #(
'' | soft) :;; #(
*)
# In POSIX sh, ulimit -n is undefined. That's why the result is checked to see if it worked.
# shellcheck disable=SC2039,SC3045
ulimit -n "$MAX_FD" ||
warn "Could not set maximum file descriptor limit to $MAX_FD"
esac
fi
# Collect all arguments for the java command, stacking in reverse order:
# * args from the command line
# * the main class name
# * -classpath
# * -D...appname settings
# * --module-path (only if needed)
# * DEFAULT_JVM_OPTS, JAVA_OPTS, and GRADLE_OPTS environment variables.
# For Cygwin or MSYS, switch paths to Windows format before running java
if "$cygwin" || "$msys" ; then
APP_HOME=$( cygpath --path --mixed "$APP_HOME" )
CLASSPATH=$( cygpath --path --mixed "$CLASSPATH" )
JAVACMD=$( cygpath --unix "$JAVACMD" )
# Now convert the arguments - kludge to limit ourselves to /bin/sh
for arg do
if
case $arg in #(
-*) false ;; # don't mess with options #(
/?*) t=${arg#/} t=/${t%%/*} # looks like a POSIX filepath
[ -e "$t" ] ;; #(
*) false ;;
esac
then
arg=$( cygpath --path --ignore --mixed "$arg" )
fi
# Roll the args list around exactly as many times as the number of
# args, so each arg winds up back in the position where it started, but
# possibly modified.
#
# NB: a `for` loop captures its iteration list before it begins, so
# changing the positional parameters here affects neither the number of
# iterations, nor the values presented in `arg`.
shift # remove old arg
set -- "$@" "$arg" # push replacement arg
done
fi
# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS='"-Xmx64m" "-Xms64m"'
# Collect all arguments for the java command:
# * DEFAULT_JVM_OPTS, JAVA_OPTS, and optsEnvironmentVar are not allowed to contain shell fragments,
# and any embedded shellness will be escaped.
# * For example: A user cannot expect ${Hostname} to be expanded, as it is an environment variable and will be
# treated as '${Hostname}' itself on the command line.
set -- \
"-Dorg.gradle.appname=$APP_BASE_NAME" \
-classpath "$CLASSPATH" \
org.gradle.wrapper.GradleWrapperMain \
"$@"
# Stop when "xargs" is not available.
if ! command -v xargs >/dev/null 2>&1
then
die "xargs is not available"
fi
# Use "xargs" to parse quoted args.
#
# With -n1 it outputs one arg per line, with the quotes and backslashes removed.
#
# In Bash we could simply go:
#
# readarray ARGS < <( xargs -n1 <<<"$var" ) &&
# set -- "${ARGS[@]}" "$@"
#
# but POSIX shell has neither arrays nor command substitution, so instead we
# post-process each arg (as a line of input to sed) to backslash-escape any
# character that might be a shell metacharacter, then use eval to reverse
# that process (while maintaining the separation between arguments), and wrap
# the whole thing up as a single "set" statement.
#
# This will of course break if any of these variables contains a newline or
# an unmatched quote.
#
eval "set -- $(
printf '%s\n' "$DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS" |
xargs -n1 |
sed ' s~[^-[:alnum:]+,./:=@_]~\\&~g; ' |
tr '\n' ' '
)" '"$@"'
exec "$JAVACMD" "$@"
-94
@@ -1,94 +0,0 @@
@rem
@rem Copyright 2015 the original author or authors.
@rem
@rem Licensed under the Apache License, Version 2.0 (the "License");
@rem you may not use this file except in compliance with the License.
@rem You may obtain a copy of the License at
@rem
@rem https://www.apache.org/licenses/LICENSE-2.0
@rem
@rem Unless required by applicable law or agreed to in writing, software
@rem distributed under the License is distributed on an "AS IS" BASIS,
@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@rem See the License for the specific language governing permissions and
@rem limitations under the License.
@rem
@rem SPDX-License-Identifier: Apache-2.0
@rem
@if "%DEBUG%"=="" @echo off
@rem ##########################################################################
@rem
@rem Gradle startup script for Windows
@rem
@rem ##########################################################################
@rem Set local scope for the variables with windows NT shell
if "%OS%"=="Windows_NT" setlocal
set DIRNAME=%~dp0
if "%DIRNAME%"=="" set DIRNAME=.
@rem This is normally unused
set APP_BASE_NAME=%~n0
set APP_HOME=%DIRNAME%
@rem Resolve any "." and ".." in APP_HOME to make it shorter.
for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi
@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"
@rem Find java.exe
if defined JAVA_HOME goto findJavaFromJavaHome
set JAVA_EXE=java.exe
%JAVA_EXE% -version >NUL 2>&1
if %ERRORLEVEL% equ 0 goto execute
echo. 1>&2
echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 1>&2
echo. 1>&2
echo Please set the JAVA_HOME variable in your environment to match the 1>&2
echo location of your Java installation. 1>&2
goto fail
:findJavaFromJavaHome
set JAVA_HOME=%JAVA_HOME:"=%
set JAVA_EXE=%JAVA_HOME%/bin/java.exe
if exist "%JAVA_EXE%" goto execute
echo. 1>&2
echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% 1>&2
echo. 1>&2
echo Please set the JAVA_HOME variable in your environment to match the 1>&2
echo location of your Java installation. 1>&2
goto fail
:execute
@rem Setup the command line
set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
@rem Execute Gradle
"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*
:end
@rem End local scope for the variables with windows NT shell
if %ERRORLEVEL% equ 0 goto mainEnd
:fail
rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
rem the _cmd.exe /c_ return code!
set EXIT_CODE=%ERRORLEVEL%
if %EXIT_CODE% equ 0 set EXIT_CODE=1
if not ""=="%GRADLE_EXIT_CONSOLE%" exit %EXIT_CODE%
exit /b %EXIT_CODE%
:mainEnd
if "%OS%"=="Windows_NT" endlocal
:omega
-23
@@ -1,23 +0,0 @@
pluginManagement {
repositories {
google {
content {
includeGroupByRegex("com\\.android.*")
includeGroupByRegex("com\\.google.*")
includeGroupByRegex("androidx.*")
}
}
mavenCentral()
gradlePluginPortal()
}
}
dependencyResolutionManagement {
repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
repositories {
google()
mavenCentral()
}
}
rootProject.name = "unibus"
include(":app")
+127 -1
@@ -2,7 +2,7 @@
name: unibus
lang: go
domain: infra
version: 0.5.0 version: 0.10.0
description: "Unified messaging bus over NATS+JetStream with per-room E2E encryption (reduced megolm/olm): membership/key service, client library, and demo peers."
tags: [service, messaging, nats, e2e]
uses_functions:
@@ -122,6 +122,21 @@ To point at an external NATS in production: `--nats-url nats://host:4222` in
the read-only GET routes. It trusts the internal network. The mutating routes
(`/rooms`, `/invite`, `/rekey`) do require the owner's Ed25519 signature over the
canonical bytes of the request. Hardening is a later phase.
- **User management: unified storage, two ways to add users.** The user allowlist
lives in the SAME store as the rooms (`pkg/membership.Store`): SQLite in
single-node, replicated JetStream KV (`UNIBUS_users`) in cluster. The `Server` already
has that privileged store open (it is what serves the KV on each node), so it
exposes `GET/POST /users` and `POST /users/{signpub}/revoke` as an admin-only HTTP API,
symmetric with the room routes: the admin panel signs as an admin and the
server runs the mutation against the same store. The panel does NOT need `--db`, the
internal identity, or to run on a cluster node; it works identically in single-node
and cluster. Authorization is default-deny: only a signer that the store confirms as an
active `role == "admin"` user passes; anyone else gets 403 (on top of the existing
signature+nonce+TLS). The `membershipd user add --store kv` CLI still exists ONLY to
seed admin #0 (the chicken-and-egg bootstrap: without a seeded admin there is no one to
sign the first `POST /users`); from then on all management is admin-only HTTP.
Adding a user is idempotent just like the CLI: re-adding an already registered key = 409,
without overwriting or elevating the role; revoke is a status flip (no hard delete), auditable.
- **Identity = critical secret.** The identity file (`worker.id`,
`chat.id`) contains the private keys (Ed25519 + X25519). It is written 0600.
Losing it = unreadable messages, with no recovery. Treat it like an SSH key.
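
The default-deny rule in the user-management bullet above can be sketched as follows. This is a minimal, self-contained illustration and not the project's code: the `user` type, the `lookup` function type, `errNotFound`, and the names `validSignPubHex` and `isActiveAdmin` are hypothetical stand-ins for `membership.ValidateSignPubHex`, the store lookup, and `requireAdmin`, whose real signatures do not appear in this diff.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// user is a hypothetical allowlist row (assumption, not the real type).
type user struct {
	Role    string
	Revoked bool
}

// lookup is a hypothetical stand-in for the store's user lookup.
type lookup func(signPubHex string) (user, error)

// errNotFound is a hypothetical error for an unknown key.
var errNotFound = errors.New("user not found")

// validSignPubHex reports whether s is exactly 64 hex characters,
// i.e. a hex-encoded 32-byte public key.
func validSignPubHex(s string) bool {
	if len(s) != 64 {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// isActiveAdmin is default-deny: a malformed key, a store error, an unknown
// user, a non-admin role, or a revoked row all yield false.
func isActiveAdmin(signPubHex string, get lookup) bool {
	if !validSignPubHex(signPubHex) {
		return false
	}
	u, err := get(signPubHex)
	if err != nil {
		return false // fail closed on store errors
	}
	return u.Role == "admin" && !u.Revoked
}

func main() {
	const adminKey = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	store := map[string]user{adminKey: {Role: "admin"}}
	get := func(k string) (user, error) {
		u, ok := store[k]
		if !ok {
			return user{}, errNotFound
		}
		return u, nil
	}
	fmt.Println(isActiveAdmin(adminKey, get))  // true
	fmt.Println(isActiveAdmin("not-hex", get)) // false
}
```

An HTTP handler built on this shape would answer 403 whenever `isActiveAdmin` returns false, so no failure path can grant access by accident.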
@@ -154,6 +169,117 @@ agent.<name>.{in,out} LLM agent inbox/outbox (agent.scout.in)
## Capability growth log
- v0.10.0 (2026-06-07): admin-only HTTP API for user management, closing the
last control-plane asymmetry: rooms had a signed HTTP surface
(`POST /rooms`, etc.) but users could only be managed through the local CLI or
direct store access. Adds `GET /users` (full list, including revoked),
`POST /users` (add `{sign_pub, handle, role}`: validates 64-char hex + role in
`{admin, member}`, idempotent 409 that neither overwrites nor elevates the role) and
`POST /users/{signpub}/revoke` (status flip, no hard delete). All three go through
a default-deny `requireAdmin` helper that confirms against the store that the
authenticated signer is an active `role == "admin"` user (the endpoint id is a one-way
hash of the key, so the context now also carries the signer's `sign_pub` hex to
resolve `GetUser`); any other signer gets 403, on top of the signature+nonce+TLS+
enforce already inherited from the middleware. NO new KV connection is opened and the
internal identity is not used: the server writes through its privileged `s.store`, the
SAME one as the rooms (SQLite single-node, KV `UNIBUS_users` in cluster). `pkg/client`
gains `ListUsers/AddUser/RevokeUser` (flat `UserInfo` type) signing as admin, so the
panel's Users tab no longer needs `--db`/direct KV access. The
`membershipd user add --store kv` CLI remains ONLY for seeding admin #0 (bootstrap).
`sign_pub` validation is unified in `membership.ValidateSignPubHex`, reused by the CLI
and the handlers. New tests: non-admin → 403 on all three endpoints, admin
add→list→revoke roundtrip, and validation (invalid hex → 400, invalid role → 400,
re-add → 409), plus a client test against an embedded membershipd. Changes are 100%
additive: single-node behavior and the room routes do not change; vet/build/test green.
- v0.9.0 (2026-06-07): closes the gaps that the cluster deployment (report
0011) left open (report 0012). (GAP A) New capability `membershipd user
add|list|revoke --store kv`: add/remove users against the replicated KV of the
RUNNING cluster, without the stop-seed-restart procedure. It uses the
privileged internal connection: the daemon persists its service identity with
`--internal-id-file` (each node generates/loads its own, 0600 next to the TLS keys)
and the CLI, run over loopback on a node, presents that nkey, which the
authenticator recognizes with full JetStream permissions; no normal user
identity can touch the `KV_UNIBUS_*` buckets under the per-subject ACL. Adding
is idempotent (re-adding the same key = explicit `ErrUserExists`, without
overwriting or elevating the role), commits with 2/3 quorum (HA, prints
`followers_current`) and rejects a remote target without `--ca` (same as
`migrate-to-kv`). (GAP B) New `cmd/clientcheck`: real end-to-end verification
with an authenticated client (operator identity, nkey+TLS+https) that creates an E2E
room, publishes and receives decrypted messages against the live cluster, including a
node stopped mid-transmission, where the client fails over to a survivor and keeps
receiving with zero loss (2/3 quorum): the data plane that the 0011 chaos test
never exercised. (GAP C) Runbook `deploy/cluster/README.md` corrected: the startup
order "magnus alone and check healthz" deadlocked (a single node has no
meta-group quorum and never serves healthz); it now documents quorum-based startup,
that R1 is a useless SPOF (go straight to R3), and the new way to add users with the
cluster live. The deploy template (unit + `deploy-cluster.sh`) now emits
`INTERNAL_ID_FILE` and the flag. Verified against the 3 real VPSs (magnus + homer +
datardos); enforce+ACL+TLS+R3 posture intact.
- v0.8.0 (2026-06-07): completes and hardens the cluster (issue 0006, phases
0006a-0006g), closing the blockers from the dedicated cluster audit
(report 0008) and wiring the decentralized control plane that 0003 left half-done.
(0006a) The replicated nonce is wired into the binary: a node with `--cluster-name`
must use the shared JetStream KV bucket (fail-fast if it cannot be created),
closing the cross-node replay (N3); the "bootstrap cycle" is resolved with an
ephemeral internal identity that the authenticator recognizes (full perms) and a
privileged in-process connection. (0006b) The control-plane leak through
`$JS.API.>` (N2) is closed: the ACL becomes a closed per-room allow-set (JS API only
for the peer's `UNIBUS_<room>` streams), leaving `KV_UNIBUS_*`/`OBJ_*` outside the
set and therefore denied. (0006c) The decentralized KV store is wired
(`--store kv|sqlite`, default sqlite = identical baseline) with a fail-closed
`storeHolder` that breaks the authenticator's bootstrap cycle. (0006d) Homogeneous
posture: a node refuses to join the cluster without `enforce`, and `/healthz`
publishes the posture (N1). (0006e) All clients call `RefreshSession` after
membership changes (N4), so the ACL is usable under enforce without disabling it.
(0006f) Low-severity items: cluster secret kept out of argv (`--cluster-pass-file`/env +
injection into routes), `migrate-to-kv` rejects a remote target without `--ca`, and
docs for a separate CA for routes + R1 SPOF vs R3 HA. (0006g) Deploy material for the
3-node cluster (magnus+homer+datardos) in `deploy/cluster/` (certs, unit,
dry-run deploy script, runbook), without touching any VPS. The full
regression suite from previous audits + the 0008 attacks remain green; govulncheck 0
reachable. Branch-by-abstraction: with `--store sqlite` the single-node setup stays
identical and deployable at all times.
- v0.7.0 (2026-06-07): security hardening 2 (issue 0005, phases 0005a-0005e),
which closes the new findings of the red-team re-audit (report 0006) and
moves the public-exposure verdict to "yes-with-conditions". (0005a) Bump of
`github.com/nats-io/nats-server/v2` v2.10.22→v2.11.15 and of the toolchain to
go1.26.4: `govulncheck ./...` goes from 16 reachable vulnerabilities (14 from the
embedded NATS server + 2 from the stdlib) to 0. (0005b) `client.processFrame`
now drops any unsigned frame in a `SignMsgs` room (previously it verified
only if the signature was present, which allowed spoofing `Sender` with
`Sig==nil`). (0005c) New global in-flight byte limiter
(`pkg/membership.inflightLimiter`) that bounds the aggregate memory the control
plane buffers under concurrency (the per-request limit and the per-IP rate limit
did not bound the total): a concurrent multi-IP flood is dropped with 503 instead
of growing without a ceiling (RSS stops scaling with N). (0005d) The
`validateBootConfig` startup guard now requires `--tls-cert/--tls-key` on a
non-loopback bind (a public control plane without TLS served metadata in the clear).
(0005e) The per-subject ACL that had existed orphaned since 0003e is finally wired
into `membershipd` (`busauth.NewNkeyAuthenticatorACL` + new adapter
`busauth.PermissionsFromSubjects` over `membership.SubjectACLFor`): a registered
non-member can no longer `Subscribe(">")` and capture the subjects/advisories of
other rooms. Documented residuals: `$JS.API.>` is still shared (full closure = NATS
accounts per identity, deferred) and clients must `RefreshSession` after membership
changes (chat/worker do not do so yet). Single-node behavior does not
change and master stays green.
- v0.6.0 (2026-06-07): decentralization / high availability (issue 0003,
phases 0003a-0003e), report 0006. The embedded NATS server gains cluster
support with authenticated routes (cluster secret) and mutual node TLS
(`pkg/embeddednats.ClusterConfig` + `busauth.RouteTLSConfig`, reusing the CA
from 0001). The control plane (`pkg/membership.Store`) becomes an interface via
branch-by-abstraction: `sqliteStore` (default) + a new `jetstreamStore` over
replicated JetStream KV (configurable replicas R1→R3), with `IsAuthorized`
fail-closed on quorum loss. `membershipd migrate-to-kv` moves the
SQLite→KV state idempotently with a prior backup. Blobs
(`pkg/blobstore.Store`, now an interface) gain a replicated NATS Object Store
backend in addition to disk. The client accepts lists of NATS seeds and
control planes with native failover/reconnect, anti-replay moves to a shared
nonce store in KV with TTL (closing the multi-node replay hole), and
the membership-derived per-subject ACL is implemented (audit H4 residual:
`busauth.NewNkeyAuthenticatorACL` + `membership.SubjectACLFor` +
`client.RefreshSession`). Everything ships behind the `decentralized` flag (off):
single-node behavior (SQLite + disk) does not change and master stays
green. The real multi-node deployment (0003f) is run by the human.
- v0.5.0 (2026-06-07): security hardening (issue 0004) that closes the
findings of the red-team audit (report 0004) and moves the verdict on
public exposure from "NO" to "yes-with-conditions". Pre-auth anti-DoS
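
The global in-flight byte limiter from the v0.7.0 entry (0005c) can be illustrated with a small sketch. It assumes a single shared atomic counter with an optimistic reserve-then-roll-back step; the `byteBudget` type and its methods are hypothetical names, and the real `pkg/membership.inflightLimiter` is not shown in this diff and may work differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// byteBudget is a hypothetical global cap on the bytes buffered across all
// in-flight requests (a stand-in, not the project's inflightLimiter).
type byteBudget struct {
	max int64
	cur atomic.Int64
}

// acquire reserves n bytes and reports whether the reservation fit.
// On failure nothing stays reserved, so the caller can answer 503.
func (b *byteBudget) acquire(n int64) bool {
	if n < 0 || n > b.max {
		return false
	}
	if b.cur.Add(n) > b.max {
		b.cur.Add(-n) // roll back the optimistic reservation
		return false
	}
	return true
}

// release returns n previously acquired bytes to the budget.
func (b *byteBudget) release(n int64) { b.cur.Add(-n) }

func main() {
	b := &byteBudget{max: 100}
	fmt.Println(b.acquire(60)) // true
	fmt.Println(b.acquire(60)) // false: 120 would exceed the cap
	b.release(60)
	fmt.Println(b.acquire(60)) // true
}
```

Because the cap is shared by every request, total buffered memory stays bounded no matter how many source IPs send concurrently, which a per-request or per-IP limit alone cannot guarantee. The optimistic add can make two racing callers reject each other near the cap; that errs on the side of shedding load.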
+15
@@ -69,6 +69,12 @@ func runSimple(natsURL, ctrlURL, roomSub, idFile, caFile string) {
if err := c.Join(roomID); err != nil {
log.Fatalf("join: %v", err)
}
// Membership-change contract (issue 0006e): refresh so the just-created room's
// subject is subscribable under enforce+ACL (permissions are frozen at connect
// time). Must run BEFORE Subscribe — RefreshSession drops active subscriptions.
if err := c.RefreshSession(); err != nil {
log.Fatalf("refresh session after create room: %v", err)
}
sub, err := c.Subscribe(roomID, func(f frame.Frame, plaintext []byte) {
fmt.Printf("[%s] %s: %s\n", f.Subject, shortID(f.Sender), string(plaintext))
})
@@ -122,12 +128,21 @@ func runEncryptedDemo(natsURL, ctrlURL, caFile string) {
must(err, "A create room")
fmt.Printf(" room.test -> %s (E2E, persisted, signed)\n", roomID)
// Membership-change contract (issue 0006e): A only became a member of this room
// after connecting, so refresh to gain its subject + per-room JetStream API
// under enforce+ACL before publishing.
must(a.RefreshSession(), "A refresh after create room")
// A invites B (seals K to B's X25519 key).
must(a.Invite(roomID, b.Endpoint()), "A invite B")
// B joins (fetches + decrypts K).
must(b.Join(roomID), "B join")
// B became a member via the invite above; refresh so B can subscribe to the
// room's subject under enforce+ACL (before subscribing — refresh drops subs).
must(b.RefreshSession(), "B refresh after join")
// B subscribes; capture received plaintexts.
recv := make(chan string, 4)
subB, err := b.Subscribe(roomID, func(f frame.Frame, plaintext []byte) {
+260
@@ -0,0 +1,260 @@
// Command clientcheck is an end-to-end verification client for a live unibus
// cluster (issue 0011 GAP B). The 0011 chaos test validated only the control
// plane (healthz + meta/stream-leader failover + KV readable with 2/3); it never
// connected an authenticated bus client (nkey + TLS) to create a room and
// publish/subscribe through it, least of all across a node loss. clientcheck does
// exactly that with a real identity (the operator), so the data-plane end-to-end
// path — connect, create an E2E room, publish, receive decrypted — is exercised
// against the running cluster, including while a node is stopped.
//
// It is a reusable tool, not a throwaway script: point it at the cluster's CA,
// an identity file, and the NATS + control-plane seed lists.
//
// # golden: connect, create an E2E room, publish N, confirm N decrypted back
// clientcheck --ca ca.crt --identity-file operator.id \
// --nats-seeds nats://A:4250,nats://B:4250,nats://C:4250 \
// --ctrl-seeds https://A:8470,https://B:8470,https://C:8470 --messages 5
//
// # loop: publish a counter every interval for the duration, logging the node
// # it is attached to — stop a node mid-run (systemctl stop membershipd-cluster)
// # and watch it fail over to a survivor and keep receiving (quorum 2/3).
// clientcheck ... --mode loop --duration 45s --interval 1s
package main
import (
"crypto/rand"
"encoding/hex"
"flag"
"fmt"
"log"
"sort"
"strings"
"sync"
"time"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/room"
)
func main() {
var (
caPath = flag.String("ca", "", "bus CA cert pinning TLS on both planes (required for a secured cluster)")
idFile = flag.String("identity-file", "", "path to the client identity JSON (e.g. `pass show unibus/operator-identity` written 0600) (required)")
natsSeeds = flag.String("nats-seeds", "", "comma-separated NATS urls of the cluster nodes (required)")
ctrlSeeds = flag.String("ctrl-seeds", "", "comma-separated control-plane https urls of the cluster nodes (required)")
subject = flag.String("subject", "test.gapcheck", "test room subject PREFIX; a random token is appended so runs never collide with real rooms")
messages = flag.Int("messages", 5, "golden mode: number of messages to publish and expect back")
mode = flag.String("mode", "golden", "golden (publish N, verify N decrypted) | loop (publish a counter for --duration, for failover testing)")
duration = flag.Duration("duration", 30*time.Second, "loop mode: how long to keep publishing")
interval = flag.Duration("interval", 1*time.Second, "loop mode: delay between published messages")
)
flag.Parse()
if *idFile == "" || *natsSeeds == "" || *ctrlSeeds == "" {
log.Fatalf("clientcheck: --identity-file, --nats-seeds and --ctrl-seeds are required")
}
id, err := client.LoadIdentity(*idFile)
if err != nil {
log.Fatalf("clientcheck: load identity: %v", err)
}
natsList := splitCSV(*natsSeeds)
ctrlList := splitCSV(*ctrlSeeds)
if len(natsList) == 0 || len(ctrlList) == 0 {
log.Fatalf("clientcheck: empty --nats-seeds or --ctrl-seeds")
}
// Build the secure client options: nkey on the data plane, TLS pinned to the
// bus CA on both planes, and the FULL seed lists so nats.go fails over to a
// surviving node when the attached one dies (the failover this tool verifies).
opts := client.Options{
NatsServers: natsList,
CtrlURLs: ctrlList,
}
if *caPath != "" {
tlsCfg, err := busauth.LoadCATLSConfig(*caPath)
if err != nil {
log.Fatalf("clientcheck: load CA: %v", err)
}
opts.UseNkey = true
opts.TLS = tlsCfg
opts.CtrlTLS = tlsCfg
for _, u := range ctrlList {
if !strings.HasPrefix(u, "https://") {
log.Fatalf("clientcheck: control URL %q must be https:// when --ca is set", u)
}
}
}
c, err := client.NewWithOptions(natsList[0], ctrlList[0], id, opts)
if err != nil {
log.Fatalf("clientcheck: connect: %v", err)
}
defer c.Close()
log.Printf("connected: endpoint=%s nats=%s", c.Endpoint().ID, c.ConnectedServer())
// Create an EPHEMERAL E2E room (encrypted + signed, NOT persisted): the test
// stays end-to-end encrypted (the cluster requires encryption on a public
// bind) while leaving no durable JetStream stream behind. The random subject
// token guarantees the room is unique and never a real room.
rnd := make([]byte, 8)
if _, err := rand.Read(rnd); err != nil {
log.Fatalf("clientcheck: random: %v", err)
}
subj := fmt.Sprintf("%s.%s", *subject, hex.EncodeToString(rnd))
policy := room.Policy{Encrypt: true, Persist: false, SignMsgs: true}
roomID, err := c.CreateRoom(subj, policy)
if err != nil {
log.Fatalf("clientcheck: create room: %v", err)
}
log.Printf("created E2E room: id=%s subject=%s (encrypt=%v sign=%v persist=%v)", roomID, subj, policy.Encrypt, policy.SignMsgs, policy.Persist)
// Under the per-subject ACL, NATS freezes permissions at connect time, so the
// just-created room's subject is not yet publishable/subscribable on the live
// connection. RefreshSession reconnects so the authenticator re-derives the
// ACL (now including this room) — the post-0006 contract every client follows
// after a membership change.
if err := c.RefreshSession(); err != nil {
log.Fatalf("clientcheck: refresh session: %v", err)
}
switch *mode {
case "golden":
runGolden(c, roomID, *messages)
case "loop":
runLoop(c, roomID, *duration, *interval)
default:
log.Fatalf("clientcheck: --mode must be golden or loop, got %q", *mode)
}
}
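The random subject suffix built in main can be looked at in isolation. The following is a minimal standalone sketch, assuming a helper named uniqueSubject (a hypothetical name, not part of the unibus client API). It only shows how 8 random bytes become a 16-character hex token appended to the prefix.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueSubject is a hypothetical helper (not part of unibus): it appends a
// 16-hex-character random token to prefix so test subjects do not collide.
func uniqueSubject(prefix string) (string, error) {
	token := make([]byte, 8) // 8 random bytes -> 16 hex characters
	if _, err := rand.Read(token); err != nil {
		return "", fmt.Errorf("random token: %w", err)
	}
	return prefix + "." + hex.EncodeToString(token), nil
}

func main() {
	subj, err := uniqueSubject("test.gapcheck")
	if err != nil {
		panic(err)
	}
	fmt.Println(subj) // a different subject on every run
}
```

With 64 bits of randomness per run, two runs sharing a prefix are very unlikely to land on the same subject, which is the property the tool relies on to stay clear of real rooms.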
// runGolden subscribes, publishes n messages, and asserts all n come back
// decrypted. Exits non-zero if any are missing.
func runGolden(c *client.Client, roomID string, n int) {
var mu sync.Mutex
got := map[string]bool{}
sub, err := c.Subscribe(roomID, func(_ frame.Frame, plaintext []byte) {
mu.Lock()
got[string(plaintext)] = true
mu.Unlock()
})
if err != nil {
log.Fatalf("clientcheck: subscribe: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(300 * time.Millisecond) // let the subscription settle
want := make([]string, n)
for i := 0; i < n; i++ {
msg := fmt.Sprintf("gapcheck-e2e-%d", i)
want[i] = msg
if err := c.Publish(roomID, []byte(msg)); err != nil {
log.Fatalf("clientcheck: publish %d: %v", i, err)
}
}
log.Printf("published %d messages to %s; waiting for decrypted echoes...", n, roomID)
deadline := time.Now().Add(15 * time.Second)
for time.Now().Before(deadline) {
mu.Lock()
have := len(got)
mu.Unlock()
if have >= n {
break
}
time.Sleep(100 * time.Millisecond)
}
mu.Lock()
defer mu.Unlock()
missing := 0
for _, w := range want {
if !got[w] {
missing++
log.Printf(" MISSING: %q", w)
}
}
log.Printf("connected node at finish: %s", c.ConnectedServer())
if missing > 0 {
log.Fatalf("GOLDEN FAIL: %d/%d messages not received decrypted", missing, n)
}
log.Printf("GOLDEN OK: all %d messages received and decrypted end-to-end", n)
}
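The wait in runGolden is a poll-until-deadline loop over a mutex-guarded counter. A minimal sketch of that pattern on its own follows, assuming a helper named waitForCount (hypothetical, introduced only for illustration) and using nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitForCount polls count() every tick until it reports at least want, or
// until timeout has elapsed. It returns true only if the target was reached.
func waitForCount(count func() int, want int, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if count() >= want {
			return true
		}
		if !time.Now().Before(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

func main() {
	var mu sync.Mutex
	got := map[string]bool{}

	// Stand-in for the subscription callback: records each message it sees.
	go func() {
		for i := 0; i < 3; i++ {
			mu.Lock()
			got[fmt.Sprintf("msg-%d", i)] = true
			mu.Unlock()
		}
	}()

	ok := waitForCount(func() int {
		mu.Lock()
		defer mu.Unlock()
		return len(got)
	}, 3, time.Second, 10*time.Millisecond)
	fmt.Println("all received:", ok)
}
```

Checking the condition before checking the deadline means a result that arrives exactly at the deadline still counts as a success.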
// runLoop publishes a numbered message every interval for the duration and logs
// the count received plus the node currently attached, so an operator stopping a
// cluster node mid-run sees the client fail over to a survivor and keep receiving
// (quorum 2/3). It is the live failover-with-a-connected-client test the 0011
// chaos run never performed.
func runLoop(c *client.Client, roomID string, duration, interval time.Duration) {
var mu sync.Mutex
received := 0
servers := map[string]int{} // node -> #ticks observed attached
sub, err := c.Subscribe(roomID, func(_ frame.Frame, _ []byte) {
mu.Lock()
received++
mu.Unlock()
})
if err != nil {
log.Fatalf("clientcheck: subscribe: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(300 * time.Millisecond)
log.Printf("loop: publishing every %s for %s — stop a node now to test failover", interval, duration)
end := time.Now().Add(duration)
sent := 0
for time.Now().Before(end) {
msg := fmt.Sprintf("gapcheck-loop-%d", sent)
err := c.Publish(roomID, []byte(msg))
sent++
mu.Lock()
recv := received
mu.Unlock()
node := c.ConnectedServer()
up := c.IsConnected()
if node != "" {
mu.Lock()
servers[node]++
mu.Unlock()
}
pubStatus := "ok"
if err != nil {
pubStatus = "ERR:" + err.Error()
}
log.Printf(" t=%2ds sent=%d recv=%d up=%v node=%s publish=%s",
sent, sent, recv, up, node, pubStatus)
time.Sleep(interval)
}
mu.Lock()
defer mu.Unlock()
log.Printf("loop done: sent=%d received=%d", sent, received)
nodes := make([]string, 0, len(servers))
for n := range servers {
nodes = append(nodes, n)
}
sort.Strings(nodes)
for _, n := range nodes {
log.Printf(" attached to %s for %d ticks", n, servers[n])
}
if len(servers) > 1 {
log.Printf("FAILOVER OBSERVED: client was attached to %d distinct nodes across the run", len(servers))
}
if received == 0 {
log.Fatalf("LOOP FAIL: received 0 messages")
}
log.Printf("LOOP OK: client kept receiving across the run (received=%d)", received)
}
func splitCSV(s string) []string {
var out []string
for _, p := range strings.Split(s, ",") {
if p = strings.TrimSpace(p); p != "" {
out = append(out, p)
}
}
return out
}
+221
@@ -0,0 +1,221 @@
package main
// Regression for audit report 0008, vector N3: the binary must wire the
// replicated nonce store on a clustered node so a signed request accepted on one
// node cannot be replayed to another. The auditor's ephemeral attack showed the
// OLD binary never called UseReplicatedNonces (each node kept a per-process
// cache), so a captured request replayed to a second node was accepted on both (200 + 200). These
// tests drive the SAME helper the binary uses (wireReplicatedNonces) so they
// prove the WIRING, not just the underlying API.
import (
"bytes"
"crypto/rand"
"encoding/base64"
"encoding/hex"
"io"
"net"
"net/http"
"net/http/httptest"
"path/filepath"
"strconv"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
func freePort(t *testing.T) int {
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
// signed008 builds a transport-signed control-plane request with a caller-chosen
// ts+nonce, so a test can reuse the exact same signed bytes against two nodes to
// exercise replay.
func signed008(t *testing.T, baseURL, method, path string, body []byte, id cs.Identity, ts int64, nonce string) *http.Request {
t.Helper()
canonical := membership.CanonicalRequest(method, path, strconv.FormatInt(ts, 10), nonce, body)
sig := cs.SignEd25519(id.SignPriv, canonical)
var rdr io.Reader
if body != nil {
rdr = bytes.NewReader(body)
}
req, err := http.NewRequest(method, baseURL+path, rdr)
if err != nil {
t.Fatalf("new request: %v", err)
}
req.Header.Set("X-Unibus-Pub", hex.EncodeToString(id.SignPub))
req.Header.Set("X-Unibus-Ts", strconv.FormatInt(ts, 10))
req.Header.Set("X-Unibus-Nonce", nonce)
req.Header.Set("X-Unibus-Sig", base64.StdEncoding.EncodeToString(sig))
return req
}
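signed008 signs a canonical byte string built from the method, path, timestamp, nonce and body. The real layout of membership.CanonicalRequest is not shown here, so the sketch below uses an ASSUMED layout (newline-joined fields plus a SHA-256 of the body) and hypothetical helper names (canonical, sign, verify, newKey), only to illustrate why changing any signed field invalidates the signature.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// canonical is a HYPOTHETICAL layout, not membership.CanonicalRequest: it joins
// the signed fields with newlines and binds the body through its SHA-256 digest.
func canonical(method, path, ts, nonce string, body []byte) []byte {
	sum := sha256.Sum256(body)
	fields := []string{method, path, ts, nonce, hex.EncodeToString(sum[:])}
	return []byte(strings.Join(fields, "\n"))
}

// newKey generates a fresh Ed25519 key pair for the example.
func newKey() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// sign produces the client-side signature over the canonical bytes.
func sign(priv ed25519.PrivateKey, method, path, ts, nonce string, body []byte) []byte {
	return ed25519.Sign(priv, canonical(method, path, ts, nonce, body))
}

// verify recomputes the canonical bytes server-side and checks the signature.
func verify(pub ed25519.PublicKey, sig []byte, method, path, ts, nonce string, body []byte) bool {
	return ed25519.Verify(pub, canonical(method, path, ts, nonce, body), sig)
}

func main() {
	pub, priv := newKey()
	sig := sign(priv, "GET", "/members/abc/rooms", "1700000000", "nonce-1", nil)
	fmt.Println("same request verifies:", verify(pub, sig, "GET", "/members/abc/rooms", "1700000000", "nonce-1", nil))
	fmt.Println("changed nonce verifies:", verify(pub, sig, "GET", "/members/abc/rooms", "1700000000", "nonce-2", nil))
}
```

A valid signature verifies wherever the same bytes are presented, so signing alone does not stop a replay. That is why the tests in this file reuse one ts+nonce pair against two nodes and rely on the nonce store to reject the second use.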
func randNonce(t *testing.T) string {
t.Helper()
raw := make([]byte, 16)
if _, err := rand.Read(raw); err != nil {
t.Fatalf("nonce: %v", err)
}
return base64.StdEncoding.EncodeToString(raw)
}
// TestAttack0008_N3 is the blocker regression: two clustered membershipd nodes
// wired through wireReplicatedNonces share a JetStream KV nonce bucket, so a
// request accepted on node A is rejected (401) when replayed to node B. Before
// the fix the binary never wired this and the replay returned 200.
func TestAttack0008_N3(t *testing.T) {
// One NATS+JetStream backing the shared nonce bucket (no client auth needed:
// the test drives the membership.Server's nonce store directly via HTTP).
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: freePort(t),
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
t.Fatalf("connect: %v", err)
}
t.Cleanup(nc.Close)
js, err := jetstream.New(nc)
if err != nil {
t.Fatalf("jetstream: %v", err)
}
// Shared control-plane state (stand-in for the replicated store) + two nodes.
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("identity: %v", err)
}
if err := store.AddUser(hex.EncodeToString(alice.SignPub), "alice", membership.RoleAdmin); err != nil {
t.Fatalf("add alice: %v", err)
}
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
// Each node is wired EXACTLY as the binary wires a clustered node.
mkNode := func() *httptest.Server {
srv := membership.NewServer(store, blobs, membership.AuthEnforce)
if err := wireReplicatedNonces(srv, js, true /*clustered*/, 1); err != nil {
t.Fatalf("wireReplicatedNonces: %v", err)
}
return httptest.NewServer(srv)
}
nodeA := mkNode()
t.Cleanup(nodeA.Close)
nodeB := mkNode()
t.Cleanup(nodeB.Close)
ts := time.Now().Unix()
nonce := randNonce(t)
path := "/members/" + frame.EndpointID(alice.SignPub) + "/rooms"
// Golden: alice's signed request is accepted on node A.
respA, err := http.DefaultClient.Do(signed008(t, nodeA.URL, "GET", path, nil, alice, ts, nonce))
if err != nil {
t.Fatalf("do A: %v", err)
}
respA.Body.Close()
if respA.StatusCode != http.StatusOK {
t.Fatalf("node A first use: status %d, want 200", respA.StatusCode)
}
// Error path (the attack): replay the SAME signed bytes to node B → 401.
respB, err := http.DefaultClient.Do(signed008(t, nodeB.URL, "GET", path, nil, alice, ts, nonce))
if err != nil {
t.Fatalf("do B: %v", err)
}
respB.Body.Close()
if respB.StatusCode != http.StatusUnauthorized {
t.Fatalf("cross-node replay to node B: status %d, want 401 (replayed nonce must be rejected)", respB.StatusCode)
}
}
// TestAttack0008_N3_StandaloneKeepsLocalCache is the edge: a NON-clustered node
// must NOT require JetStream — wireReplicatedNonces is a no-op and the node keeps
// its in-memory cache, which still rejects a same-node replay (the single-node
// guarantee is unchanged). This proves the fix does not add a JetStream
// dependency to standalone deployments.
func TestAttack0008_N3_StandaloneKeepsLocalCache(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("identity: %v", err)
}
if err := store.AddUser(hex.EncodeToString(alice.SignPub), "alice", membership.RoleAdmin); err != nil {
t.Fatalf("add alice: %v", err)
}
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
srv := membership.NewServer(store, blobs, membership.AuthEnforce)
// Standalone: clustered=false, js=nil. Must succeed (no JetStream needed).
if err := wireReplicatedNonces(srv, nil, false /*clustered*/, 1); err != nil {
t.Fatalf("standalone wireReplicatedNonces must be a no-op, got: %v", err)
}
node := httptest.NewServer(srv)
t.Cleanup(node.Close)
ts := time.Now().Unix()
nonce := randNonce(t)
path := "/members/" + frame.EndpointID(alice.SignPub) + "/rooms"
resp1, err := http.DefaultClient.Do(signed008(t, node.URL, "GET", path, nil, alice, ts, nonce))
if err != nil {
t.Fatalf("do 1: %v", err)
}
resp1.Body.Close()
if resp1.StatusCode != http.StatusOK {
t.Fatalf("first use: status %d, want 200", resp1.StatusCode)
}
// Same-node replay is still rejected by the in-memory cache.
resp2, err := http.DefaultClient.Do(signed008(t, node.URL, "GET", path, nil, alice, ts, nonce))
if err != nil {
t.Fatalf("do 2: %v", err)
}
resp2.Body.Close()
if resp2.StatusCode != http.StatusUnauthorized {
t.Fatalf("same-node replay: status %d, want 401", resp2.StatusCode)
}
}
// TestAttack0008_N3_ClusteredRequiresJetStream proves the hard rule: a clustered
// node with NO JetStream available refuses (error), so the binary fails fast
// instead of silently running with a per-process cache.
func TestAttack0008_N3_ClusteredRequiresJetStream(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
srv := membership.NewServer(store, blobs, membership.AuthEnforce)
if err := wireReplicatedNonces(srv, nil, true /*clustered*/, 1); err == nil {
t.Fatalf("clustered node with no JetStream must fail, got nil")
}
}
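The three tests above hinge on one idea: replay protection only holds across nodes if the nodes consult the same nonce store. The sketch below models that with a plain in-memory map and hypothetical types (nonceStore, node). It is not the JetStream KV bucket the binary uses, and it omits the expiry a real store would presumably need.

```go
package main

import (
	"fmt"
	"sync"
)

// nonceStore is a hypothetical in-memory stand-in for a nonce cache.
type nonceStore struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newNonceStore() *nonceStore {
	return &nonceStore{seen: map[string]struct{}{}}
}

// claim records nonce and reports whether this was its first use.
func (s *nonceStore) claim(nonce string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.seen[nonce]; dup {
		return false
	}
	s.seen[nonce] = struct{}{}
	return true
}

// node models one server; accept returns an HTTP-like status code.
type node struct{ nonces *nonceStore }

func (n node) accept(nonce string) int {
	if n.nonces.claim(nonce) {
		return 200
	}
	return 401
}

func main() {
	// Per-process caches: the replay to node B is accepted (pre-fix behaviour).
	a, b := node{newNonceStore()}, node{newNonceStore()}
	fmt.Println(a.accept("n1"), b.accept("n1")) // 200 200

	// One shared store: the replay to node B is rejected.
	shared := newNonceStore()
	a, b = node{shared}, node{shared}
	fmt.Println(a.accept("n1"), b.accept("n1")) // 200 401
}
```

The check and the insert happen under one lock, so two concurrent uses of the same nonce cannot both succeed. A replicated store needs an equivalent atomic create-if-absent operation to give the same guarantee.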
+150 -2
@@ -3,10 +3,94 @@ package main
import (
"fmt"
"net"
"net/url"
"os"
"strings"
"github.com/enmanuel/unibus/pkg/membership"
)
// splitRoutes parses the comma-separated --routes flag into a clean slice of
// route URLs, dropping empty entries and surrounding whitespace so a trailing
// comma or a spaced list does not yield a bogus empty route.
func splitRoutes(csv string) []string {
var out []string
for _, r := range strings.Split(csv, ",") {
if r = strings.TrimSpace(r); r != "" {
out = append(out, r)
}
}
return out
}
// resolveClusterPass resolves the cluster route secret WITHOUT leaking it through
// argv (audit 0008 N1-low: --cluster-pass in argv is visible in ps/journald).
// Precedence: --cluster-pass-file (read + trim the file), then the env var
// UNIBUS_CLUSTER_PASS, then the legacy --cluster-pass flag (argv-visible, kept for
// dev/compat). env is injected (os.Getenv result) so the function stays testable.
// It returns the secret and a short source label for logging (never the secret).
func resolveClusterPass(passFlag, passFile, env string) (secret, source string, err error) {
if passFile != "" {
b, rerr := os.ReadFile(passFile)
if rerr != nil {
return "", "", fmt.Errorf("read --cluster-pass-file %q: %w", passFile, rerr)
}
return strings.TrimSpace(string(b)), "file", nil
}
if env != "" {
return env, "env", nil
}
if passFlag != "" {
return passFlag, "flag", nil
}
return "", "none", nil
}
// injectRouteCreds rewrites each route URL that carries NO userinfo to embed
// user:pass, so the cluster secret is supplied once (via file/env) instead of
// repeated in every --routes argv entry where ps/journald would expose it. A route
// that already carries userinfo is left untouched (operator override). With an
// empty user it is a no-op. A malformed route URL is an error (configuration bug)
// rather than a silently dropped peer.
func injectRouteCreds(routes []string, user, pass string) ([]string, error) {
if user == "" {
return routes, nil
}
out := make([]string, 0, len(routes))
for _, r := range routes {
u, err := url.Parse(r)
if err != nil {
return nil, fmt.Errorf("parse route %q: %w", r, err)
}
if u.User == nil {
u.User = url.UserPassword(user, pass)
}
out = append(out, u.String())
}
return out, nil
}
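A short usage sketch may help make the rewrite rule concrete. It restates the same logic in a standalone helper named withRouteCreds (an illustrative name) so it can run on its own. The addresses and credentials are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// withRouteCreds is a standalone restatement for illustration: routes without
// userinfo get user:pass embedded; routes that already carry userinfo are kept.
func withRouteCreds(routes []string, user, pass string) ([]string, error) {
	if user == "" {
		return routes, nil // nothing to inject
	}
	out := make([]string, 0, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r)
		if err != nil {
			return nil, fmt.Errorf("parse route %q: %w", r, err)
		}
		if u.User == nil {
			u.User = url.UserPassword(user, pass)
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	routes := []string{
		"nats://10.0.0.2:4248",          // no userinfo: credentials are injected
		"nats://other:pw@10.0.0.3:4248", // operator override: left untouched
	}
	out, err := withRouteCreds(routes, "route", "s3cret")
	if err != nil {
		panic(err)
	}
	for _, r := range out {
		fmt.Println(r)
	}
	// nats://route:s3cret@10.0.0.2:4248
	// nats://other:pw@10.0.0.3:4248
}
```

Returning an error for an unparseable route, instead of skipping it, keeps a typo in --routes from silently shrinking the cluster.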
// isLoopbackURL reports whether a NATS url targets this host only (loopback). Used
// to guard migrate-to-kv (audit 0008 N6): pushing the allowlist to a REMOTE NATS
// without TLS would send handles/roles/sign-pubs in cleartext, so a remote target
// must be TLS-pinned (--ca). A url we cannot classify is treated as NON-loopback
// (conservative: it then requires --ca).
func isLoopbackURL(natsURL string) bool {
u, err := url.Parse(natsURL)
if err != nil {
return false
}
host := u.Hostname()
switch host {
case "localhost":
return true
case "":
return false
}
ip := net.ParseIP(host)
return ip != nil && ip.IsLoopback()
}
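To see how the classification behaves on typical inputs, here is a standalone sketch with the same logic under an illustrative name (loopbackURL). The URLs are examples only.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// loopbackURL restates the rule for illustration: only "localhost" or a
// loopback IP literal counts; anything unparseable or unknown is non-loopback.
func loopbackURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and the brackets of an IPv6 literal
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host) // nil for "" and for any non-literal hostname
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, raw := range []string{
		"nats://127.0.0.1:4222",
		"nats://[::1]:4222",
		"nats://localhost:4222",
		"nats://10.0.0.5:4222",
		"tls://bus.example.com:4222",
	} {
		fmt.Printf("%-28s loopback=%v\n", raw, loopbackURL(raw))
	}
}
```

The function never resolves DNS, so a hostname that happens to point at 127.0.0.1 is still treated as remote. That matches the conservative stance in the comment above: when in doubt, require --ca.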
// isLoopbackBind reports whether the --bind value keeps the service reachable
// only from this host. An empty bind means "all interfaces" (public), and a
// hostname we cannot resolve to a loopback literal is treated as public — the
@@ -29,9 +113,12 @@ func isLoopbackBind(bind string) bool {
// configuration that would expose the bus without enforced authentication:
//
// - a non-loopback --bind without --bus-auth enforce (the data plane and
// control plane would both accept anyone),
// - --tls-cert/--tls-key without --bus-auth enforce (TLS encrypts the channel
// but authenticates no one — encrypted access for everybody is still open), and
// - a non-loopback --bind WITHOUT --tls-cert/--tls-key (the control plane would
// serve metadata over plaintext HTTP publicly — audit H5 reappearing, the N4
// gap the re-audit found: TLS was available but not mandatory).
//
// It is a pure function of the parsed flags so the command can fail fast at
// startup and tests can assert the policy without booting a server.
@@ -46,5 +133,66 @@ func validateBootConfig(bind string, mode membership.AuthMode, tlsCert, tlsKey s
"refusing to start: --tls-cert/--tls-key set but --bus-auth is %q; TLS without enforced auth is fail-open (encrypted channel, no authentication) — set --bus-auth enforce",
mode)
}
if !isLoopbackBind(bind) && (tlsCert == "" || tlsKey == "") {
return fmt.Errorf(
"refusing to start: --bind %q is not loopback but --tls-cert/--tls-key are not both set; a public control plane must serve HTTPS or its metadata (subjects, pubkeys, sealed keys, the social graph) travels in cleartext to a network MITM (audit H5/N4) — provide a CA-signed --tls-cert/--tls-key, or bind 127.0.0.1 for local dev",
bind)
}
return nil
}
// validateClusterConfig guards the cluster route layer (issue 0003a). The route
// layer is a server-to-server trust boundary distinct from the client data
// plane: leaving it open lets anyone who reaches the route port join the cluster
// or inject messages into the whole bus (audit 0004, "auth of the cluster
// routes"). So on a public (non-loopback) bind, a cluster MUST carry both a
// shared route secret AND mutual route TLS. It is a pure function of the parsed
// flags. An empty clusterName means "no cluster" (standalone) and is always
// allowed.
//
// The three route-TLS paths are all-or-nothing (mutual TLS needs the node cert,
// its key, and the CA together), independent of the bind, so a partial TLS
// config never silently degrades to plaintext routes.
//
// Homogeneous posture (issue 0006d, audit 0008 N1): a cluster is only as secure
// as its weakest node — the data plane forwards every subject between nodes, so a
// single node running without enforced auth lets an unauthenticated peer
// Subscribe(">") on it and harvest the traffic forwarded from the ACL'd nodes.
// This node therefore REFUSES to join a cluster unless it runs --bus-auth enforce,
// regardless of bind: a clustered node is a production node, and there is no safe
// "dev cluster without auth". (A peer running a tampered binary is out of this
// node's control; /healthz exposes each node's posture so a monitor can detect
// one that is not enforce+ACL — see Server.Posture.)
func validateClusterConfig(clusterName, bind, user, pass, rtCert, rtKey, rtCA string, mode membership.AuthMode) error {
rtAny := rtCert != "" || rtKey != "" || rtCA != ""
rtAll := rtCert != "" && rtKey != "" && rtCA != ""
if rtAny && !rtAll {
return fmt.Errorf(
"refusing to start: --route-tls-cert/--route-tls-key/--route-tls-ca must be set together (mutual route TLS needs all three)")
}
if clusterName == "" {
return nil // standalone: no route layer to secure
}
// A clustered node MUST enforce auth (homogeneous posture). Checked before the
// loopback shortcut so even a loopback cluster cannot form without enforce.
if mode != membership.AuthEnforce {
return fmt.Errorf(
"refusing to start: cluster %q requires --bus-auth enforce; a cluster node without enforced auth+ACL lets an unauthenticated peer harvest the traffic forwarded from the other nodes (audit 0008 N1) — every node must run the same enforce+ACL+TLS posture",
clusterName)
}
if isLoopbackBind(bind) {
return nil // loopback cluster is dev-only and unreachable from outside
}
// Public cluster: demand a route secret and mutual route TLS.
if user == "" || pass == "" {
return fmt.Errorf(
"refusing to start: cluster %q on public bind %q requires --cluster-user and --cluster-pass; an unauthenticated route port lets anyone join the cluster",
clusterName, bind)
}
if !rtAll {
return fmt.Errorf(
"refusing to start: cluster %q on public bind %q requires mutual route TLS (--route-tls-cert/--route-tls-key/--route-tls-ca); plaintext routes expose server-to-server traffic and admit unsigned nodes",
clusterName, bind)
}
return nil return nil
} }
+118 -2
@@ -30,6 +30,31 @@ func TestAudit_FailOpenTLSWithoutAuth(t *testing.T) {
}
}
// TestGap_PublicEnforceNoTLS ports the re-auditor's N4 gap: the H2 guard refused
// "public without enforce" and "TLS without enforce", but ALLOWED a public bind
// with enforce and NO --tls-cert, so the control plane served metadata over
// plaintext HTTP publicly (H5 reappearing). The guard now refuses it.
func TestGap_PublicEnforceNoTLS(t *testing.T) {
// The exact auditor configuration: public bind, enforce on, no TLS cert/key.
err := validateBootConfig("0.0.0.0", membership.AuthEnforce, "", "")
if err == nil {
t.Fatalf("public bind + enforce + NO --tls-cert must be refused: the control plane would serve plaintext HTTP publicly (audit N4)")
}
if !strings.Contains(err.Error(), "tls-cert") {
t.Fatalf("error should point the operator at --tls-cert/--tls-key, got: %v", err)
}
// Golden: the same public+enforce config WITH a cert/key is allowed.
if err := validateBootConfig("0.0.0.0", membership.AuthEnforce, "server.crt", "server.key"); err != nil {
t.Fatalf("public + enforce + TLS is the intended production config, got: %v", err)
}
// Edge: loopback without TLS stays allowed (local dev is not a public exposure).
if err := validateBootConfig("127.0.0.1", membership.AuthOff, "", ""); err != nil {
t.Fatalf("loopback dev without TLS must remain allowed, got: %v", err)
}
}
// TestBootConfigPolicy is the full table: the golden secure-public config is
// allowed, dev loopback is allowed, and every fail-open shape is refused.
func TestBootConfigPolicy(t *testing.T) {
@@ -41,19 +66,25 @@ func TestBootConfigPolicy(t *testing.T) {
key string
wantErr bool
}{
// Golden: the intended public production config — enforce AND TLS.
{"public+enforce+tls", "0.0.0.0", membership.AuthEnforce, "s.crt", "s.key", false},
// Edge: local dev on loopback may stay open (no auth, no TLS).
{"loopback+off", "127.0.0.1", membership.AuthOff, "", "", false},
{"loopback-ipv6+off", "::1", membership.AuthOff, "", "", false},
{"localhost+off", "localhost", membership.AuthOff, "", "", false},
{"loopback+soft", "127.0.0.1", membership.AuthSoft, "", "", false},
// Edge: loopback with full enforce+TLS is also fine.
{"loopback+enforce+tls", "127.0.0.1", membership.AuthEnforce, "s.crt", "s.key", false},
// Error: public bind without enforce.
{"public+off", "0.0.0.0", membership.AuthOff, "", "", true},
{"public+soft", "0.0.0.0", membership.AuthSoft, "", "", true},
{"lan-ip+off", "192.168.1.10", membership.AuthOff, "", "", true},
{"empty-bind+off", "", membership.AuthOff, "", "", true},
// Error (N4): public bind + enforce but NO TLS -> plaintext control plane.
{"public+enforce+notls", "0.0.0.0", membership.AuthEnforce, "", "", true},
{"public+enforce+certonly", "0.0.0.0", membership.AuthEnforce, "s.crt", "", true},
{"public+enforce+keyonly", "0.0.0.0", membership.AuthEnforce, "", "s.key", true},
{"lan-ip+enforce+notls", "192.168.1.10", membership.AuthEnforce, "", "", true},
// Error: TLS flags without enforce (cert or key alone is enough to trip it).
{"loopback+tlscert+off", "127.0.0.1", membership.AuthOff, "s.crt", "", true},
{"loopback+tlskey+soft", "127.0.0.1", membership.AuthSoft, "", "s.key", true},
@@ -70,3 +101,88 @@ func TestBootConfigPolicy(t *testing.T) {
})
}
}
// TestClusterConfigPolicy is the cluster route guard (issue 0003a): a standalone
// server is always fine; a loopback cluster is dev-only and needs no route secret
// or route TLS, but must still run enforce (audit 0008 N1); a public cluster
// demands both a route secret and complete mutual route TLS; and the route-TLS
// flags are all-or-nothing regardless of bind.
func TestClusterConfigPolicy(t *testing.T) {
const c, k, ca = "node.crt", "node.key", "ca.crt"
en := membership.AuthEnforce
off := membership.AuthOff
soft := membership.AuthSoft
cases := []struct {
name string
clusterName, bind string
user, pass string
rtCert, rtKey, rtCA string
mode membership.AuthMode
wantErr bool
}{
// Standalone (no cluster name) is always allowed, even on a public bind and
// without enforce — the cluster posture rule does not apply to a single node.
{"standalone-public-off", "", "0.0.0.0", "", "", "", "", "", off, false},
// Loopback dev cluster WITH enforce: allowed (unreachable from outside).
{"loopback-cluster-enforce", "unibus", "127.0.0.1", "", "", "", "", "", en, false},
// Golden: full public HA config under enforce.
{"public-full-enforce", "unibus", "0.0.0.0", "u", "p", c, k, ca, en, false},
// N1 (audit 0008): a clustered node WITHOUT enforce is refused — even on
// loopback — so no weak node can join the cluster.
{"cluster-off-refused", "unibus", "127.0.0.1", "", "", "", "", "", off, true},
{"cluster-soft-refused", "unibus", "0.0.0.0", "u", "p", c, k, ca, soft, true},
// Error: public cluster without a route secret (enforce on, fails on secret).
{"public-no-secret", "unibus", "0.0.0.0", "", "", c, k, ca, en, true},
{"public-half-secret", "unibus", "0.0.0.0", "u", "", c, k, ca, en, true},
// Error: public cluster without mutual route TLS.
{"public-no-tls", "unibus", "10.0.0.1", "u", "p", "", "", "", en, true},
// Error: partial route-TLS flags trip regardless of bind/mode.
{"loopback-partial-tls", "unibus", "127.0.0.1", "", "", c, "", "", en, true},
{"standalone-partial-tls", "", "127.0.0.1", "", "", c, k, "", off, true},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
err := validateClusterConfig(tc.clusterName, tc.bind, tc.user, tc.pass, tc.rtCert, tc.rtKey, tc.rtCA, tc.mode)
if tc.wantErr && err == nil {
t.Fatalf("cluster config %+v should be refused", tc)
}
if !tc.wantErr && err != nil {
t.Fatalf("cluster config %+v should be allowed, got: %v", tc, err)
}
})
}
}
// TestAttack0008_N1 is the regression for audit 0008 N1 scenario 2: a node
// configured to join a cluster while NOT enforcing auth (the weak node that lets
// an unauthenticated peer harvest the cluster's forwarded traffic) must be refused
// at startup. The homogeneous-posture rule makes this binary unable to BE that
// weak node.
func TestAttack0008_N1(t *testing.T) {
// Weak node: clustered but --bus-auth off -> refused.
if err := validateClusterConfig("unibus", "0.0.0.0", "u", "p", "n.crt", "n.key", "ca.crt", membership.AuthOff); err == nil {
t.Fatalf("a clustered node without enforce must be refused (audit 0008 N1)")
}
// Same node WITH enforce + full route security -> allowed.
if err := validateClusterConfig("unibus", "0.0.0.0", "u", "p", "n.crt", "n.key", "ca.crt", membership.AuthEnforce); err != nil {
t.Fatalf("a clustered enforce node with full route security must be allowed, got: %v", err)
}
}
func TestSplitRoutes(t *testing.T) {
cases := []struct {
in string
want int
}{
{"", 0},
{"nats://a:1", 1},
{"nats://a:1,nats://b:2", 2},
{" nats://a:1 , nats://b:2 ", 2}, // spaces trimmed
{"nats://a:1,,", 1}, // empty entries dropped
{",", 0},
}
for _, c := range cases {
if got := splitRoutes(c.in); len(got) != c.want {
t.Fatalf("splitRoutes(%q) = %v (len %d), want len %d", c.in, got, len(got), c.want)
}
}
}
+84
@@ -0,0 +1,84 @@
package main
import (
"fmt"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
server "github.com/nats-io/nats-server/v2/server"
)
// connectInternalJS opens a privileged JetStream client from membershipd to its
// OWN embedded NATS server. This is the resolution of the "bootstrap cycle"
// (issue 0006a/c): the service needs JetStream to create the replicated nonce
// bucket and the control-plane KV, but under enforce the data plane only accepts
// allowlisted clients confined to their rooms. The connection therefore
// authenticates with the process's ephemeral internal identity — the identity the
// authenticator was built to recognize (NewNkeyAuthenticatorACLInternal) and
// grant full permissions — without ever appearing in the user allowlist.
//
// It uses the in-process transport (nats.InProcessServer), a Go pipe inside the
// process, so it bypasses TLS entirely: no CA wiring is needed for this
// self-connection even when the public data plane is TLS-only. useNkey mirrors
// whether the embedded server enforces auth: under enforce the internal identity
// presents its nkey; without enforce the server accepts an unauthenticated
// in-process client and the nkey is omitted.
//
// The caller owns the returned connection and must Close it on shutdown (after
// the JetStream context is no longer used).
func connectInternalJS(ns *server.Server, internalID cs.Identity, useNkey bool) (*nats.Conn, jetstream.JetStream, error) {
opts := []nats.Option{
nats.Name("membershipd-internal"),
nats.InProcessServer(ns),
}
if useNkey {
pub, sign, err := busauth.ClientNkey(internalID.SignPriv)
if err != nil {
return nil, nil, fmt.Errorf("internal nkey: %w", err)
}
opts = append(opts, nats.Nkey(pub, sign))
}
// The URL is ignored for an in-process connection; the InProcessServer option
// supplies the transport.
nc, err := nats.Connect("", opts...)
if err != nil {
return nil, nil, fmt.Errorf("connect internal nats: %w", err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
return nil, nil, fmt.Errorf("internal jetstream: %w", err)
}
return nc, js, nil
}
// connectExternalJS opens a JetStream client to an EXTERNAL NATS the operator
// runs (membershipd started with --nats-url). Unlike the embedded path there is
// no in-process transport and no internal identity: the external server enforces
// its own auth, so membershipd connects as a plain client (optionally TLS-pinned
// to the bus CA). It is best-effort and intended for an operator-managed cluster;
// the standard unibus deploy uses the embedded server (connectInternalJS).
func connectExternalJS(natsURL, caPath string) (*nats.Conn, jetstream.JetStream, error) {
opts := []nats.Option{nats.Name("membershipd-internal")}
if caPath != "" {
tlsCfg, err := busauth.LoadCATLSConfig(caPath)
if err != nil {
return nil, nil, fmt.Errorf("load CA %q: %w", caPath, err)
}
opts = append(opts, nats.Secure(tlsCfg))
}
nc, err := nats.Connect(natsURL, opts...)
if err != nil {
return nil, nil, fmt.Errorf("connect external nats %q: %w", natsURL, err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
return nil, nil, fmt.Errorf("external jetstream: %w", err)
}
return nc, js, nil
}
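The connectInternalJS comment describes the in-process transport as a Go pipe inside the process, which is why no TLS wiring is needed for the self-connection. The sketch below illustrates that idea with the standard library's net.Pipe only; it assumes the transport behaves like an in-memory pipe and does not use or reproduce the NATS client API (echoOverPipe is a hypothetical name):

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOverPipe sends msg through an in-memory net.Pipe and returns what the
// other end received. No socket, port, or TLS handshake is involved: both
// ends live in the same process.
func echoOverPipe(msg string) (string, error) {
	clientEnd, serverEnd := net.Pipe()
	defer clientEnd.Close()
	defer serverEnd.Close()

	errc := make(chan error, 1)
	go func() {
		// net.Pipe is synchronous, so the write must run concurrently with the read.
		_, err := clientEnd.Write([]byte(msg))
		errc <- err
	}()

	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(serverEnd, buf); err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := echoOverPipe("ping")
	fmt.Println(got, err)
}
```

Because the bytes never leave the process, transport encryption adds nothing here; authentication (the nkey) is still applied on top when enforce is on.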
+119
@@ -0,0 +1,119 @@
package main
// Bootstrap test for issue 0006a/c: under enforce, membershipd must still reach
// JetStream on its OWN embedded server to create the nonce/KV buckets. It does so
// with an ephemeral internal identity the authenticator grants full permissions
// (NewNkeyAuthenticatorACLInternal). These tests prove that privileged
// self-connection works AND that no other identity can claim it.
import (
"context"
"encoding/hex"
"net"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
func icFreePort(t *testing.T) int {
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
// TestInternalConnPrivilegedUnderEnforce: with an enforce authenticator that
// authorizes NO bus user, the internal identity still connects in-process and has
// full permissions — it creates a KV bucket and round-trips a value. This is the
// resolution of the bootstrap cycle the audit flagged as the reason the KV store
// was never wired.
func TestInternalConnPrivilegedUnderEnforce(t *testing.T) {
internalID, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("internal identity: %v", err)
}
internalPubHex := hex.EncodeToString(internalID.SignPub)
// Authenticator: no bus user is authorized; only the internal identity passes.
auth := busauth.NewNkeyAuthenticatorACLInternal(
func(string) bool { return false },
busauth.PermissionsFromSubjects(func(string) ([]string, error) { return []string{"_INBOX.>"}, nil }),
internalPubHex,
)
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: icFreePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
nc, js, err := connectInternalJS(ns, internalID, true /*useNkey*/)
if err != nil {
t.Fatalf("connectInternalJS: %v", err)
}
t.Cleanup(nc.Close)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
kv, err := js.CreateKeyValue(ctx, jetstream.KeyValueConfig{Bucket: "KV_UNIBUS_test", Replicas: 1})
if err != nil {
t.Fatalf("internal conn could not create KV bucket (full perms expected): %v", err)
}
if _, err := kv.Put(ctx, "k", []byte("v")); err != nil {
t.Fatalf("kv put: %v", err)
}
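e, err := kv.Get(ctx, "k")
if err != nil {
t.Fatalf("kv get: %v", err)
}
// Compare and report the stored bytes, not the entry value itself.
if got := string(e.Value()); got != "v" {
t.Fatalf("kv get: val=%q, want %q", got, "v")
}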
}
// TestInternalConnOutsiderRejected: an identity that is neither the internal one
// nor an allowlisted bus user cannot connect — proving the internal bypass is
// scoped to the exact internal key, not a blanket hole.
func TestInternalConnOutsiderRejected(t *testing.T) {
internalID, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("internal identity: %v", err)
}
auth := busauth.NewNkeyAuthenticatorACLInternal(
func(string) bool { return false },
busauth.PermissionsFromSubjects(func(string) ([]string, error) { return []string{"_INBOX.>"}, nil }),
hex.EncodeToString(internalID.SignPub),
)
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: icFreePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
outsider, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("outsider identity: %v", err)
}
pub, sign, err := busauth.ClientNkey(outsider.SignPriv)
if err != nil {
t.Fatalf("outsider nkey: %v", err)
}
conn, err := nats.Connect(ns.ClientURL(),
nats.Nkey(pub, sign),
nats.MaxReconnects(0),
nats.Timeout(2*time.Second),
)
if err == nil {
conn.Close()
t.Fatalf("outsider (unauthorized, non-internal) must be rejected, but connected")
}
}
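The two tests above rely on a specific decision order: the internal key passes without consulting the allowlist, every other key must be allowlisted, and everything else is refused. A minimal model of that order, assuming nothing about how NewNkeyAuthenticatorACLInternal is actually written (decide, decision, and the three outcomes are hypothetical names):

```go
package main

import "fmt"

// decision is the hypothetical outcome of one connection attempt.
type decision int

const (
	reject        decision = iota // unknown key: the connection is refused
	allowConfined                 // allowlisted bus user: room-scoped permissions
	allowFull                     // the internal service key: full permissions
)

// decide models the check order described in the tests. The empty-string guard
// matters: when no internal identity is configured, an empty client key must
// not match it by accident.
func decide(pubHex, internalPubHex string, isAuthorized func(string) bool) decision {
	if internalPubHex != "" && pubHex == internalPubHex {
		return allowFull
	}
	if isAuthorized != nil && isAuthorized(pubHex) {
		return allowConfined
	}
	return reject
}

func main() {
	denyAll := func(string) bool { return false }
	fmt.Println(decide("internal-key", "internal-key", denyAll) == allowFull)
	fmt.Println(decide("outsider-key", "internal-key", denyAll) == reject)
}
```

With an allowlist that denies everyone, only the exact internal key gets through, which is the property TestInternalConnOutsiderRejected asserts.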
+154
@@ -0,0 +1,154 @@
package main
// Wiring tests for issue 0006c: --store kv selects the replicated JetStream KV
// control plane, the authenticator serves from it through the storeHolder, and a
// new node sees state created by another (the divergence that per-node SQLite
// caused — audit 0008 N5 — is gone). Branch-by-abstraction is verified elsewhere
// (the SQLite default path is the unchanged baseline covered by the existing
// suite).
import (
"encoding/hex"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
// TestKVStoreBootstrapUnderEnforce drives the exact decentralized boot the binary
// performs: build the authenticator over an empty holder, start NATS, open the
// privileged internal connection, open the KV store, publish it into the holder,
// then a real bus user (seeded into the KV store) authenticates over nkey. This
// proves the bootstrap cycle is broken correctly — the KV-backed control plane
// authorizes live clients under enforce.
func TestKVStoreBootstrapUnderEnforce(t *testing.T) {
internalID, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("internal identity: %v", err)
}
holder := &storeHolder{}
auth := busauth.NewNkeyAuthenticatorACLInternal(
holder.IsAuthorized,
busauth.PermissionsFromSubjects(holder.subjectACL),
hex.EncodeToString(internalID.SignPub),
)
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: freePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
// Privileged internal connection opens the KV store while the holder still
// denies every normal client.
intNC, js, err := connectInternalJS(ns, internalID, true)
if err != nil {
t.Fatalf("connectInternalJS: %v", err)
}
t.Cleanup(intNC.Close)
kvStore, err := membership.OpenJetStream(js, membership.JetStreamConfig{Replicas: 1, OpTimeout: 3 * time.Second})
if err != nil {
t.Fatalf("open kv store: %v", err)
}
holder.set(kvStore)
// Seed a bus user into the KV control plane.
alice, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("alice: %v", err)
}
if err := kvStore.AddUser(hex.EncodeToString(alice.SignPub), "alice", membership.RoleMember); err != nil {
t.Fatalf("seed alice: %v", err)
}
// alice authenticates over nkey — authorized via the KV store through the holder.
pub, sign, err := busauth.ClientNkey(alice.SignPriv)
if err != nil {
t.Fatalf("alice nkey: %v", err)
}
aliceNC, err := nats.Connect(ns.ClientURL(), nats.Nkey(pub, sign), nats.MaxReconnects(0), nats.Timeout(2*time.Second))
if err != nil {
t.Fatalf("alice (KV-authorized) must connect under enforce: %v", err)
}
aliceNC.Close()
// An outsider not in the KV store is denied (fail closed).
outsider, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("outsider: %v", err)
}
opub, osign, err := busauth.ClientNkey(outsider.SignPriv)
if err != nil {
t.Fatalf("outsider nkey: %v", err)
}
if oc, err := nats.Connect(ns.ClientURL(), nats.Nkey(opub, osign), nats.MaxReconnects(0), nats.Timeout(2*time.Second)); err == nil {
oc.Close()
t.Fatalf("an outsider absent from the KV store must be rejected")
}
}
// TestKVStoreDecentralizedConsistency: a room/user created via one node's KV store
// is immediately visible to another node's KV store over the same JetStream — the
// shared, replicated control plane that ends the per-node SQLite divergence.
func TestKVStoreDecentralizedConsistency(t *testing.T) {
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: freePort(t),
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
open := func() membership.Store {
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
t.Fatalf("connect: %v", err)
}
t.Cleanup(nc.Close)
js, err := jetstream.New(nc)
if err != nil {
t.Fatalf("jetstream: %v", err)
}
st, err := membership.OpenJetStream(js, membership.JetStreamConfig{Replicas: 1, OpTimeout: 3 * time.Second})
if err != nil {
t.Fatalf("open kv: %v", err)
}
return st
}
nodeA := open()
nodeB := open()
owner, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("owner: %v", err)
}
ownerPub := hex.EncodeToString(owner.SignPub)
if err := nodeA.AddUser(ownerPub, "owner", membership.RoleAdmin); err != nil {
t.Fatalf("nodeA add user: %v", err)
}
if err := nodeA.CreateRoom(
membership.RoomInfo{RoomID: "ROOMX", Subject: "room.shared.x", OwnerEndpoint: "owner-ep"},
owner.SignPub, owner.KexPub, nil,
); err != nil {
t.Fatalf("nodeA create room: %v", err)
}
// nodeB (a different connection, same buckets) sees both immediately.
if !nodeB.IsAuthorized(ownerPub) {
t.Fatalf("nodeB must see the user created on nodeA (decentralized state divergence)")
}
got, err := nodeB.GetRoom("ROOMX")
if err != nil {
t.Fatalf("nodeB must see the room created on nodeA: %v", err)
}
if got.Subject != "room.shared.x" {
t.Fatalf("nodeB read wrong room subject: %q", got.Subject)
}
}
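Both wiring tests depend on the storeHolder indirection: the authenticator is built over an empty holder, and the real store is published into it later. The holder's definition is not shown in this diff, so the following is only a sketch of one way to get that behaviour, assuming a mutex-guarded pointer that fails closed while empty (holderSketch, allowlist, and mapStore are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

// allowlist is the slice of the store interface this sketch needs.
type allowlist interface {
	IsAuthorized(signPubHex string) bool
}

// holderSketch lets an authenticator be built before the store exists.
type holderSketch struct {
	mu sync.RWMutex
	st allowlist
}

// set publishes the store; later lookups are served from it.
func (h *holderSketch) set(st allowlist) {
	h.mu.Lock()
	h.st = st
	h.mu.Unlock()
}

// IsAuthorized fails closed: with no store published yet, nobody is authorized.
func (h *holderSketch) IsAuthorized(signPubHex string) bool {
	h.mu.RLock()
	st := h.st
	h.mu.RUnlock()
	if st == nil {
		return false
	}
	return st.IsAuthorized(signPubHex)
}

// mapStore is a toy in-memory allowlist used only for this demonstration.
type mapStore map[string]bool

func (m mapStore) IsAuthorized(k string) bool { return m[k] }

func main() {
	h := &holderSketch{}
	fmt.Println(h.IsAuthorized("alice")) // no store published yet
	h.set(mapStore{"alice": true})
	fmt.Println(h.IsAuthorized("alice")) // served from the published store
}
```

The deny-until-set default is what keeps the boot window safe: during it, only the internal identity, which bypasses the store, can connect.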
+152
@@ -0,0 +1,152 @@
package main
// Integration tests for issue 0011 GAP A: `membershipd user add --store kv`
// adds users to a RUNNING cluster's replicated allowlist via the privileged
// internal connection, instead of the stop-seed-restart procedure the 0011
// deploy required. These exercise the real connectKVStore path (load the
// persisted internal identity from a file, present its nkey, open the KV store,
// write the user) against an embedded enforce node, plus the idempotency and
// error semantics the DoD calls for. Multi-node replication and node-down quorum
// are validated against the live cluster (report 0012).
import (
"encoding/hex"
"errors"
"path/filepath"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/membership"
)
// startEnforceKVNode boots a single embedded enforce node whose authenticator
// recognizes internalPubHex as the privileged internal identity, bootstraps the
// KV control-plane store over the in-process internal connection, and publishes
// it into the holder — the exact sequence main.go performs for --store kv. It
// returns the client URL the CLI connects to.
func startEnforceKVNode(t *testing.T, internalID cs.Identity) string {
t.Helper()
holder := &storeHolder{}
auth := busauth.NewNkeyAuthenticatorACLInternal(
holder.IsAuthorized,
busauth.PermissionsFromSubjects(holder.subjectACL),
hex.EncodeToString(internalID.SignPub),
)
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: freePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("start enforce node: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
intNC, js, err := connectInternalJS(ns, internalID, true)
if err != nil {
t.Fatalf("bootstrap internal connection: %v", err)
}
t.Cleanup(intNC.Close)
kvStore, err := membership.OpenJetStream(js, membership.JetStreamConfig{Replicas: 1, OpTimeout: 3 * time.Second})
if err != nil {
t.Fatalf("bootstrap KV store: %v", err)
}
holder.set(kvStore)
return ns.ClientURL()
}
// TestUserAddStoreKV_GoldenAndIdempotent is the GAP A golden + edge-1: the CLI
// connection (real connectKVStore, loading the internal identity from a file and
// presenting its nkey) writes a user into the live KV allowlist, the user is
// authorized afterward, and re-adding the same key is an explicit ErrUserExists
// with no corruption (the unchanged row is still authorized).
func TestUserAddStoreKV_GoldenAndIdempotent(t *testing.T) {
idFile := filepath.Join(t.TempDir(), "internal.id")
internalID, err := client.LoadOrCreateIdentity(idFile) // persists 0600
if err != nil {
t.Fatalf("persist internal identity: %v", err)
}
url := startEnforceKVNode(t, internalID)
// Golden: connect as the privileged internal identity (loopback, no TLS) and
// add a new user, exactly as `user add --store kv` does.
kv, err := connectKVStore(url, idFile, "", 1)
if err != nil {
t.Fatalf("connectKVStore (privileged): %v", err)
}
defer kv.Close()
newUser, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("new user identity: %v", err)
}
pub := hex.EncodeToString(newUser.SignPub)
if err := kv.store.AddUser(pub, "gapcheck_user", membership.RoleMember); err != nil {
t.Fatalf("add user to live KV: %v", err)
}
if !kv.store.IsAuthorized(pub) {
t.Fatalf("user added to KV must be authorized")
}
// Edge 1: re-adding the same key is a clean, non-destructive ErrUserExists.
err = kv.store.AddUser(pub, "gapcheck_user", membership.RoleMember)
if !errors.Is(err, membership.ErrUserExists) {
t.Fatalf("re-add must return ErrUserExists (idempotent), got %v", err)
}
// A different handle/role with the SAME key is also rejected — the row is not
// silently overwritten (no role flip).
if err := kv.store.AddUser(pub, "impostor", membership.RoleAdmin); !errors.Is(err, membership.ErrUserExists) {
t.Fatalf("re-add with a different role must NOT overwrite; want ErrUserExists, got %v", err)
}
u, err := kv.store.GetUser(pub)
if err != nil {
t.Fatalf("get user: %v", err)
}
if u.Handle != "gapcheck_user" || u.Role != membership.RoleMember || u.Status != membership.StatusActive {
t.Fatalf("idempotent re-add corrupted the row: %+v", u)
}
}
// TestUserAddStoreKV_RequiresInternalIdentity: --store kv without a usable
// internal identity file fails loudly (missing file, empty path) rather than
// silently connecting unprivileged.
func TestUserAddStoreKV_RequiresInternalIdentity(t *testing.T) {
if _, err := connectKVStore("nats://127.0.0.1:4250", "", "", 1); err == nil {
t.Fatalf("empty --internal-id-file must be an error")
}
missing := filepath.Join(t.TempDir(), "nope.id")
if _, err := connectKVStore("nats://127.0.0.1:4250", missing, "", 1); err == nil {
t.Fatalf("missing internal identity file must be an error")
}
}
// TestUserAddStoreKV_UnreachableKV is the GAP A error case: pointing --store kv
// at a dead endpoint yields a clear, handled error (no crash, no silent success).
func TestUserAddStoreKV_UnreachableKV(t *testing.T) {
idFile := filepath.Join(t.TempDir(), "internal.id")
if _, err := client.LoadOrCreateIdentity(idFile); err != nil {
t.Fatalf("persist internal identity: %v", err)
}
// A loopback port with nothing listening: the connect must fail fast and return an error.
_, err := connectKVStore("nats://127.0.0.1:1/", idFile, "", 1)
if err == nil {
t.Fatalf("connecting to a dead endpoint must error")
}
}
// TestUserAddStoreKV_RemoteWithoutCARefused: a non-loopback target without --ca
// is refused so the allowlist write never travels in cleartext (audit 0008 N6,
// same guard as migrate-to-kv).
func TestUserAddStoreKV_RemoteWithoutCARefused(t *testing.T) {
idFile := filepath.Join(t.TempDir(), "internal.id")
if _, err := client.LoadOrCreateIdentity(idFile); err != nil {
t.Fatalf("persist internal identity: %v", err)
}
_, err := connectKVStore("nats://203.0.113.1:4250", idFile, "", 1)
if err == nil {
t.Fatalf("remote target without --ca must be refused")
}
}
+75
@@ -0,0 +1,75 @@
package main
import (
"os"
"path/filepath"
"strings"
"testing"
)
// TestResolveClusterPass verifies the secret resolution precedence
// (file > env > flag) that keeps the cluster password out of argv (issue 0006f).
func TestResolveClusterPass(t *testing.T) {
// file wins over env and flag, and is trimmed.
f := filepath.Join(t.TempDir(), "pass")
if err := os.WriteFile(f, []byte("filesecret\n"), 0o600); err != nil {
t.Fatalf("write: %v", err)
}
if got, src, err := resolveClusterPass("flagsecret", f, "envsecret"); err != nil || got != "filesecret" || src != "file" {
t.Fatalf("file precedence: got %q src %q err %v", got, src, err)
}
// env wins over flag when no file.
if got, src, err := resolveClusterPass("flagsecret", "", "envsecret"); err != nil || got != "envsecret" || src != "env" {
t.Fatalf("env precedence: got %q src %q err %v", got, src, err)
}
// flag is the last resort.
if got, src, err := resolveClusterPass("flagsecret", "", ""); err != nil || got != "flagsecret" || src != "flag" {
t.Fatalf("flag fallback: got %q src %q err %v", got, src, err)
}
// none set.
if got, src, err := resolveClusterPass("", "", ""); err != nil || got != "" || src != "none" {
t.Fatalf("none: got %q src %q err %v", got, src, err)
}
// missing file is an error.
if _, _, err := resolveClusterPass("", filepath.Join(t.TempDir(), "nope"), ""); err == nil {
t.Fatalf("missing file must error")
}
}
// TestInjectRouteCreds verifies the secret is injected only into routes that omit
// userinfo, so --routes argv need not carry the password (issue 0006f).
func TestInjectRouteCreds(t *testing.T) {
in := []string{"nats://10.0.0.2:6250", "nats://override:pw@10.0.0.3:6250"}
out, err := injectRouteCreds(in, "user", "secret")
if err != nil {
t.Fatalf("inject: %v", err)
}
if !strings.Contains(out[0], "user:secret@10.0.0.2:6250") {
t.Fatalf("creds not injected into bare route: %q", out[0])
}
if !strings.Contains(out[1], "override:pw@10.0.0.3:6250") {
t.Fatalf("existing userinfo must be preserved: %q", out[1])
}
// empty user is a no-op.
noop, err := injectRouteCreds(in, "", "")
if err != nil || noop[0] != in[0] {
t.Fatalf("empty user must be a no-op: %v %q", err, noop[0])
}
}
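TestInjectRouteCreds describes the injection rule: credentials go only into routes that carry no userinfo, and an empty user disables injection. One way to implement that with net/url, offered as a sketch under those assumptions rather than the project's code (injectCredsSketch is a hypothetical name):

```go
package main

import (
	"fmt"
	"net/url"
)

// injectCredsSketch adds user:pass to every route URL that has no userinfo of
// its own. Routes with explicit userinfo are left as written.
func injectCredsSketch(routes []string, user, pass string) ([]string, error) {
	if user == "" {
		// Nothing to inject: return the routes unchanged.
		return routes, nil
	}
	out := make([]string, 0, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r)
		if err != nil {
			return nil, fmt.Errorf("parse route %q: %w", r, err)
		}
		if u.User == nil {
			u.User = url.UserPassword(user, pass)
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	in := []string{"nats://10.0.0.2:6250", "nats://override:pw@10.0.0.3:6250"}
	out, err := injectCredsSketch(in, "user", "secret")
	fmt.Println(out, err)
}
```

Note that url.UserPassword percent-encodes reserved characters when the URL is rendered, so a password containing characters such as `@` or `/` would appear escaped in the resulting route string.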
// TestIsLoopbackURL guards migrate-to-kv against pushing the allowlist cleartext
// to a remote NATS (issue 0006f, audit 0008 N6).
func TestIsLoopbackURL(t *testing.T) {
loop := []string{"nats://127.0.0.1:4250", "nats://localhost:4250", "nats://[::1]:4250"}
for _, u := range loop {
if !isLoopbackURL(u) {
t.Fatalf("%q should be loopback", u)
}
}
remote := []string{"nats://10.0.0.2:4250", "nats://bus.example.com:4250", "::not-a-url"}
for _, u := range remote {
if isLoopbackURL(u) {
t.Fatalf("%q should NOT be loopback", u)
}
}
}
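The loopback guard tested above decides whether an allowlist write may travel without TLS. A minimal sketch of such a check, assuming that only the literal host `localhost` and loopback IP addresses count, and that an unparsable URL is treated as remote (isLoopbackSketch is a hypothetical name):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// isLoopbackSketch reports whether raw points at the local machine. Anything
// that cannot be parsed is treated as NOT loopback, so the guard fails closed.
func isLoopbackSketch(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips the port and the brackets around IPv6 literals
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, u := range []string{"nats://127.0.0.1:4250", "nats://[::1]:4250", "nats://bus.example.com:4250"} {
		fmt.Println(u, isLoopbackSketch(u))
	}
}
```

The sketch deliberately does not resolve hostnames: a DNS name that happens to map to 127.0.0.1 is still treated as remote, which keeps the check cheap and conservative.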
+229 -12
@@ -7,6 +7,7 @@ package main
import (
"context"
"crypto/tls"
"encoding/hex"
"flag"
"log"
"net/http"
@@ -15,10 +16,15 @@ import (
"syscall"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
server "github.com/nats-io/nats-server/v2/server"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/membership"
)
@@ -33,6 +39,13 @@ func main() {
runUserCLI(os.Args[2:])
return
}
// `membershipd migrate-to-kv` is the one-time, idempotent SQLite->JetStream KV
// data move for decentralization (issue 0003c). Like the user CLI it runs on
// the host and is dispatched before the server flag set parses os.Args.
if len(os.Args) > 1 && os.Args[1] == "migrate-to-kv" {
runMigrateCLI(os.Args[2:])
return
}
var (
bind = flag.String("bind", "127.0.0.1", "network interface to bind the HTTP API and the embedded NATS to; use 0.0.0.0 to accept LAN/remote peers")
@@ -45,6 +58,43 @@ func main() {
busAuth = flag.String("bus-auth", "off", "control-plane auth rollout: off|soft|enforce (feature flag bus-auth)")
tlsCert = flag.String("tls-cert", "", "path to the NATS server certificate (deploy/tls/server.crt); enables TLS on the embedded data plane")
tlsKey = flag.String("tls-key", "", "path to the NATS server private key (deploy/tls/server.key); required with --tls-cert")
// Cluster (issue 0003a): empty --cluster-name keeps the server standalone.
clusterName = flag.String("cluster-name", "", "NATS cluster name (identical on every node); empty = standalone, no HA")
serverName = flag.String("server-name", "", "unique node name within the cluster (required by JetStream RAFT when clustered)")
clusterPort = flag.Int("cluster-port", 6250, "route listener port for server-to-server cluster traffic")
routesCSV = flag.String("routes", "", "comma-separated nats-route URLs of the OTHER nodes, e.g. nats://user:pass@10.0.0.2:6250")
clusterUser = flag.String("cluster-user", "", "shared route secret username (gates the route listener)")
clusterPass = flag.String("cluster-pass", "", "shared route secret password (argv-visible — prefer --cluster-pass-file or UNIBUS_CLUSTER_PASS)")
// Secret out of argv (issue 0006f, audit 0008 N1-low): a password in
// --cluster-pass / --routes is visible in ps/journald. Prefer a file or the
// UNIBUS_CLUSTER_PASS env var; routes may then omit userinfo and the secret
// is injected from here.
clusterPassFile = flag.String("cluster-pass-file", "", "path to a file holding the cluster route password (preferred over --cluster-pass; keeps the secret out of argv)")
routeTLSCert = flag.String("route-tls-cert", "", "this node's route certificate (CA-signed); enables mutual route TLS with --route-tls-key/--route-tls-ca")
routeTLSKey = flag.String("route-tls-key", "", "this node's route private key")
routeTLSCA = flag.String("route-tls-ca", "", "bus CA that signs every node's route certificate (deploy/tls/ca.crt)")
// Replicated control plane (issue 0006a/c): the JetStream replication factor
// for the shared nonce bucket (and, with --store kv, the control-plane KV).
// 1 for a 1-2 node rollout, 3 for real HA quorum (raise in place with
// `nats stream update --replicas 3` when the third node joins).
kvReplicas = flag.Int("kv-replicas", 1, "JetStream replication factor for the shared nonce/KV buckets (1..3)")
caFile = flag.String("ca", "", "bus CA cert; only used to pin TLS on the internal JetStream connection to an EXTERNAL --nats-url (the embedded server uses an in-process connection that needs no CA)")
// Control-plane store backend (issue 0006c, feature flag decentralized):
// "sqlite" (default) keeps the local single-node SQLite control plane;
// "kv" puts rooms/members/keys/users in replicated JetStream KV so any node
// in the cluster serves the same state.
storeBackend = flag.String("store", "sqlite", "control-plane store backend: sqlite (default, single-node) | kv (replicated JetStream, decentralized)")
// Persisted internal service identity (issue 0011 gaps, GAP A): when set, the
// privileged internal identity used to manage JetStream is LOADED from this
// file (generated and persisted on first start) instead of being a fresh
// ephemeral key each boot. Persisting it is what lets `membershipd user add
// --store kv` write the replicated allowlist of a LIVE cluster: that CLI,
// run over loopback on a node, loads the SAME identity and presents the nkey
// this node's authenticator already grants full permissions. Empty keeps the
// ephemeral-per-process behavior (single-node/dev default, unchanged). The
// file holds a private key: it is written 0600 and belongs next to the node's
// TLS keys (deploy keeps it under secrets/, gitignored).
internalIDFile = flag.String("internal-id-file", "", "path to a persisted internal service identity (JSON); enables `membershipd user add --store kv` against the live cluster. Empty = ephemeral per-process identity (dev default)")
)
flag.Parse()
@@ -52,6 +102,17 @@ func main() {
if err != nil {
log.Fatalf("%v", err)
}
if *storeBackend != "sqlite" && *storeBackend != "kv" {
log.Fatalf("--store must be \"sqlite\" or \"kv\", got %q", *storeBackend)
}
// Resolve the cluster route secret out of argv (file/env preferred). The
// resolved value (not *clusterPass) is what guards the route layer and is
// injected into peer route URLs below.
clusterPassResolved, passSource, err := resolveClusterPass(*clusterPass, *clusterPassFile, os.Getenv("UNIBUS_CLUSTER_PASS"))
if err != nil {
log.Fatalf("%v", err)
}
// Fail-open guard (audit H2): a non-loopback bind, or any TLS flag, demands
// --bus-auth enforce. This makes an insecure public startup impossible rather
@@ -59,18 +120,76 @@ func main() {
if err := validateBootConfig(*bind, authMode, *tlsCert, *tlsKey); err != nil {
log.Fatalf("%v", err)
}
// Cluster route guard (issue 0003a): a public cluster needs a route secret
// and mutual route TLS, and the route-TLS flags are all-or-nothing.
if err := validateClusterConfig(*clusterName, *bind, *clusterUser, clusterPassResolved, *routeTLSCert, *routeTLSKey, *routeTLSCA, authMode); err != nil {
log.Fatalf("%v", err)
}
log.SetFlags(log.LstdFlags | log.Lmsgprefix)
log.SetPrefix("[membershipd] ")
-// Control plane store first: the NATS authenticator consults IsAuthorized, so
-// the store must exist before the embedded server starts.
-store, err := membership.Open(*dbPath)
-if err != nil {
-log.Fatalf("open membership store: %v", err)
-}
-defer store.Close()
-log.Printf("membership store: %s", *dbPath)
// A clustered node shares its control plane with peers, so it needs a JetStream
// client to manage the replicated nonce bucket (issue 0006a). --store kv (issue
// 0006c) also needs JetStream, for the control-plane KV itself. A standalone
// single-node SQLite deployment needs none of this and keeps the in-process,
// in-memory behavior unchanged.
clustered := *clusterName != ""
decentralized := *storeBackend == "kv"
needJS := clustered || decentralized
enforce := authMode == membership.AuthEnforce
// Internal service identity (issue 0006a): when the embedded data plane enforces
// auth, membershipd must still connect to its OWN server to manage JetStream.
// It does so with this ephemeral identity, which the authenticator is built to
// recognize and grant full permissions (it never enters the user allowlist). It
// is only generated when actually needed (JetStream required AND enforce on AND
// the server is embedded), so a standalone or non-enforce node is unchanged.
var internalID cs.Identity
var internalPubHex string
if needJS && enforce && *natsURL == "" {
if *internalIDFile != "" {
// Persisted identity: load it, generating + writing it (0600) on first
// start. A stable internal key is what `user add --store kv` presents to
// add users to a live cluster (GAP A); rotate it by deleting the file and
// restarting.
internalID, err = client.LoadOrCreateIdentity(*internalIDFile)
if err != nil {
log.Fatalf("load internal service identity %q: %v", *internalIDFile, err)
}
log.Printf("internal service identity: persisted (%s)", *internalIDFile)
} else {
internalID, err = cs.GenerateIdentity()
if err != nil {
log.Fatalf("generate internal identity: %v", err)
}
}
internalPubHex = hex.EncodeToString(internalID.SignPub)
}
// The authenticator consults the store through a holder so it can be built
// before the store exists: with --store kv the JetStream KV store opens only
// after NATS is up (the bootstrap cycle). In the default SQLite path the store
// is opened and set into the holder right here, before the server starts, so
// behavior is identical to the pre-0006c baseline. `store` is the final store
// used by the HTTP server (set below for the KV path).
holder := &storeHolder{}
var store membership.Store
if !decentralized {
store, err = membership.Open(*dbPath)
if err != nil {
log.Fatalf("open membership store: %v", err)
}
holder.set(store)
log.Printf("membership store: sqlite %s", *dbPath)
}
// Close whichever store ends up final (SQLite closes its file; the JetStream KV
// store's Close is a no-op — its NATS connection is closed separately).
defer func() {
if store != nil {
store.Close()
}
}()
blobs, err := blobstore.New(*storeDir)
if err != nil {
@@ -89,13 +208,53 @@ func main() {
// Bind the embedded NATS to the same interface as the HTTP API so a
// single --bind flag governs reachability: 127.0.0.1 keeps the whole
// stack loopback-only; 0.0.0.0 exposes both planes to the LAN.
StoreDir: *natsStore,
Host: *bind,
Port: *natsPort,
ServerName: *serverName,
}
// Cluster (issue 0003a): with a cluster name, join the route layer for HA.
if *clusterName != "" {
// Inject the resolved secret into peer route URLs that omit userinfo, so
// the password need not appear in --routes argv (issue 0006f).
routes, rerr := injectRouteCreds(splitRoutes(*routesCSV), *clusterUser, clusterPassResolved)
if rerr != nil {
log.Fatalf("%v", rerr)
}
cc := &embeddednats.ClusterConfig{
Name: *clusterName,
Host: *bind,
Port: *clusterPort,
Routes: routes,
Username: *clusterUser,
Password: clusterPassResolved,
}
log.Printf("cluster route secret source: %s", passSource)
if *routeTLSCert != "" {
rtls, err := busauth.RouteTLSConfig(*routeTLSCert, *routeTLSKey, *routeTLSCA)
if err != nil {
log.Fatalf("load route TLS: %v", err)
}
cc.TLS = rtls
log.Printf("cluster route TLS: ON (mutual, CA %s)", *routeTLSCA)
}
cfg.Cluster = cc
log.Printf("cluster: %q node %q, route port %d, %d peer route(s)", *clusterName, *serverName, *clusterPort, len(cc.Routes))
}
if authMode == membership.AuthEnforce {
- cfg.Auth = busauth.NewNkeyAuthenticator(store.IsAuthorized)
- log.Printf("NATS nkey authentication: ON (enforce)")
// Per-subject data-plane ACL (audit H4 / N4 residual): the authenticator
// authorizes by the bus allowlist AND confines each connection to the
// subjects of the rooms it belongs to (plus client-infra subjects). This
// closes the wildcard metadata leak where a registered non-member could
// Subscribe(">") and harvest every room's subject and JetStream activity.
// NATS freezes permissions at connect time, so a peer that joins a room
// after connecting must client.RefreshSession to gain that room's subject.
cfg.Auth = busauth.NewNkeyAuthenticatorACLInternal(
holder.IsAuthorized,
busauth.PermissionsFromSubjects(holder.subjectACL),
internalPubHex,
)
log.Printf("NATS nkey authentication: ON (enforce, per-subject ACL)")
}
if *tlsCert != "" || *tlsKey != "" {
if *tlsCert == "" || *tlsKey == "" {
@@ -118,6 +277,38 @@ func main() {
log.Printf("using external NATS: %s", natsClientURL)
}
// JetStream client + decentralized store (issue 0006a/c). needJS is set for a
// clustered node (shared nonce bucket) and for --store kv (the KV control
// plane). Open the privileged JetStream client first (in-process for the
// embedded server, a plain client for external NATS), then — for --store kv —
// open the replicated KV store and publish it into the holder so the
// authenticator and HTTP server serve from it. The privileged connection is the
// only client that can connect in this window (the holder still denies everyone
// else; the internal identity bypasses the store).
var js jetstream.JetStream
if needJS {
var internalNC *nats.Conn
if *natsURL == "" {
internalNC, js, err = connectInternalJS(ns, internalID, enforce)
} else {
internalNC, js, err = connectExternalJS(natsClientURL, *caFile)
}
if err != nil {
log.Fatalf("internal JetStream connection (required by --cluster-name/--store kv): %v", err)
}
defer internalNC.Close()
if decentralized {
kvStore, err := membership.OpenJetStream(js, membership.JetStreamConfig{Replicas: *kvReplicas})
if err != nil {
log.Fatalf("open decentralized control-plane KV store: %v", err)
}
store = kvStore
holder.set(store)
log.Printf("membership store: jetstream KV (replicas=%d)", *kvReplicas)
}
}
srv := membership.NewServer(store, blobs, authMode)
// On a public (non-loopback) bind, disable cleartext rooms: the embedded NATS
// has no per-subject ACL, so cleartext content would be readable by any
@@ -127,6 +318,32 @@ func main() {
srv.RequireEncryptedRooms = true
log.Printf("cleartext rooms: DISABLED (public bind requires end-to-end encryption)")
}
// Publish this node's posture on /healthz so a monitor (or a peer) can detect a
// cluster member not running the homogeneous enforce+ACL+TLS posture (audit
// 0008 N1). enforce implies the per-subject ACL in this binary (they are wired
// together above).
srv.Posture = membership.Posture{
Enforce: enforce,
ACL: enforce,
TLS: *tlsCert != "",
Cluster: clustered,
Store: *storeBackend,
}
// Replicated anti-replay (issue 0006a, audit 0008 N3): a clustered node MUST
// share its nonce store across the cluster, or a request accepted on one node
// can be replayed to another. HARD requirement: if the bucket cannot be created
// the node refuses to start rather than run with a per-process cache that leaves
// the replay hole open.
if needJS {
if err := wireReplicatedNonces(srv, js, clustered, *kvReplicas); err != nil {
log.Fatalf("%v", err)
}
if clustered {
log.Printf("anti-replay: replicated nonce bucket \"KV_UNIBUS_nonces\" (replicas=%d) — cluster-safe", *kvReplicas)
}
}
log.Printf("control-plane auth: %s", authMode)
addr := *bind + ":" + *httpPort
httpSrv := &http.Server{
+95
@@ -0,0 +1,95 @@
package main
import (
"flag"
"fmt"
"os"
"time"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
// runMigrateCLI implements `membershipd migrate-to-kv`, the idempotent move of
// the control-plane state from the local SQLite database into replicated
// JetStream KV (issue 0003c). It backs up the SQLite file first (VACUUM INTO),
// then connects to the target NATS and copies every room/member/key/user into
// the KV buckets. Re-running it converges to the same state.
//
// It runs on the bus host (no auth on the control-plane side), connecting to the
// cluster's NATS; --ca pins TLS when the data plane is secured.
func runMigrateCLI(args []string) {
fs := flag.NewFlagSet("migrate-to-kv", flag.ExitOnError)
dbPath := fs.String("db", defaultDBPath, "SQLite database path to migrate FROM")
natsURL := fs.String("nats-url", "", "NATS url of the cluster to migrate INTO (required)")
ca := fs.String("ca", "", "CA cert to pin TLS on the NATS connection (optional)")
replicas := fs.Int("replicas", 1, "KV replication factor (1 for a 1-2 node rollout, 3 for HA quorum)")
noBackup := fs.Bool("no-backup", false, "skip the SQLite backup before migrating (NOT recommended)")
_ = fs.Parse(args)
if *natsURL == "" {
fmt.Fprintln(os.Stderr, "membershipd migrate-to-kv: --nats-url is required (the cluster to write the KV buckets into)")
os.Exit(2)
}
// Confidentiality guard (issue 0006f, audit 0008 N6): the migration writes the
// allowlist (handles, roles, signing pubkeys) into the KV. Against a REMOTE NATS
// without TLS that metadata would travel in cleartext, so a remote target MUST
// be TLS-pinned with --ca. A loopback target is local-only and exempt.
if !isLoopbackURL(*natsURL) && *ca == "" {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: refusing to migrate to remote %q without --ca; the allowlist (handles/roles/sign pubs) would travel in cleartext — pin TLS with --ca, or run against a loopback nats-url\n", *natsURL)
os.Exit(2)
}
// Back up the SQLite database first so a botched migration can be undone.
var backupPath string
if !*noBackup {
bak, err := membership.BackupSQLite(*dbPath)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: backup failed: %v\n", err)
os.Exit(1)
}
backupPath = bak
fmt.Printf("backed up %s -> %s\n", *dbPath, backupPath)
}
// Connect to the target NATS (optionally TLS-pinned to the bus CA).
natsOpts := []nats.Option{nats.Name("unibus-migrate")}
if *ca != "" {
tlsCfg, err := busauth.LoadCATLSConfig(*ca)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: load CA: %v\n", err)
os.Exit(1)
}
natsOpts = append(natsOpts, nats.Secure(tlsCfg))
}
nc, err := nats.Connect(*natsURL, natsOpts...)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: connect %q: %v\n", *natsURL, err)
os.Exit(1)
}
defer nc.Close()
js, err := jetstream.New(nc)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: jetstream: %v\n", err)
os.Exit(1)
}
report, err := membership.MigrateSQLiteToKV(*dbPath, js, membership.JetStreamConfig{
Replicas: *replicas,
OpTimeout: 30 * time.Second,
})
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd migrate-to-kv: %v\n", err)
os.Exit(1)
}
report.BackupPath = backupPath
fmt.Printf("migrated to KV (replicas=%d): %d rooms, %d members, %d keys, %d users\n",
*replicas, report.Rooms, report.Members, report.Keys, report.Users)
if backupPath != "" {
fmt.Printf("rollback: restore %s if needed\n", backupPath)
}
}
+60
@@ -0,0 +1,60 @@
package main
import (
"fmt"
"sync"
"github.com/enmanuel/unibus/pkg/membership"
)
// storeHolder is a concurrency-safe slot for the control-plane store, used to
// break the decentralized bootstrap cycle (issue 0006c): the NATS authenticator
// must be built BEFORE the embedded server starts, but the JetStream KV store can
// only be opened AFTER NATS is up (it needs a JetStream client). The authenticator
// therefore consults the holder instead of a concrete store.
//
// Fail-closed by construction: until the store is set, IsAuthorized denies and
// SubjectACL errors, so any client connecting in the startup window is rejected.
// The only connection expected in that window is membershipd's own internal
// service identity, which the authenticator recognizes by key and lets through
// without consulting the store at all. In the SQLite (default) path the store is
// set before StartServer, so the window does not exist and behavior is identical
// to the pre-0006c baseline.
type storeHolder struct {
mu sync.RWMutex
s membership.Store
}
func (h *storeHolder) set(s membership.Store) {
h.mu.Lock()
h.s = s
h.mu.Unlock()
}
func (h *storeHolder) get() membership.Store {
h.mu.RLock()
defer h.mu.RUnlock()
return h.s
}
// IsAuthorized reports whether signPubHex is an active bus user, denying while the
// store is not yet set (fail closed). It is the predicate the nkey authenticator
// uses for every connecting client.
func (h *storeHolder) IsAuthorized(signPubHex string) bool {
s := h.get()
if s == nil {
return false
}
return s.IsAuthorized(signPubHex)
}
// subjectACL derives the per-subject permissions for signPubHex via the live
// store, erroring (so the caller fails closed and denies the connection) while the
// store is not yet set.
func (h *storeHolder) subjectACL(signPubHex string) ([]string, error) {
s := h.get()
if s == nil {
return nil, fmt.Errorf("control-plane store not ready")
}
return membership.SubjectACLFor(s)(signPubHex)
}
+51
@@ -0,0 +1,51 @@
package main
import (
"encoding/hex"
"path/filepath"
"testing"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/membership"
)
// TestStoreHolderFailClosed: an empty holder denies everything (the bootstrap
// window before the store is set), and starts serving once a store is published.
func TestStoreHolderFailClosed(t *testing.T) {
h := &storeHolder{}
// Empty: deny + error (fail closed).
if h.IsAuthorized("anything") {
t.Fatalf("empty holder must deny IsAuthorized")
}
if _, err := h.subjectACL("anything"); err == nil {
t.Fatalf("empty holder must error from subjectACL (fail closed)")
}
// After set: serves from the real store.
store, err := membership.Open(filepath.Join(t.TempDir(), "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
id, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("identity: %v", err)
}
pub := hex.EncodeToString(id.SignPub)
if err := store.AddUser(pub, "alice", membership.RoleMember); err != nil {
t.Fatalf("add user: %v", err)
}
h.set(store)
if !h.IsAuthorized(pub) {
t.Fatalf("after set, an active user must be authorized")
}
if _, err := h.subjectACL(pub); err != nil {
t.Fatalf("after set, subjectACL must succeed: %v", err)
}
if h.IsAuthorized("deadbeef") {
t.Fatalf("a non-user must not be authorized")
}
}
+84 -17
@@ -1,7 +1,7 @@
package main
import (
- "encoding/hex"
"errors"
"flag"
"fmt"
"os"
@@ -50,13 +50,26 @@ commands:
list List all registered users
revoke Revoke a user (denies access on both planes immediately)
store backends (--store):
sqlite local SQLite database (default; seeds the first admin offline)
kv the RUNNING cluster's replicated JetStream KV allowlist, via the
privileged internal connection — add users with the cluster live,
no stop-seed-restart needed (run over loopback/SSH on a node)
examples:
membershipd user add --handle alice --sign-pub <64-hex> --role admin
- membershipd user list
membershipd user add --store kv --handle bob --sign-pub <64-hex> --role member
membershipd user list --store kv
membershipd user revoke <64-hex>
common flags:
- --db <path> SQLite database path (default ./local_files/unibus.db)
--db <path> SQLite database path (--store sqlite; default ./local_files/unibus.db)
--store kv flags (defaults assume an on-node invocation):
--nats-url <url> cluster NATS (default nats://127.0.0.1:4250)
--internal-id-file <path> persisted internal service identity (default /opt/unibus/secrets/internal.id)
--ca <path> CA cert pinning the data-plane TLS (default /opt/unibus/tls/ca.crt)
--kv-replicas <n> KV replication factor, match the cluster (default 3)
`)
}
@@ -65,7 +78,7 @@ const defaultDBPath = "./local_files/unibus.db"
// openStore opens the membership store at path, exiting on failure. Migrations
// (including 002_users.sql) are applied by membership.Open, so a fresh database
// gets the users table on first use of the CLI.
- func openStore(path string) *membership.Store {
func openStore(path string) membership.Store {
store, err := membership.Open(path)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd user: open store %q: %v\n", path, err)
@@ -76,16 +89,56 @@ func openStore(path string) *membership.Store {
// validateSignPubHex ensures the key is exactly a 32-byte Ed25519 public key in
// hex (64 hex chars). Catching this here turns a silent "authorized nobody" into
- // an explicit error at seed time.
// an explicit error at seed time. It delegates to membership.ValidateSignPubHex
// so the CLI and the HTTP user-management handlers share one rule.
func validateSignPubHex(signPub string) error {
- b, err := hex.DecodeString(signPub)
- if err != nil {
- return fmt.Errorf("sign-pub is not valid hex: %w", err)
return membership.ValidateSignPubHex(signPub)
}
// kvFlags holds the connection flags shared by the --store kv path of the user
// subcommands. registerKVFlags wires them onto a flag set so add and list expose
// an identical interface.
type kvFlags struct {
store *string
natsURL *string
internalID *string
ca *string
replicas *int
}
func registerKVFlags(fs *flag.FlagSet) kvFlags {
return kvFlags{
store: fs.String("store", "sqlite", "user store backend: sqlite (local DB) | kv (the live cluster's replicated allowlist)"),
natsURL: fs.String("nats-url", defaultClusterNatsURL, "cluster NATS url for --store kv"),
internalID: fs.String("internal-id-file", defaultInternalIDFile, "persisted internal service identity for --store kv"),
ca: fs.String("ca", defaultClusterCAFile, "CA cert pinning TLS on the --store kv NATS connection"),
replicas: fs.Int("kv-replicas", 3, "KV replication factor for --store kv (match the cluster)"),
}
- if len(b) != 32 {
- return fmt.Errorf("sign-pub must be a 32-byte Ed25519 public key (64 hex chars), got %d bytes", len(b))
}
// resolveStore returns the membership store for the chosen backend plus a cleanup
// func. For --store kv it opens the privileged connection to the live cluster; for
// sqlite it opens the local file. It exits the process with a clear message on any
// failure (a dead NATS, a missing identity file), so a broken --store kv add fails
// loudly instead of silently — Error case of the GAP A DoD. The returned *kvConn
// is non-nil only for the kv backend (so the caller can report replication).
func resolveStore(cmd string, kf kvFlags, dbPath string) (membership.Store, *kvConn, func()) {
switch *kf.store {
case "sqlite":
store := openStore(dbPath)
return store, nil, func() { store.Close() }
case "kv":
kv, err := connectKVStore(*kf.natsURL, *kf.internalID, *kf.ca, *kf.replicas)
if err != nil {
fmt.Fprintf(os.Stderr, "membershipd %s: --store kv: %v\n", cmd, err)
os.Exit(1)
}
return kv.store, kv, kv.Close
default:
fmt.Fprintf(os.Stderr, "membershipd %s: --store must be \"sqlite\" or \"kv\", got %q\n", cmd, *kf.store)
os.Exit(2)
return nil, nil, func() {}
}
- return nil
}
func userAdd(args []string) {
@@ -94,6 +147,7 @@ func userAdd(args []string) {
signPub := fs.String("sign-pub", "", "Ed25519 signing public key in hex (required)")
role := fs.String("role", membership.RoleMember, "role: admin or member")
dbPath := fs.String("db", defaultDBPath, "SQLite database path")
kf := registerKVFlags(fs)
_ = fs.Parse(args)
if *handle == "" || *signPub == "" {
@@ -105,23 +159,35 @@ func userAdd(args []string) {
os.Exit(2)
}
- store := openStore(*dbPath)
- defer store.Close()
store, kv, closeStore := resolveStore("user add", kf, *dbPath)
defer closeStore()
if err := store.AddUser(*signPub, *handle, *role); err != nil {
if errors.Is(err, membership.ErrUserExists) {
// Idempotency contract (GAP A): re-adding the same key is an EXPLICIT,
// non-destructive error — the existing row is left untouched (no silent
// upsert that could flip a role or clobber status, which would corrupt the
// allowlist). To replace a user, `user revoke <key>` then add again.
fmt.Fprintf(os.Stderr, "membershipd user add: user %s already registered (unchanged); revoke it first to replace\n", *signPub)
os.Exit(1)
}
fmt.Fprintf(os.Stderr, "membershipd user add: %v\n", err)
os.Exit(1)
}
fmt.Printf("added user %q (%s) role=%s\n", *handle, *signPub, *role)
if kv != nil {
reportKVReplication(kv.js)
}
}
func userList(args []string) {
fs := flag.NewFlagSet("user list", flag.ExitOnError)
dbPath := fs.String("db", defaultDBPath, "SQLite database path")
kf := registerKVFlags(fs)
_ = fs.Parse(args)
- store := openStore(*dbPath)
- defer store.Close()
store, _, closeStore := resolveStore("user list", kf, *dbPath)
defer closeStore()
users, err := store.ListUsers()
if err != nil {
@@ -143,6 +209,7 @@ func userList(args []string) {
func userRevoke(args []string) {
fs := flag.NewFlagSet("user revoke", flag.ExitOnError)
dbPath := fs.String("db", defaultDBPath, "SQLite database path")
kf := registerKVFlags(fs)
// Go's flag package stops at the first non-flag argument, so `revoke <key>
// --db path` would otherwise leave --db unparsed. Pull a leading positional
@@ -167,8 +234,8 @@ func userRevoke(args []string) {
os.Exit(2)
}
- store := openStore(*dbPath)
- defer store.Close()
store, _, closeStore := resolveStore("user revoke", kf, *dbPath)
defer closeStore()
if err := store.RevokeUser(signPub); err != nil {
fmt.Fprintf(os.Stderr, "membershipd user revoke: %v\n", err)
+151
@@ -0,0 +1,151 @@
package main
import (
"context"
"fmt"
"os"
"time"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
// users_kv.go is the `--store kv` half of the user administration CLI (issue 0011
// gaps, GAP A): adding and listing bus users directly against the RUNNING
// cluster's replicated JetStream KV allowlist, with no need to stop the cluster,
// seed a standalone node, and restart (the procedure the 0011 deploy required).
//
// The mechanism is the cluster's own privileged internal connection. Under
// enforce every bus user is confined by the per-subject ACL to the JetStream API
// of its own rooms, so no ordinary identity may touch the control-plane buckets
// (KV_UNIBUS_*). The ONLY identity the authenticator grants full JetStream
// permissions is membershipd's internal service identity. By persisting that
// identity to a file (membershipd --internal-id-file) the same key becomes
// available to this CLI, which presents it as its NATS nkey and is therefore
// recognized as the privileged internal client and allowed to read/write the KV.
//
// Intended invocation is over loopback on a cluster node (SSH): the data-plane
// TLS certificate's SAN covers 127.0.0.1/localhost and the internal identity file
// lives 0600 next to the node's TLS keys. Using the file requires root on the
// node, which already implies full control of that node — so co-locating it adds
// no practical exposure beyond what the TLS server key and cluster password
// already represent.
// defaultClusterNatsURL is the node-local NATS listener. The CLI is meant to run
// on a cluster node over SSH, talking to that node's own embedded server.
const defaultClusterNatsURL = "nats://127.0.0.1:4250"
// Deploy-default paths for the privileged identity and the data-plane CA, so an
// on-node invocation needs only --handle/--sign-pub/--role. Override for other
// layouts.
const (
defaultInternalIDFile = "/opt/unibus/secrets/internal.id"
defaultClusterCAFile = "/opt/unibus/tls/ca.crt"
)
// kvConn bundles the privileged NATS connection to a live cluster and the
// KV-backed control-plane store opened over it. Close releases both.
type kvConn struct {
nc *nats.Conn
js jetstream.JetStream
store membership.Store
}
func (k *kvConn) Close() {
if k == nil {
return
}
if k.store != nil {
_ = k.store.Close()
}
if k.nc != nil {
k.nc.Close()
}
}
// connectKVStore opens the privileged internal connection to the cluster's NATS
// and the JetStream KV control-plane store on top of it. internalIDFile is the
// membershipd-persisted internal service identity whose nkey the authenticator
// grants full permissions; caPath pins the data-plane TLS (empty only for a
// non-TLS dev cluster). A non-loopback target without --ca is refused, mirroring
// migrate-to-kv (audit 0008 N6): the allowlist write must not travel in cleartext.
func connectKVStore(natsURL, internalIDFile, caPath string, replicas int) (*kvConn, error) {
if internalIDFile == "" {
return nil, fmt.Errorf("--internal-id-file is required for --store kv (the privileged identity membershipd persists with --internal-id-file)")
}
// Confidentiality guard: a remote NATS without TLS would expose the allowlist
// (handles/roles/sign-pubs) and the privileged nkey handshake in cleartext.
if !isLoopbackURL(natsURL) && caPath == "" {
return nil, fmt.Errorf("refusing to connect to remote %q without --ca: the allowlist write would travel in cleartext — pin TLS with --ca, or run over a loopback --nats-url on a node", natsURL)
}
id, err := client.LoadIdentity(internalIDFile)
if err != nil {
return nil, fmt.Errorf("load internal identity: %w", err)
}
nkeyPub, nkeySign, err := busauth.ClientNkey(id.SignPriv)
if err != nil {
return nil, fmt.Errorf("derive nkey from internal identity: %w", err)
}
opts := []nats.Option{
nats.Name("membershipd-user-cli"),
nats.Nkey(nkeyPub, nkeySign),
}
if caPath != "" {
tlsCfg, err := busauth.LoadCATLSConfig(caPath)
if err != nil {
return nil, fmt.Errorf("load CA %q: %w", caPath, err)
}
opts = append(opts, nats.Secure(tlsCfg))
}
nc, err := nats.Connect(natsURL, opts...)
if err != nil {
return nil, fmt.Errorf("connect cluster NATS %q: %w", natsURL, err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
return nil, fmt.Errorf("jetstream: %w", err)
}
store, err := membership.OpenJetStream(js, membership.JetStreamConfig{Replicas: replicas})
if err != nil {
nc.Close()
return nil, fmt.Errorf("open KV control-plane store: %w", err)
}
return &kvConn{nc: nc, js: js, store: store}, nil
}
// reportKVReplication prints the replication status of the allowlist bucket
// stream (KV_UNIBUS_users) right after a write, so the operator sees the add
// landed on a quorum and replicated to the followers — executable evidence that
// the live-cluster add is HA, not single-node. Best-effort: a read failure is a
// note, not an error (the write itself already succeeded).
func reportKVReplication(js jetstream.JetStream) {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
st, err := js.Stream(ctx, "KV_UNIBUS_users")
if err != nil {
fmt.Fprintf(os.Stderr, "note: could not read KV_UNIBUS_users stream info: %v\n", err)
return
}
info, err := st.Info(ctx)
if err != nil {
fmt.Fprintf(os.Stderr, "note: could not read KV_UNIBUS_users stream info: %v\n", err)
return
}
if info.Cluster == nil {
fmt.Printf("KV_UNIBUS_users: standalone (R1, no cluster replication); msgs=%d\n", info.State.Msgs)
return
}
current := 0
for _, r := range info.Cluster.Replicas {
if r.Current {
current++
}
}
fmt.Printf("KV_UNIBUS_users: leader=%s followers_current=%d/%d msgs=%d\n",
info.Cluster.Leader, current, len(info.Cluster.Replicas), info.State.Msgs)
}
+40
@@ -0,0 +1,40 @@
package main
import (
"fmt"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go/jetstream"
)
// wireReplicatedNonces applies the cluster anti-replay policy to srv. It is the
// single piece of wiring the binary uses to decide whether a node must share its
// nonce store, extracted so a regression test exercises the EXACT decision the
// running binary makes (issue 0006a, audit 0008 N3).
//
// Policy:
// - A clustered node (clustered == true) MUST use the shared JetStream KV nonce
// bucket. Every node sees the same bucket, so a request accepted on one node
// cannot be replayed to another whose per-process cache never saw the nonce.
// A missing JetStream context, or a failure to create the bucket, is a FATAL
// configuration error returned to the caller — a clustered node running with a
// per-process nonce cache is precisely the replay hole the audit flagged, so
// it must refuse to start rather than serve insecurely.
// - A standalone node (clustered == false) keeps the in-memory cache that
// NewServer installed: there is no second node to replay to, so the shared
// bucket would only add a JetStream dependency for no security gain.
//
// replicas is the nonce bucket's replication factor (R1..R3). Returns nil when no
// action is required (standalone).
func wireReplicatedNonces(srv *membership.Server, js jetstream.JetStream, clustered bool, replicas int) error {
if !clustered {
return nil // standalone: the in-memory nonce cache is sufficient and safe
}
if js == nil {
return fmt.Errorf("clustered node requires JetStream for the shared nonce bucket, but none is available")
}
if err := srv.UseReplicatedNonces(js, replicas); err != nil {
return fmt.Errorf("replicated nonces: %w", err)
}
return nil
}
+7
@@ -47,6 +47,13 @@ func main() {
if err != nil {
log.Fatalf("create room: %v", err)
}
// Membership-change contract (issue 0006e): the bus freezes per-subject
// permissions at connect time, and this room did not exist then. Refresh the
// session so the new room's subject becomes publishable under enforce+ACL. On
// an unsecured/dev bus this is a harmless reconnect.
if err := c.RefreshSession(); err != nil {
log.Fatalf("refresh session after create room: %v", err)
}
log.Printf("room %q -> %s (subject %s, cleartext)", *roomSub, roomID, *roomSub)
stop := make(chan os.Signal, 1)
+31
@@ -65,3 +65,34 @@ curl -fsS http://<host-lan-ip>:8470/healthz
- To run against an external NATS instead of the embedded one, append
`--nats-url nats://<host>:4222` to `ExecStart` and re-run `daemon-reload` +
`restart`.
## Clustering (HA) — see `deploy/cluster/`
The single-node service above is secure on its own. Running unibus as a
multi-node **cluster** has extra hardening rules (issues 0006a0006f); the full
runbook and the generated material live in `deploy/cluster/`. Key points an
operator must know:
- **Homogeneous posture (0006d).** Every node MUST run `--bus-auth enforce` (the
binary refuses to join a cluster otherwise) and present mutual route TLS on a
public bind. `/healthz` publishes each node's `posture` so a monitor can flag a
node that is not `enforce`+`acl`+`tls`.
- **Separate route CA (0006f).** The cluster route layer authenticates *nodes*,
not bus users — sign the route certs with a **dedicated cluster CA**
(`--route-tls-ca`), NOT the client data-plane CA (`--tls-cert`'s CA). Keeping
the two trust roots separate means a client cert can never be presented to the
route port. `deploy/cluster/generate-cluster-certs.sh` builds this CA.
- **Secret out of argv (0006f).** Pass the route password via
`--cluster-pass-file` or the `UNIBUS_CLUSTER_PASS` env var, NOT `--cluster-pass`
or a `nats://user:pass@host` in `--routes` (both are visible in `ps`/journald).
When the secret comes from a file/env, list peers as bare `--routes
nats://<host>:6250` and the binary injects the credentials.
- **`migrate-to-kv` confidentiality (0006f).** The migration writes the allowlist
(handles/roles/sign pubs) into KV. Run it only against a **loopback** nats-url,
or pin TLS with `--ca` for a remote target — otherwise that metadata travels in
cleartext. The binary refuses a remote target without `--ca`.
- **R1 is NOT HA (0006a/N3-DoS).** With `--kv-replicas 1` the control plane
(including the nonce bucket) is a single point of failure: if the node owning
the stream dies, every authenticated request fails closed (auth DoS). Real HA
needs **R3** (quorum 2/3): raise replicas in place with `nats stream update
--replicas 3` once the third node has joined. Do not advertise R1 as HA.
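
The "Secret out of argv" and "`migrate-to-kv` confidentiality" rules above can be illustrated with a minimal Go sketch. It is not the binary's actual code: `injectCreds` and `isLoopback` are hypothetical stand-ins for whatever helpers the binary uses, and the address `10.0.0.2` and the credentials are made-up examples. The sketch assumes that a route URL already carrying userinfo is left untouched, and that "loopback" means `localhost` or an IP in the loopback range.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// injectCreds (hypothetical) adds user/pass to a route URL that has no
// userinfo, so the secret can come from a file or env var instead of argv.
// A URL that already carries userinfo is returned unchanged.
func injectCreds(route, user, pass string) (string, error) {
	u, err := url.Parse(route)
	if err != nil {
		return "", fmt.Errorf("parse route %q: %w", route, err)
	}
	if u.User == nil && user != "" {
		u.User = url.UserPassword(user, pass)
	}
	return u.String(), nil
}

// isLoopback (hypothetical) reports whether a NATS URL targets the local
// host. An unparseable URL is treated as remote, so the guard fails closed.
func isLoopback(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname()
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	route, err := injectCreds("nats://10.0.0.2:6250", "cluster", "s3cret")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("route with injected credentials:", route)

	// A remote target without a CA would be refused; a loopback one is exempt.
	for _, target := range []string{"nats://127.0.0.1:4250", "nats://10.0.0.2:4250"} {
		fmt.Printf("%s loopback=%v\n", target, isLoopback(target))
	}
}
```

Treating a parse failure as "remote" keeps the confidentiality guard on the safe side: a malformed URL can never skip the `--ca` requirement.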
+7
@@ -0,0 +1,7 @@
# Generated TLS material and secrets — NEVER commit (audit 0008: keys/secret).
out/
build/
secrets/
*.key
*.srl
cluster-ca.crt
+285
@@ -0,0 +1,285 @@
# unibus cluster — 3-node deploy runbook (issue 0006g)
This directory holds the material to bring up unibus as a **3-node cluster**
(`magnus` + `homer` + `datardos`) for real HA: with **R3** replication the control
plane (rooms/members/keys/users on JetStream KV + the anti-replay nonce bucket)
survives the loss of any one node (quorum 2/3).
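
The "quorum 2/3" figure follows from majority arithmetic. Here is a small Go sketch, assuming the usual majority rule for a replicated group (floor(n/2) + 1); the function names `quorum` and `tolerated` are illustrative and not part of unibus.

```go
package main

import "fmt"

// quorum returns the majority size for a replica count: floor(n/2) + 1.
func quorum(replicas int) int { return replicas/2 + 1 }

// tolerated returns how many replicas can be lost while a majority remains.
func tolerated(replicas int) int { return replicas - quorum(replicas) }

func main() {
	for _, r := range []int{1, 3} {
		fmt.Printf("R%d: quorum=%d tolerated_failures=%d\n", r, quorum(r), tolerated(r))
	}
}
```

With R1 no node can be lost, while R3 keeps a majority of 2 after one failure, which is why only R3 counts as HA here.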
> **Status: this cluster is DEPLOYED in production** (magnus + homer + datardos,
> R3, enforce+ACL+TLS) — see report 0011. The runbook below was authored before any
> VPS existed and has since been **corrected against the real deploy** (report 0012):
> the start ordering, the R1→R3 reality, and the live user-add path were all wrong
> or missing. Steps that change a remote host are marked **HUMAN**; `deploy-cluster.sh`
> still defaults to a dry run.
## Files
| File | What it is |
|---|---|
| `nodes.env` | Topology: cluster name, ports, and the per-node rows (name, ssh host, public IP, WG IP). **HUMAN fills the placeholders.** |
| `generate-cluster-certs.sh` | Mints a **separate cluster route CA** + a route cert per node, and a data-plane server cert per node signed by the **client CA** (`../tls/ca.*`). |
| `membershipd-cluster.service` | One systemd unit, parameterized per node by `/opt/unibus/cluster.env`. enforce + per-subject ACL + TLS + `--store kv`, `Restart=always`. |
| `deploy-cluster.sh` | Cross-builds the linux binary, generates each node's `cluster.env`, and (with `--yes`) rsyncs everything + installs the unit. It starts nothing; bringing the nodes up is manual (see "Bring up"). |
Generated keys/secrets (`out/`, `build/`, `secrets/`) are **gitignored** — they are
secret and never leave the operator's trusted machine except over the secure
rsync channel.
## Topology (as deployed, report 0011)
| Node | SSH | Public IP | Role |
|---|---|---|---|
| magnus | `magnus` (root) | `135.125.201.30` | node — **= organic-machine.com = `om`**, the critical host (caddy + gitea + registry-api + monitoring); the bus runs alongside, untouched |
| homer | `homer` (ubuntu+sudo) | `141.94.69.66` | node |
| datardos | `dd` (ubuntu+sudo) | `51.91.100.142` | node |
`ROUTE_NETWORK=public`, **not `wg`**: there is no WireGuard mesh between the three
nodes (homer and datardos do not even have the `wg` binary; om's only WG peers are
the operator's PCs). The server-to-server routes therefore travel over the public
IPs, protected by the **separate cluster route CA** (mutual route TLS) — a client
data-plane cert can never be presented to the route port. The client data plane and
the HTTP control plane are also reached over the public IPs. There is no fixed
"seed" node: with R3 the three are peers (see "Bring up" for why a lone node cannot
self-serve).
## Prerequisites (HUMAN, once)
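1. **Fill `nodes.env`**: replace every `<PLACEHOLDER>` (magnus public IP, all WG
IPs). With `ROUTE_NETWORK=public` and no WireGuard mesh, set each WG IP to the
node's public IP. The scripts refuse to run while any placeholder remains.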
2. **Client CA exists**: `../tls/ca.crt` + `../tls/ca.key`. If not, run
`../tls/generate-certs.sh` on the CA host (om) first. The cluster reuses this CA
for the data plane so existing clients keep trusting the bus.
3. **Mint cluster TLS**:
```bash
./generate-cluster-certs.sh # writes out/<name>/ ; --force to rotate the cluster CA
```
4. **Create the route secret** (out of argv, shared by all nodes):
```bash
mkdir -p secrets && openssl rand -hex 32 > secrets/cluster.pass
```
5. **SSH** to each node's SSH host works as the `SSH_USER` set in `nodes.env`, `root` by default (`ssh magnus true`, `ssh dd true`, ...).
## Stage the nodes
```bash
./deploy-cluster.sh # DRY RUN — prints the full plan, touches nothing
./deploy-cluster.sh --yes # HUMAN: actually rsync + install the unit on all 3 nodes
```
This cross-builds `membershipd` (linux/amd64, `CGO_ENABLED=0`), writes each node's
`cluster.env` (its `NODE_NAME` and the `--routes` to the OTHER two nodes), and
ships the binary, the node's TLS material, the secret, the env file and the unit.
It does **not** start anything.
## Seed the first admin into the KV (HUMAN — loopback bootstrap)
The empty KV control plane has no users, and under `enforce` no external tool can
write the FIRST admin over NATS (it would need to be an admin already — a
chicken-and-egg). Without `--store kv`, the `user` CLI writes only to a local SQLite
file, not the KV. So the first admin is seeded on one node (magnus in the example
below; there is no fixed seed node) through a **loopback, no-auth bootstrap** that
populates the same JetStream store the cluster unit then reuses:
```bash
ssh root@magnus 'bash -s' <<'SEED'
set -euo pipefail
cd /opt/unibus
# a) Put the first admin into a local SQLite seed file.
./membershipd user add --db ./seed.db --handle root --sign-pub <ADMIN_SIGN_PUB_HEX> --role admin
# b) Bring up a TEMPORARY loopback, no-auth, single-node KV server on the cluster's
# own JetStream store dir (not exposed; bus-auth off is allowed on 127.0.0.1).
./membershipd --store kv --bus-auth off --bind 127.0.0.1 \
--nats-store ./local_files/jetstream --db ./seed.db >/tmp/seed-boot.log 2>&1 &
BOOT=$!; sleep 2
# c) Migrate the admin from SQLite into the replicated KV (loopback — no --ca needed).
./membershipd migrate-to-kv --db ./seed.db --nats-url nats://127.0.0.1:4250 --replicas 1
# d) Stop the bootstrap server. The KV buckets persist in ./local_files/jetstream.
kill "$BOOT"; wait "$BOOT" 2>/dev/null || true
rm -f ./seed.db
SEED
```
> The KV written here lives in `./local_files/jetstream`, which the cluster unit
> reuses (`--nats-store` default), so the admin is present when the enforce cluster
> starts. This loopback bootstrap is needed ONLY for the very first admin (the
> chicken-and-egg). **Every user after that is added with the cluster live** — no
> stop-seed-restart — via `user add --store kv` (see "Add users to the live
> cluster" below, report 0012).
## Bring up (HUMAN)
> **CORRECTION (report 0012).** The original instruction — "start magnus alone and
> verify healthz, then add the others" — is **WRONG and will look like a hung
> deploy.** A 3-node JetStream cluster forms a RAFT meta-group that needs a quorum
> (2 of 3) to elect a leader. A single started node has no quorum, so its JetStream
> meta never becomes current: `--store kv` blocks creating the KV buckets and
> **`/healthz` never returns ok** until a second node joins. Waiting for magnus to
> "go green" before starting the others therefore deadlocks the rollout.
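The quorum arithmetic behind this correction is small enough to spell out. A sketch with hypothetical helper names (`quorum_size`, `tolerated_failures`, `has_quorum`), using the standard majority rule `floor(n/2) + 1`:

```bash
# quorum_size N -> smallest majority of N voters (floor(N/2) + 1).
quorum_size() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures N -> how many of N nodes may be down with quorum intact.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

# has_quorum UP TOTAL -> exit 0 if UP running nodes form a majority of TOTAL.
has_quorum() { (( $1 >= $2 / 2 + 1 )); }
```

With three configured nodes the quorum is two, so `has_quorum 1 3` fails: one started node cannot elect a meta leader, which is why its healthz stays down until a second node joins.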
Start the nodes so a quorum forms. On a **clean cluster** the simplest correct
procedure is to start all three close together and let the meta-group converge:
```bash
# Start all three (order does not matter); each blocks on the others until a
# 2/3 quorum elects a JetStream meta leader, then the KV buckets are created.
for h in magnus homer dd; do ssh "$h" 'sudo systemctl enable --now membershipd-cluster'; done # dd = SSH alias of datardos
# Only NOW does healthz return ok — once the meta-group has a leader (give it
# ~10-30s on a cold start). Poll, do not assume the first node is broken.
for h in magnus homer dd; do # dd = SSH alias of datardos
echo "== $h =="; ssh "$h" 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt || echo "(not ready yet — needs quorum)"'
done
```
A **staggered** start also works, but only because `membershipd`'s KV open RETRIES
the bucket creation for a 120s bootstrap budget (issue 0006g, fix #3): the first
node sits in that retry loop — NOT serving healthz — until the second node makes a
quorum, then both converge and the third catches up. Either way, a lone node never
self-serves; do not gate the next node's start on the previous one's healthz.
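The "poll, do not gate" advice can be wrapped in a small retry helper. A minimal sketch: `wait_until` is a hypothetical function, and the 60-second budget and 2-second interval in the usage comment are assumptions rather than values from the deploy.

```bash
# wait_until TIMEOUT_S INTERVAL_S CMD... -> retry CMD until it succeeds or the
# deadline passes. Exit 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"; shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    (( SECONDS >= deadline )) && return 1
    sleep "$interval"
  done
  return 0
}

# Usage, AFTER all three units have been started (never between starts):
#   for h in magnus homer dd; do
#     wait_until 60 2 ssh "$h" \
#       'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt' \
#       || echo "$h: still not ready"
#   done
```

The helper only observes readiness. It is deliberately not placed between the `systemctl enable --now` calls, because a lone node never reports ok.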
> A cold multi-node start only converges because of **three cold-start fixes**
> (report 0011): route pooling off (`PoolSize=-1`), `NoAdvertise=true` (Docker
> bridge IPs not gossiped), and the KV-open retry loop above. Without them the
> meta-group re-elects leaders forever and bucket creation hangs. If a fresh
> cluster will not form, confirm the running binary contains these fixes before
> touching config.
## Promote an existing single-node (SQLite) deployment (HUMAN, optional)
Instead of seeding fresh, you can migrate an existing single-node `unibus.db` into
the KV — **loopback only** (the allowlist would otherwise travel cleartext; the
command refuses a remote target without `--ca`). Use the same loopback-bootstrap
shape as the seed step (temporary `--bus-auth off` server on 127.0.0.1, then
`migrate-to-kv --db /opt/unibus/local_files/unibus.db`).
## Verify
```bash
# Posture on every node — all must be enforce+acl+tls+cluster, store=kv.
for h in magnus homer dd; do # dd = SSH alias of datardos
echo "== $h =="
ssh "root@$h" 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt'
done
# Cluster + JetStream meta-group health (needs the `nats` CLI on a node):
ssh root@magnus 'nats --server nats://127.0.0.1:4250 server report jetstream'
ssh root@magnus 'nats --server nats://127.0.0.1:4250 server list' # 3 servers, routes up
```
A healthy cluster shows 3 routed servers and a JetStream meta-group with a leader.
## Add users to the live cluster (HUMAN — `user add --store kv`)
With the cluster up, add (and revoke) bus users **without stopping anything**,
directly against the replicated KV allowlist. This replaces the stop-seed-restart
procedure the original runbook implied for every user beyond the first admin.
The mechanism is the cluster's own **privileged internal connection**: under
`enforce` every bus user is confined by the per-subject ACL to its own rooms, so no
ordinary identity may write the control-plane buckets. The only identity to which
the authenticator grants full JetStream permissions is `membershipd`'s internal
service identity. The unit persists that identity to `${INTERNAL_ID_FILE}`
(`/opt/unibus/secrets/internal.id`, 0600) via `--internal-id-file`, so the same key
is available to the CLI. Run the CLI **on a node, over loopback** (the data-plane
TLS cert SAN covers `127.0.0.1`); reading the identity file requires root on that
node, which already implies full control of it, so this adds no practical exposure.
```bash
# Add a member to the live cluster's replicated allowlist (run on any node).
ssh root@magnus 'sudo /opt/unibus/membershipd user add --store kv \
--handle alice --role member --sign-pub <64-hex-ed25519-pub>'
# -> added user "alice" (...) role=member
# -> KV_UNIBUS_users: leader=<node> followers_current=2/2 msgs=N (replicated, HA)
# List / revoke against the same live KV:
ssh root@magnus 'sudo /opt/unibus/membershipd user list --store kv'
ssh root@magnus 'sudo /opt/unibus/membershipd user revoke --store kv <64-hex-ed25519-pub>'
```
Defaults assume an on-node invocation (`--nats-url nats://127.0.0.1:4250`,
`--internal-id-file /opt/unibus/secrets/internal.id`, `--ca /opt/unibus/tls/ca.crt`,
`--kv-replicas 3`). Semantics:
- **Idempotent / non-destructive**: re-adding the same key is an explicit
`already registered` error (exit 1), never a silent overwrite — a re-add cannot
flip a member to admin. To replace a user, `revoke` then add.
- **HA**: the write commits through the JetStream quorum, so it succeeds even with
one node down (2/3); the printed `followers_current` shows replication.
- **No hard delete**: `revoke` flips status to `revoked` (denied on both planes,
auditable); the KV has no row deletion, matching the SQLite store.
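Since the key is passed as a long hex string, it can help to check the `--sign-pub` argument locally before calling `user add`. A minimal sketch: `is_sign_pub_hex` is a hypothetical operator-side helper, not the CLI's own validation. It assumes the key is exactly 64 hexadecimal characters (a 32-byte Ed25519 public key) and accepts either letter case, which the server may or may not do.

```bash
# is_sign_pub_hex KEY -> exit 0 if KEY is exactly 64 hexadecimal characters.
is_sign_pub_hex() { [[ "$1" =~ ^[0-9a-fA-F]{64}$ ]]; }

# Usage:
#   is_sign_pub_hex "$KEY" || { echo "sign-pub must be 64 hex chars" >&2; exit 1; }
```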
> **Rollout note (report 0012):** the live verification deployed this binary +
> `--internal-id-file` to **datardos only** (the non-critical node). magnus and
> homer still run the 0011 binary. Making the capability (and the updated unit)
> available on all three is recommended but not urgent, since the posture is
> identical. To do so, roll the new binary with backups, one node at a time,
> verifying healthz between each:
> ```bash
> for h in homer magnus; do
> ssh "$h" 'sudo cp -a /opt/unibus/membershipd /opt/unibus/membershipd.bak' # backup
> scp build/membershipd "$h:/tmp/m" && ssh "$h" 'sudo install -o ubuntu -g ubuntu -m0775 /tmp/m /opt/unibus/membershipd'
> # add INTERNAL_ID_FILE=/opt/unibus/secrets/internal.id to /opt/unibus/cluster.env
> # add `--internal-id-file ${INTERNAL_ID_FILE} \` to the unit before `--store kv`
> ssh "$h" 'sudo systemctl daemon-reload && sudo systemctl restart membershipd-cluster'
> ssh "$h" 'curl -fsS https://127.0.0.1:8470/healthz --cacert /opt/unibus/tls/ca.crt' # green before next
> done
> ```
> (`deploy-cluster.sh` + the unit template already emit `INTERNAL_ID_FILE` and the
> flag, so a fresh `./deploy-cluster.sh --yes` is correct for all three.)
## Replication: go straight to R3 (HUMAN — real HA)
> **CORRECTION (report 0012).** The original "start at R1, then scale to R3" plan
> assumed R1 is a usable interim state. **It is not, in this cluster.** At R1 all six
> control-plane buckets (`KV_UNIBUS_users/rooms/members/room_keys/rooms_by_member`
> + `KV_UNIBUS_nonces`) live on a SINGLE node — a hard **SPOF for authentication**:
> if that node dies, the nonce/KV control plane is unreachable and EVERY
> authenticated request fails closed (auth DoS). Worse, the cold multi-node start
> only converges at all because of the three cold-start fixes (see "Bring up"); the
> real deploy never ran a healthy R1 and **jumped straight to R3 once the cluster
> formed.** Treat R1 as a transient artifact of bucket creation, not a milestone.
The deployed config already sets `KV_REPLICAS=3` in `nodes.env`. If buckets were
created at R1 (e.g. only one node was up when `--store kv` first opened them), raise
every control-plane stream to R3 IN PLACE (no data loss) once all three nodes are
routed:
```bash
for s in KV_UNIBUS_users KV_UNIBUS_rooms KV_UNIBUS_members KV_UNIBUS_room_keys \
KV_UNIBUS_rooms_by_member KV_UNIBUS_nonces; do
ssh root@magnus "nats --server nats://127.0.0.1:4250 stream update $s --replicas 3 -f"
done
# (also OBJ_UNIBUS_blobs if the object store is in use)
```
After this each bucket shows `followers_current=2/2` (quorum 2/3). The
`user add --store kv` command prints that figure for `KV_UNIBUS_users` on every add,
which is a cheap live HA check.
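That live check can be scripted by parsing the status line the command prints. A minimal sketch: `followers_current_ok` is a hypothetical helper, and the line format is taken from the sample output in "Add users to the live cluster", so the exact spacing is an assumption.

```bash
# followers_current_ok LINE -> exit 0 when LINE reports followers_current=A/B
# with A equal to B and B at least 1 (every follower is caught up).
followers_current_ok() {
  local line="$1"
  [[ "$line" =~ followers_current=([0-9]+)/([0-9]+) ]] || return 1
  local cur="${BASH_REMATCH[1]}" total="${BASH_REMATCH[2]}"
  (( total >= 1 && cur == total ))
}
```

A bucket with no followers (`0/0`) is rejected on purpose, so a bucket still sitting at R1 does not pass as replicated.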
## Chaos test (HUMAN — requires the 3 live VPS)
Validate quorum tolerance after R3:
```bash
# Kill one node; the cluster keeps serving (quorum 2/3). On ubuntu nodes use sudo.
ssh dd 'sudo systemctl stop membershipd-cluster'
# -> clients fail over (multiple seed URLs); reads/writes still succeed.
ssh dd 'sudo systemctl start membershipd-cluster' # rejoins, catches up
# Kill two nodes; quorum is LOST — the control plane should fail CLOSED (deny),
# never fail open. Verify a request is rejected, not silently served.
```
> **Validated (report 0012).** The 0011 chaos run checked only the control plane
> (healthz + meta/stream-leader failover + KV readable with 2/3). Report 0012 added
> the missing data-plane proofs against the live cluster: a real authenticated
> client (`cmd/clientcheck`, operator identity, nkey+TLS) creating an E2E room and
> publishing/subscribing — including a node stopped mid-stream, where the client
> failed over to a survivor and kept receiving with zero loss (quorum 2/3) — and
> `user add --store kv` committing with one node (the KV leader) down. The kill-2/3
> fail-closed case remains a documented manual step.
## Rollback
`membershipd` does not delete data. To revert a node to standalone SQLite, stop
the unit and start it without `--store kv`/`--cluster-name`; the KV buckets remain
for a later retry. To rotate the cluster CA, re-run `generate-cluster-certs.sh
--force` and re-stage (every node must get the new `cluster-ca.crt` together).
+130
@@ -0,0 +1,130 @@
#!/usr/bin/env bash
#
# deploy-cluster.sh — cross-build membershipd and stage it onto the three cluster
# nodes (issue 0006g). DEFAULT IS DRY-RUN: it prints the plan and touches nothing.
# Pass --yes to actually rsync + run remote commands. Steps that a HUMAN must run
# (or confirm) are marked "HUMAN:".
#
# Prerequisites (HUMAN, once):
# 1. Fill nodes.env (no <PLACEHOLDER> left).
# 2. ./generate-cluster-certs.sh (mints out/<name>/ TLS material)
# 3. Create the route secret locally: mkdir -p secrets && openssl rand -hex 32 > secrets/cluster.pass
# (secrets/ is gitignored; it is rsynced to each node as cluster.pass)
# 4. SSH access to every node's SSH_HOST with sudo-less root (SSH_USER=root).
#
# What it does per node (with --yes):
# - rsync the membershipd binary, the node's TLS material, the unit, the
# generated cluster.env and the route secret into REMOTE_DIR.
# - install + daemon-reload the systemd unit.
# Start is left to the human (see README "Bring up"): seed the first admin, then
# start all three nodes so a 2/3 quorum forms. Do not gate on a lone node's healthz.
set -euo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$DIR"
# shellcheck source=/dev/null
source ./nodes.env
APPLY=0
[[ "${1:-}" == "--yes" ]] && APPLY=1
if grep -q '<[A-Z_]\+>' nodes.env; then
echo "ERROR: nodes.env still has <PLACEHOLDER> values — fill them in first." >&2
exit 2
fi
SECRET_FILE="secrets/cluster.pass"
if [[ ! -f "$SECRET_FILE" ]]; then
echo "ERROR: $SECRET_FILE missing. HUMAN: mkdir -p secrets && openssl rand -hex 32 > $SECRET_FILE" >&2
exit 2
fi
run() {
# Echo every action; only execute it under --yes.
echo " + $*"
if [[ $APPLY -eq 1 ]]; then
"$@"
fi
}
echo "==> [1/3] cross-build membershipd (linux/amd64, CGO disabled)"
run env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o build/membershipd ../../cmd/membershipd
# Build the comma-separated route list for a node = the OTHER nodes' addresses on
# the chosen network, with NO userinfo (the secret is injected by membershipd from
# the file). Prints the list on stdout; has no other side effects.
routes_for() {
local self="$1" out=""
local row name _ssh pub wg addr
for row in "${CLUSTER_NODES[@]}"; do
read -r name _ssh pub wg <<<"$row"
[[ "$name" == "$self" ]] && continue
if [[ "$ROUTE_NETWORK" == "public" ]]; then addr="$pub"; else addr="$wg"; fi
out+="nats://${addr}:${NATS_ROUTE_PORT},"
done
echo "${out%,}"
}
echo "==> [2/3] stage each node (REMOTE_DIR=$REMOTE_DIR)"
for row in "${CLUSTER_NODES[@]}"; do
read -r name ssh _pub _wg <<<"$row"
target="${SSH_USER}@${ssh}"
nodedir="out/${name}"
if [[ ! -d "$nodedir" ]]; then
echo "ERROR: $nodedir missing — run ./generate-cluster-certs.sh first." >&2
exit 2
fi
routes="$(routes_for "$name")"
echo "-- node ${name} (ssh ${ssh}) routes=${routes}"
# Generate this node's cluster.env locally, then ship it.
envfile="build/cluster-${name}.env"
mkdir -p build
cat > "$envfile" <<EOF
NODE_NAME=${name}
CLUSTER_NAME=${CLUSTER_NAME}
CLUSTER_USER=${CLUSTER_USER}
KV_REPLICAS=${KV_REPLICAS}
HTTP_PORT=${HTTP_PORT}
NATS_CLIENT_PORT=${NATS_CLIENT_PORT}
NATS_ROUTE_PORT=${NATS_ROUTE_PORT}
ROUTES=${routes}
CLUSTER_PASS_FILE=${REMOTE_DIR}/secrets/cluster.pass
TLS_CERT=${REMOTE_DIR}/tls/server-${name}.crt
TLS_KEY=${REMOTE_DIR}/tls/server-${name}.key
ROUTE_TLS_CERT=${REMOTE_DIR}/tls/route-${name}.crt
ROUTE_TLS_KEY=${REMOTE_DIR}/tls/route-${name}.key
ROUTE_TLS_CA=${REMOTE_DIR}/tls/cluster-ca.crt
INTERNAL_ID_FILE=${REMOTE_DIR}/secrets/internal.id
EOF
run ssh "$target" "mkdir -p ${REMOTE_DIR}/tls ${REMOTE_DIR}/secrets"
run rsync -az build/membershipd "${target}:${REMOTE_DIR}/membershipd"
run rsync -az "${nodedir}/" "${target}:${REMOTE_DIR}/tls/"
run rsync -az "$SECRET_FILE" "${target}:${REMOTE_DIR}/secrets/cluster.pass"
run rsync -az "$envfile" "${target}:${REMOTE_DIR}/cluster.env"
run rsync -az membershipd-cluster.service "${target}:/etc/systemd/system/membershipd-cluster.service"
run ssh "$target" "chmod 600 ${REMOTE_DIR}/secrets/cluster.pass ${REMOTE_DIR}/tls/*.key && systemctl daemon-reload"
done
echo "==> [3/3] staged."
if [[ $APPLY -eq 0 ]]; then
echo " DRY-RUN: nothing was sent. Re-run with --yes to apply."
fi
cat <<'NEXT'
HUMAN — bring up (see README "Bring up" — a LONE node has no quorum and never
serves healthz, so do NOT gate the next node on the previous one going green):
1. Seed the FIRST admin into the KV via the loopback bootstrap (README
"Seed the first admin"); this is needed only for the chicken-and-egg admin.
2. Start all three so a 2/3 quorum forms (order does not matter); healthz
turns ok only once the meta-group elects a leader (~10-30s cold):
for h in magnus homer dd; do ssh "$h" 'sudo systemctl enable --now membershipd-cluster'; done # dd = SSH alias of datardos
3. Verify posture + quorum (README "Verify").
4. Ensure R3 on every control-plane stream (README "Replication: go straight to
R3"); R1 is a SPOF, not a milestone.
5. Add further users with the cluster LIVE — no restart — via
`membershipd user add --store kv` (README "Add users to the live cluster").
NEXT
+120
@@ -0,0 +1,120 @@
#!/usr/bin/env bash
#
# generate-cluster-certs.sh — mint the TLS material for a unibus 3-node cluster
# (issue 0006g). Run ONCE on a trusted machine (e.g. om, which custodies the bus
# CA); distribute the per-node output to each node over a secure channel. This
# script touches NO remote host.
#
# It produces two trust roots, kept SEPARATE on purpose (audit 0008 N1-low):
#
# 1. The CLUSTER route CA (cluster-ca.crt/key, generated here): signs each
# node's ROUTE certificate. The route layer authenticates NODES, not bus
# users, so it must NOT share the client data-plane CA — a client cert can
# then never be presented to the route port.
# 2. The CLIENT data-plane CA (../tls/ca.crt/key, the one clients pin): signs
# each node's DATA-PLANE server certificate. Reused, not regenerated, so
# existing clients keep trusting the bus.
#
# Per node it emits, under out/<name>/:
# route-<name>.crt/key route cert (cluster CA), EKU server+clientAuth
# (each node is BOTH server and dialer to its peers)
# server-<name>.crt/key data-plane cert (client CA), EKU serverAuth
# cluster-ca.crt the route CA cert (for --route-tls-ca)
# ca.crt the client CA cert (for clients / control-plane TLS)
#
# SANs per node = its public IP + its WireGuard IP + its hostname + localhost.
#
# Key material: EC P-256 (Go crypto/tls + nats-server friendly), matching
# ../tls/generate-certs.sh.
set -euo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$DIR"
# shellcheck source=/dev/null
source ./nodes.env
# Refuse to run while any placeholder remains (HUMAN must fill nodes.env first).
if grep -q '<[A-Z_]\+>' nodes.env; then
echo "ERROR: nodes.env still has <PLACEHOLDER> values — fill them in first." >&2
grep -n '<[A-Z_]\+>' nodes.env >&2
exit 2
fi
CLIENT_CA_CRT="../tls/ca.crt"
CLIENT_CA_KEY="../tls/ca.key"
if [[ ! -f "$CLIENT_CA_CRT" || ! -f "$CLIENT_CA_KEY" ]]; then
echo "ERROR: client data-plane CA not found at ../tls/ca.{crt,key}." >&2
echo " Run ../tls/generate-certs.sh first (it mints the client CA)." >&2
exit 2
fi
DAYS_CA=3650
DAYS_CRT=825
force=0
[[ "${1:-}" == "--force" ]] && force=1
# --- cluster route CA (separate trust root) ---
if [[ ! -f cluster-ca.crt || ! -f cluster-ca.key || $force -eq 1 ]]; then
echo "==> generating cluster route CA (separate from the client CA)"
openssl ecparam -name prime256v1 -genkey -noout -out cluster-ca.key
chmod 600 cluster-ca.key
openssl req -x509 -new -key cluster-ca.key -sha256 -days "$DAYS_CA" \
-subj "/CN=unibus-cluster-ca" -out cluster-ca.crt
else
echo "==> reusing existing cluster route CA (pass --force to regenerate)"
fi
# mint_cert <out_key> <out_crt> <subject_cn> <san> <eku> <ca_crt> <ca_key>
mint_cert() {
local out_key="$1" out_crt="$2" cn="$3" san="$4" eku="$5" ca_crt="$6" ca_key="$7"
local csr ext
csr="$(mktemp)"
ext="$(mktemp)"
openssl ecparam -name prime256v1 -genkey -noout -out "$out_key"
chmod 600 "$out_key"
openssl req -new -key "$out_key" -subj "/CN=${cn}" -out "$csr"
cat > "$ext" <<EOF
subjectAltName=${san}
extendedKeyUsage=${eku}
keyUsage=digitalSignature,keyEncipherment
EOF
openssl x509 -req -in "$csr" -CA "$ca_crt" -CAkey "$ca_key" -CAcreateserial \
-sha256 -days "$DAYS_CRT" -extfile "$ext" -out "$out_crt"
rm -f "$csr" "$ext"
}
for row in "${CLUSTER_NODES[@]}"; do
read -r name _ssh pub wg <<<"$row"
echo "==> node ${name}: SAN IP:${pub}, IP:${wg}, DNS:${name}, localhost, 127.0.0.1"
nodedir="out/${name}"
mkdir -p "$nodedir"
san="IP:${pub},IP:${wg},DNS:${name},DNS:localhost,IP:127.0.0.1"
# Route cert: signed by the cluster CA; server+client auth (mutual routes).
mint_cert "${nodedir}/route-${name}.key" "${nodedir}/route-${name}.crt" \
"unibus-route-${name}" "$san" "serverAuth,clientAuth" \
cluster-ca.crt cluster-ca.key
# Data-plane server cert: signed by the client CA; serverAuth only.
mint_cert "${nodedir}/server-${name}.key" "${nodedir}/server-${name}.crt" \
"unibus-${name}" "$san" "serverAuth" \
"$CLIENT_CA_CRT" "$CLIENT_CA_KEY"
# Co-locate the two CA certs each node needs.
cp cluster-ca.crt "${nodedir}/cluster-ca.crt"
cp "$CLIENT_CA_CRT" "${nodedir}/ca.crt"
done
rm -f cluster-ca.srl ../tls/ca.srl 2>/dev/null || true
echo
echo "==> done. Per-node material under out/<name>/ (KEYS ARE SECRET — never git):"
for row in "${CLUSTER_NODES[@]}"; do
read -r name _rest <<<"$row"
echo " out/${name}/ (route-${name}.*, server-${name}.*, cluster-ca.crt, ca.crt)"
done
echo
echo "verify a SAN with:"
echo " openssl x509 -in out/<name>/server-<name>.crt -noout -text | grep -A1 'Subject Alternative Name'"
@@ -0,0 +1,46 @@
[Unit]
# unibus membershipd — cluster node (issue 0006g).
#
# One unit, parameterized per node by /opt/unibus/cluster.env (generated by
# deploy-cluster.sh): NODE_NAME, ROUTES and the cert paths differ per node, the
# rest of the posture (enforce + per-subject ACL + TLS + --store kv) is identical
# on every node, which is the homogeneous posture a secure cluster requires
# (audit 0008 N1).
Description=unibus membershipd (cluster node)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
WorkingDirectory=/opt/unibus
EnvironmentFile=/opt/unibus/cluster.env
# The route password comes from a FILE referenced by ${CLUSTER_PASS_FILE}, never
# from argv (audit 0008 N1-low). The peer --routes carry no userinfo; membershipd
# injects the credentials from the file/user.
ExecStart=/opt/unibus/membershipd \
--bind 0.0.0.0 \
--bus-auth enforce \
--http-port ${HTTP_PORT} \
--nats-port ${NATS_CLIENT_PORT} \
--tls-cert ${TLS_CERT} \
--tls-key ${TLS_KEY} \
--cluster-name ${CLUSTER_NAME} \
--server-name ${NODE_NAME} \
--cluster-port ${NATS_ROUTE_PORT} \
--routes ${ROUTES} \
--cluster-user ${CLUSTER_USER} \
--cluster-pass-file ${CLUSTER_PASS_FILE} \
--route-tls-cert ${ROUTE_TLS_CERT} \
--route-tls-key ${ROUTE_TLS_KEY} \
--route-tls-ca ${ROUTE_TLS_CA} \
--internal-id-file ${INTERNAL_ID_FILE} \
--store kv \
--kv-replicas ${KV_REPLICAS}
# Restart=always (NOT on-failure): a clean SIGTERM exits success, and on-failure
# would then NOT restart, leaving the node silently dead (see function_tags.md).
Restart=always
RestartSec=2
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
+57
@@ -0,0 +1,57 @@
# Cluster topology for the unibus 3-node deployment (issue 0006g).
#
# This file is SOURCED by generate-cluster-certs.sh and deploy-cluster.sh.
#
# HUMAN: fill in every placeholder with the real value before running the
# scripts. The public IPs known at authoring time are pre-filled; the WireGuard
# mesh IPs and magnus's public IP must be supplied. The scripts refuse to run
# while any unfilled placeholder remains.
# Cluster identity (must be identical on every node).
CLUSTER_NAME="unibus"
# Route-secret username; the password is NOT here — it lives in a file (see
# CLUSTER_PASS_FILE in deploy-cluster.sh) so it never lands in argv or git.
CLUSTER_USER="unibus-cluster"
# KV/nonce replication factor. The deployed cluster goes straight to 3 (see README
# "Replication: go straight to R3"): R1 is a single point of failure for
# authentication, not a usable interim state. If the buckets were nevertheless
# created at R1, raise them to 3 IN PLACE once all three nodes are routed.
KV_REPLICAS=3
# Ports (same on every node; the route port is server-to-server only).
NATS_CLIENT_PORT=4250
NATS_ROUTE_PORT=6250
HTTP_PORT=8470
# Remote install layout and SSH login user.
REMOTE_DIR="/opt/unibus"
SSH_USER="root"
# Which address family the inter-node routes use. "wg" builds --routes from the
# WireGuard mesh IPs (private server-to-server links, preferred); "public" uses
# the public IPs. The route layer is always mutual-TLS regardless.
#
# DEPLOY DECISION (2026-06-07): set to "public". No WireGuard mesh exists between
# the three cluster nodes — homer and datardos do not even have the `wg` binary
# installed, and om's only WG peers are the operator's personal PCs, not the VPS.
# Rather than stand up a fresh mesh blindly, the routes go over the public IPs,
# still protected by the separate cluster route CA (mutual-TLS). On magnus (the
# only node with ufw active) the route port 6250 is restricted to the homer and
# datardos public IPs; homer/datardos run ufw inactive (Docker hosts) and rely on
# the route mutual-TLS for 6250.
ROUTE_NETWORK="public"
# One row per node: NAME SSH_HOST PUBLIC_IP WG_IP
# NAME -> --server-name and the per-node cert filenames (unique).
# SSH_HOST -> the `ssh ALIAS` alias (see ~/.ssh/config).
# PUBLIC_IP -> public address; goes in the cert SANs (client-facing data plane).
# WG_IP -> WireGuard mesh address; cert SAN + route target when ROUTE_NETWORK=wg.
# NOTE: with ROUTE_NETWORK=public and no WireGuard mesh, the WG_IP column is set to
# each node's public IP so the cert SAN covers the address actually used by the
# public routes and no unfilled placeholder remains (scripts refuse to run otherwise).
# magnus == organic-machine.com == om (135.125.201.30); SSH alias `magnus` enters as root.
CLUSTER_NODES=(
"magnus magnus 135.125.201.30 135.125.201.30"
"homer homer 141.94.69.66 141.94.69.66"
"datardos dd 51.91.100.142 51.91.100.142"
)
+7
@@ -14,6 +14,13 @@
"description": "TLS on the NATS data plane using the project's self-signed CA (deploy/tls/). Server opts in via membershipd --tls-cert/--tls-key; clients pin ca.crt via client.Connect(caPath).",
"added": "2026-06-07",
"enabled_at": "2026-06-07"
},
"decentralized": {
"enabled": false,
"issue": "0003",
"description": "Control-plane state on replicated JetStream KV instead of local SQLite (branch-by-abstraction membership.Store: sqliteStore default, jetstreamStore opt-in). The route cluster (0003a) and the KV store (0003b) shipped behind this flag; the membershipd boot wiring that selects the store is COMPLETE since issue 0006c and is realized at runtime with the server flag --store kv|sqlite (default sqlite). The internal-identity bootstrap (0006a) lets membershipd open the KV store on its own embedded NATS under enforce. Per-deploy opt-in: a node joins the decentralized control plane by starting with --store kv (and --cluster-name for HA). OFF (--store sqlite) keeps the single-node SQLite control plane unchanged.",
"added": "2026-06-07",
"enabled_at": null
}
}
}
+132
@@ -0,0 +1,132 @@
---
issue: 0005
title: Hardening 2 - CVEs, spoofing via omitted signature, concurrency DoS, forced TLS (re-audit)
status: done
created: 2026-06-07
completed: 2026-06-07
domain: security
scope: unibus (go.mod, pkg/client, pkg/membership/server.go, cmd/membershipd/config.go, pkg/embeddednats, pkg/blobstore)
depends_on: 0001, 0004 (closes the NEW findings of the re-audit on what was delivered)
blocks: 0001f (public deploy) and 0003f (decentralized deploy)
source: projects/message_bus/reports/0006-2026-06-07-unibus-security-reaudit.md
---
# Objective
The red-team re-audit (report 0006) confirmed that hardening 0004 closed H1-H7/H12,
but it found **new findings** that keep the verdict at **"do NOT expose publicly
yet"**. This issue closes them. The re-audit was run against commit `618f6b6`
(pre-0003); some findings may have changed with 0003, so **each phase must first
verify whether the finding is still live on current master** (post-0003, v0.6.0) before
fixing it.
State verified when this issue was created (master post-0003):
- **N1 live**: `go.mod` is still on `nats-server v2.10.22` and `go 1.25.0`.
- **N3 live**: `pkg/client/client.go:802` has `if info.Policy.SignMsgs && f.Sig != nil` (the exact vulnerable pattern).
- **H4**: 0003 added `pkg/membership/acl.go`; it must be evaluated whether this closes the wildcard `Subscribe(">")` or whether the NATS Permissions layer is still missing.
- N2, N4: presumably live (0003 did not touch them); verify.
# Phases (TBD, branches `issue/0005x-*`)
## 0005a - N1 (High): CVEs in dependencies
**Finding:** `govulncheck ./...` → 16 reachable vulnerabilities: 14 in
`github.com/nats-io/nats-server/v2@v2.10.22` (the embedded server, publicly exposed in the
chosen deploy) + 2 in the Go stdlib (`net/textproto` GO-2026-5039, `crypto/x509`
GO-2026-5037).
**Fix:**
- `go get github.com/nats-io/nats-server/v2@v2.11.15` (or a later version that covers all 14).
- Bump the toolchain to `go1.26.4` (covers the 2 stdlib ones); update the `go` directive
in `go.mod` if applicable.
- Re-run `govulncheck ./...` until **0 affected**.
- **Note:** this is a `go.mod`/`go.sum` change justified by CVEs; document it in the
commit. Verify that the nats-server bump does not break the 0003 cluster/JetStream
(run the whole suite, including the multi-node e2e).
**DoD:** `govulncheck ./...` → "No vulnerabilities found" (or only non-reachable ones);
full suite green after the bump.
## 0005b - N3 (High): spoofing via omitted signature in signed rooms
**Finding:** `pkg/client/client.go::processFrame` verifies the signature **only if the frame
carries one**: `if info.Policy.SignMsgs && f.Sig != nil { verify }`. An attacker with access to the
data plane publishes a frame with `Sig==nil` and a forged `Sender` → the receiver accepts it as
authentic in a room that REQUIRES signatures.
**Fix:** in a `SignMsgs` room, an unsigned frame must be **dropped**:
```go
if info.Policy.SignMsgs {
	if f.Sig == nil { return } // signature required; without one, drop
	if !verify(...) { return }
}
```
**DoD:** port `TestReaudit_SigNilSpoof` → the `Sig==nil` frame with a forged `Sender`
in a `SignMsgs` room is now **dropped** (not delivered to the handler). Golden (valid signed
frame is delivered) + edge (a room without SignMsgs is unaffected) + error (Sig==nil in
SignMsgs → drop).
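A minimal, self-contained sketch of the three DoD cases (golden, edge, error). The `Frame`, `Policy` and `accept` names are simplified assumptions for illustration and are not the real `pkg/client` types; the signature covers only the payload here, which may differ from what the actual frame format signs.

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// Frame and Policy are simplified, hypothetical stand-ins for the real types.
type Frame struct {
	Sender  string
	Payload []byte
	Sig     []byte
}

type Policy struct{ SignMsgs bool }

// accept reports whether a frame may reach the handler. In a SignMsgs room a
// missing signature is rejected exactly like an invalid one.
func accept(p Policy, senderKey ed25519.PublicKey, f Frame) bool {
	if !p.SignMsgs {
		return true // rooms without SignMsgs are unaffected
	}
	if f.Sig == nil {
		return false // signature required: drop
	}
	return ed25519.Verify(senderKey, f.Payload, f.Sig)
}

// demo runs the golden, edge and error cases and returns their outcomes.
func demo() (golden, edge, spoof bool) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	msg := []byte("hello")
	signed := Frame{Sender: "alice", Payload: msg, Sig: ed25519.Sign(priv, msg)}
	forged := Frame{Sender: "alice", Payload: msg} // Sig == nil, forged Sender

	golden = accept(Policy{SignMsgs: true}, pub, signed)
	edge = accept(Policy{SignMsgs: false}, pub, forged)
	spoof = accept(Policy{SignMsgs: true}, pub, forged)
	return golden, edge, spoof
}

func main() {
	fmt.Println(demo()) // true true false
}
```

The point of the sketch is that the nil check sits inside the `SignMsgs` branch, so the absence of a signature can never short-circuit verification.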
## 0005c - N2 (Medium-High): DoS via concurrency
**Finding:** the per-request limit (16 MiB) + per-IP rate limit do NOT bound aggregate
memory. 40 simultaneous 16 MiB uploads (= the per-IP burst) → 1.42 GB RSS. Multi-IP
scales without a ceiling.
**Fix (choose and document):**
- A global limit on concurrent connections and/or bytes in flight (a semaphore with a total
memory bound), and/or
- Stream the blob to disk instead of `io.ReadAll` into RAM (fits the quota/GC of issue
0002), and/or
- Lower `maxBlobBytes` and better separate the control limit (1 MiB) from the blob limit.
**DoD:** a test that launches N concurrent uploads at the ceiling and verifies that aggregate RSS
stays **bounded** (measures `/proc/self/status`, declared bound) instead of growing linearly
with N. Golden (normal concurrency passes) + edge (at the bound) + error (excess → 429/503 without
OOM).
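One way to read the first option is a process-wide byte budget that every upload must reserve from before its body is read. The sketch below is an assumption about how that could look; `byteBudget`, `tryAcquire` and the 64 MiB limit are hypothetical names and numbers, not taken from the codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// byteBudget caps the total number of body bytes held in memory across all
// in-flight uploads. Name and API are hypothetical.
type byteBudget struct {
	mu    sync.Mutex
	used  int64
	limit int64
}

func newByteBudget(limit int64) *byteBudget { return &byteBudget{limit: limit} }

// tryAcquire reserves n bytes, or returns false without blocking when the
// reservation would exceed the limit (the handler would answer 429/503).
func (b *byteBudget) tryAcquire(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n < 0 || n > b.limit-b.used {
		return false
	}
	b.used += n
	return true
}

// release returns n bytes to the budget once the upload has finished.
func (b *byteBudget) release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used -= n
	if b.used < 0 {
		b.used = 0
	}
}

// admitted simulates `uploads` simultaneous uploads of `size` bytes that are
// all still in flight (none releases) and returns how many were admitted.
func admitted(limit, size int64, uploads int) int {
	b := newByteBudget(limit)
	var wg sync.WaitGroup
	var mu sync.Mutex
	ok := 0
	for i := 0; i < uploads; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.tryAcquire(size) {
				mu.Lock()
				ok++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return ok
}

func main() {
	const mib = 1 << 20
	// 40 uploads of 16 MiB against a 64 MiB budget: only 4 fit at once.
	fmt.Println(admitted(64*mib, 16*mib, 40)) // 4
}
```

With a budget like this the aggregate memory is bounded by the limit regardless of how many IPs upload at once; a real handler would pair every successful `tryAcquire` with a deferred `release`.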
## 0005d - N4 (Medium): force control plane TLS on a public bind
**Finding:** the `validateBootConfig` guard closes "public without enforce" and "TLS without
enforce", but it **allows** public + enforce **without** `--tls-cert` → the control plane serves
plain HTTP publicly (H5 reappears: metadata in the clear).
**Fix:** the guard must require `--tls-cert`/`--tls-key` when the bind is not loopback.
`public + enforce + no TLS` → `log.Fatal`.
**DoD:** port `TestGap_PublicEnforceNoTLS` → `validateBootConfig("0.0.0.0",
enforce, "", "")` now **rejects**. Golden (public+enforce+TLS OK) + edge (loopback without TLS
remains OK for dev) + error (public without TLS aborts).
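The guard logic can be sketched as follows. `checkBootConfig` is a hypothetical stand-in whose parameter order mirrors the call quoted in the DoD; the real `validateBootConfig` signature and error handling may differ, and `enforce` is modeled here as a plain boolean.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// isLoopback treats anything that is not clearly loopback as public, so an
// unparsable bind address fails closed.
func isLoopback(bind string) bool {
	if bind == "localhost" {
		return true
	}
	ip := net.ParseIP(bind)
	return ip != nil && ip.IsLoopback()
}

// checkBootConfig is a hypothetical sketch of the boot guard: it returns an
// error for every combination the issue says must abort startup.
func checkBootConfig(bind string, enforce bool, tlsCert, tlsKey string) error {
	public := !isLoopback(bind)
	hasTLS := tlsCert != "" && tlsKey != ""
	switch {
	case public && !enforce:
		return errors.New("public bind requires enforce")
	case hasTLS && !enforce:
		return errors.New("TLS requires enforce")
	case public && !hasTLS:
		return errors.New("public bind requires --tls-cert and --tls-key")
	}
	return nil
}

func main() {
	fmt.Println(checkBootConfig("0.0.0.0", true, "", "") != nil)                // true: rejected
	fmt.Println(checkBootConfig("0.0.0.0", true, "srv.crt", "srv.key") == nil) // true: accepted
	fmt.Println(checkBootConfig("127.0.0.1", true, "", "") == nil)             // true: dev stays OK
}
```

Returning an error instead of calling `log.Fatal` inside the guard keeps it unit-testable; the caller in `main` can still abort on a non-nil result.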
## 0005e - H4 (Medium, residual): evaluate and complete the per-subject ACL
**Context:** 0003 added `pkg/membership/acl.go`. First **evaluate**, using the attack from
report 0006 (`TestReaudit_H4_WildcardMetadataLeak`: a registered non-member with a raw
`Subscribe(">")` captures subjects + JetStream advisories of other rooms), whether that
acl.go already closes it.
- If it does → port the test as a regression and document it.
- If NOT (likely: the real ACL needs per-identity NATS `Permissions` at the
authenticator/account level, not just membership logic in the control plane) → implement
per-identity Permissions derived from membership, or document the limit and the plan.
**DoD:** `TestReaudit_H4_WildcardMetadataLeak` → the non-member NO longer captures the subjects of
other rooms (or, if a residual remains, it is documented with its exact limit).
# Out of scope (other issues)
- **H9** (blob quota/GC) → issue 0002; overlaps with 0005c (streaming to disk).
- **H10** (AEAD nonce) / **H11** (nonce/ts in the owner signature) → low, future.
- **H8** (CA custody: generate on om) → operational, part of the deploy.
- **Audit of the new 0003 surface** (cluster routes auth, jetstreamStore KV
fail-closed, replicated nonce-cache, failover) → report 0006 did NOT cover it (it audited
pre-0003). A dedicated re-audit of 0003 is pending (prompt already prepared).
# Global Definition of Done
- `govulncheck ./...` → 0 reachable.
- The adversarial tests from the re-audit (`TestReaudit_SigNilSpoof`,
`TestGap_PublicEnforceNoTLS`, `TestReaudit_H4_WildcardMetadataLeak`, DoS-concurrency)
ported as regressions and green (or the residual documented).
- `CGO_ENABLED=0 go build ./... && go vet ./... && go test ./...` green (including the 0003
multi-node e2e, to confirm that the nats-server bump did not break it).
- Re-evaluation: the public exposure verdict moves from "not yet" to "yes, with conditions".
@@ -0,0 +1,160 @@
---
issue: 0006
title: Complete and harden the cluster - control plane KV wiring + N1-N6 from audit 0008
status: done
created: 2026-06-07
closed: 2026-06-07
closed_by: phases 0006a-0006g (see report 0009); unibus v0.8.0
domain: security
scope: unibus (cmd/membershipd, pkg/membership, pkg/embeddednats, pkg/busauth, pkg/client)
depends_on: 0003 (completes its wiring), 0005 (inherits the already-secure single-node bus)
blocks: 0003f (deploy of the decentralized cluster)
source: projects/message_bus/reports/0008-2026-06-07-unibus-decentralization-audit.md
---
# Objective
The dedicated audit of the 0003 surface (report 0008) concluded: **the bus in
cluster mode is NOT safe for public exposure** because of two blockers, and in addition **0003 left the
decentralized control plane UNWIRED** (the binary still uses the SQLite single store;
the `decentralized` flag exists but no Go code reads it). As a single standalone node
unibus is ALREADY secure (report 0008 confirms it); as a cluster, it is not.
This issue closes the cluster security blockers AND completes the wiring that 0003
left half-done, so that the decentralized deploy (0003f) is secure. Each phase
reproduces the attack from report 0008 (`TestAttack0008_*`) and verifies that it is now rejected.
# Phases (TBD, branches `issue/0006x-*`)
## 0006a - N3 (BLOCKER): wire the replicated nonce into the binary
**Finding (HIGH):** `membershipd` **never calls** `Server.UseReplicatedNonces`; each node
uses a per-process `memNonceCache`. A signed request accepted on node A is **replayed
successfully on node B** (200+200). The API (`kvNonceStore`) and the test
(`TestReplicatedNonceRejectsCrossNodeReplay`) exist, but the binary does not invoke them.
**Fix:** in `cmd/membershipd/main.go`, when starting with `--cluster-name` (or whenever
JetStream is available), call `srv.UseReplicatedNonces(js, replicas)` and **fail fast** if
the `KV_UNIBUS_nonces` bucket cannot be created. Hard rule: `--cluster-name != ""` ⇒ replicated nonce
is **mandatory** (never start a cluster node with a local nonce cache).
**DoD:** reproduce `TestAttack0008_N3` (2 nodes with the exact wiring of the binary) → the nonce
replay against node B now returns **401**. Golden (normal request OK on any node) + edge
(single-node without cluster keeps using the local cache OK) + error (cross-node replay → 401).
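To illustrate why per-process caches allow the replay and a shared store closes it, here is a toy model. The `nonceStore` interface, `node` type and `replayStatus` helper are hypothetical; an in-memory map stands in for the replicated KV bucket, which in the real system would need an atomic create-if-absent operation to give the same guarantee across nodes.

```go
package main

import (
	"fmt"
	"sync"
)

// nonceStore records nonces; firstUse returns true only the first time a
// given nonce is presented. Hypothetical interface for illustration.
type nonceStore interface {
	firstUse(nonce string) bool
}

// memNonces is an in-memory store. Shared between nodes it models the
// replicated bucket; one instance per node models the per-process cache.
type memNonces struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newMemNonces() *memNonces {
	return &memNonces{seen: make(map[string]struct{})}
}

func (m *memNonces) firstUse(nonce string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, dup := m.seen[nonce]; dup {
		return false
	}
	m.seen[nonce] = struct{}{}
	return true
}

// node is a toy control plane node; handle returns the HTTP status it would
// give to a signed request carrying the nonce.
type node struct{ nonces nonceStore }

func (n node) handle(nonce string) int {
	if !n.nonces.firstUse(nonce) {
		return 401
	}
	return 200
}

// replayStatus sends a request to node A, replays it against node B and
// returns B's status.
func replayStatus(shared bool) int {
	a := node{nonces: newMemNonces()}
	b := node{nonces: newMemNonces()}
	if shared {
		b.nonces = a.nonces
	}
	a.handle("nonce-1")
	return b.handle("nonce-1")
}

func main() {
	fmt.Println(replayStatus(false)) // 200: per-process caches accept the replay
	fmt.Println(replayStatus(true))  // 401: a shared store rejects it
}
```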
## 0006b - N2 (BLOCKER): close `$JS.API.>` / isolate the control plane KV
**Finding (HIGH):** the ACL grant `clientInfraSubjects = {"_INBOX.>", "$JS.API.>"}`
(`acl.go:20`) lets any registered peer read the control plane KV buckets
(`KV_UNIBUS_users/rooms/members/room_keys`) directly over NATS, bypassing `requireMember` and
the HTTP checks. It leaks the allowlist (handles+roles+keys), the room graph and sealed-key
metadata. (WRITES to the KV are already denied, verified; the leak is read-only.)
**Fix (choose and document):**
- Replace the broad `$JS.API.>` grant with **minimal per-room** JetStream permissions (only the
stream/consumer API of the peer's rooms), and **explicitly deny** the
`KV_UNIBUS_*` and `OBJ_*` streams; or
- (More robust) isolate the control plane KV in a **separate NATS account**, unreachable
from the client account.
**DoD:** reproduce `TestAttack0008_N2` → eve (registered, non-member) can **NO** longer read
the KV buckets (`Permissions Violation` or equivalent). The legitimate JetStream API of the
peer's rooms keeps working.
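The difference between the broad grant and a per-room allow list can be shown with a small subject matcher. This is an illustrative model of NATS wildcard semantics (`*` matches one token, a trailing `>` matches one or more), not the server's permission engine; the stream name `ROOM_abc` and the `jsGrants` helper are assumptions, and the granted subjects are only a subset of the JetStream API a real client would need.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether subject matches a NATS-style pattern: "*" matches
// exactly one token and a trailing ">" matches one or more tokens.
func matches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

// jsGrants builds a default-deny allow list from the streams of the rooms a
// peer belongs to. Hypothetical helper; the subject set is illustrative.
func jsGrants(roomStreams []string) []string {
	var out []string
	for _, st := range roomStreams {
		out = append(out,
			"$JS.API.STREAM.INFO."+st,
			"$JS.API.CONSUMER.CREATE."+st,
			"$JS.API.CONSUMER.INFO."+st+".*",
		)
	}
	return out
}

// permitted is true only when some grant matches; anything else is denied.
func permitted(grants []string, subject string) bool {
	for _, g := range grants {
		if matches(g, subject) {
			return true
		}
	}
	return false
}

func main() {
	kvRead := "$JS.API.STREAM.MSG.GET.KV_UNIBUS_users"
	fmt.Println(permitted([]string{"$JS.API.>"}, kvRead)) // true: the leak
	grants := jsGrants([]string{"ROOM_abc"})
	fmt.Println(permitted(grants, kvRead))                          // false
	fmt.Println(permitted(grants, "$JS.API.STREAM.INFO.ROOM_abc")) // true
}
```

Because NATS wildcards work on whole tokens, a pattern such as `KV_UNIBUS_*` is not a wildcard at the subject level; that is one reason a default-deny allow list (or a separate account) tends to be easier to reason about than an explicit deny list.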
## 0006c - control plane KV wiring (complete 0003)
**Finding (MEDIUM / root cause):** the binary does not activate the decentralized store. `membership.Open`
(SQLite) is hardcoded at `main.go:90`; `OpenJetStream` is used only by `migrate-to-kv`.
**Fix:** read the `decentralized` flag (or a `--store kv|sqlite`) and **select the store** at
startup: SQLite (default, single-node/dev) or `jetstreamStore` (cluster). Resolve the
"bootstrap cycle" of the internal authenticator (the authenticator needs the store for
`IsAuthorized`, and the KV store needs NATS already started). Keep branch-by-abstraction:
with the flag off, behavior identical to today's. `IsAuthorized`/reads over KV are
**fail-closed** on quorum loss/timeout (already implemented in `jetstreamStore`;
verify that the wiring preserves it).
**DoD:** with `decentralized: on` + cluster, the control plane serves from the replicated KV and a
new node sees the rooms created on another (closes the state divergence noted by N5).
Fail-closed: simulate an unavailable KV → deny. With the flag off, suite identical to the baseline.
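A rough sketch of store selection behind one interface, with a fail-closed authorization check. All types here (`store`, `sqliteStore`, `kvStore`, `selectStore`, `authorized`) are hypothetical stand-ins backed by maps; the real `membership` package and `jetstreamStore` have their own APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// store is the abstraction both backends satisfy (branch-by-abstraction).
type store interface {
	IsAuthorized(signPub string) (bool, error)
}

// sqliteStore models the single-node default backend.
type sqliteStore struct{ allow map[string]bool }

func (s sqliteStore) IsAuthorized(k string) (bool, error) { return s.allow[k], nil }

// kvStore models the replicated backend; available=false simulates quorum
// loss or a timeout.
type kvStore struct {
	allow     map[string]bool
	available bool
}

func (s kvStore) IsAuthorized(k string) (bool, error) {
	if !s.available {
		return false, errors.New("kv: no quorum")
	}
	return s.allow[k], nil
}

// selectStore picks the backend from a --store style value; the empty value
// keeps the existing default so that "flag off" behaves as before.
func selectStore(kind string, allow map[string]bool, kvUp bool) (store, error) {
	switch kind {
	case "", "sqlite":
		return sqliteStore{allow: allow}, nil
	case "kv":
		return kvStore{allow: allow, available: kvUp}, nil
	}
	return nil, fmt.Errorf("unknown store %q", kind)
}

// authorized is fail-closed: any store error is treated as a denial.
func authorized(s store, signPub string) bool {
	ok, err := s.IsAuthorized(signPub)
	return err == nil && ok
}

func main() {
	allow := map[string]bool{"alice": true}
	up, _ := selectStore("kv", allow, true)
	down, _ := selectStore("kv", allow, false)
	fmt.Println(authorized(up, "alice"))   // true
	fmt.Println(authorized(down, "alice")) // false: KV unavailable, deny
}
```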
## 0006d - N1 (HIGH): homogeneous cluster posture
**Finding:** the cluster is only as secure as its weakest node; a node without an authenticator or
with `--bus-auth off` lets an unauthenticated peer `Subscribe(">")` and harvest the traffic
forwarded from the nodes that do enforce the ACL.
**Fix:** guarantee (at startup/health) that all nodes run `enforce`+ACL+TLS;
refuse to form a cluster with a peer in a lower posture, or at minimum document it and expose
a health check that detects it. Never expose the client port of a node without enforce.
**DoD:** reproduce `TestAttack0008_N1` scenario 2 (cluster with one `withACL=false` node) →
startup/health rejects or flags it; document the guarantee.
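The health-check variant of the fix could be as simple as comparing each node's advertised posture against the required one. The `posture` struct and `weakNodes` helper below are hypothetical; how a node would actually advertise its posture to its peers is left open by the issue.

```go
package main

import (
	"fmt"
	"sort"
)

// posture is a hypothetical summary of a node's security settings.
type posture struct {
	Enforce bool
	ACL     bool
	TLS     bool
}

// hardened is true only when every required control is on.
func (p posture) hardened() bool { return p.Enforce && p.ACL && p.TLS }

// weakNodes returns, in sorted order, the names of the nodes whose posture is
// below the required one. An empty result means the cluster is homogeneous.
func weakNodes(cluster map[string]posture) []string {
	var weak []string
	for name, p := range cluster {
		if !p.hardened() {
			weak = append(weak, name)
		}
	}
	sort.Strings(weak)
	return weak
}

func main() {
	cluster := map[string]posture{
		"node-a": {Enforce: true, ACL: true, TLS: true},
		"node-b": {Enforce: true, ACL: true, TLS: true},
		"node-c": {Enforce: true, ACL: false, TLS: true}, // withACL=false
	}
	fmt.Println(weakNodes(cluster)) // [node-c]
}
```

A startup check would refuse to join when the list is non-empty, while a health endpoint would merely report it; the issue leaves that choice to the implementer.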
## 0006e - N4 (MEDIUM): RefreshSession in the clients
**Finding:** the ACL freezes permissions at connect time; a peer that creates/joins a room must
call `client.RefreshSession()` to pub/sub on its subject. **No client calls it**
(`cmd/chat`, `cmd/worker`, `mobile`, `gateway`). It is fail-closed (denies), but it breaks
usability under `enforce`+ACL → it pushes the operator to disable the ACL (a security
regression at the operator's discretion).
**Fix:** call `RefreshSession` after membership changes in `cmd/chat`/`cmd/worker` (and
document the contract for `mobile`/`gateway`), or implement a transparent refresh (redo
subscriptions automatically when joining a room).
**DoD:** a test that creates/joins a room under enforce+ACL and publishes/receives WITHOUT manual intervention
(the client refreshes on its own or the demo calls RefreshSession). Document the requirement.
## 0006f - low severity: routes CA, cluster secret, migrate-to-kv, R1≠HA
- **N1 (LOW):** a **separate** CA for the cluster routes (do not reuse the client data plane
CA); pass the cluster secret via **file/env**, not via `--routes
nats://user:pass@host` in argv (visible today in `ps`/`journald`).
- **N6 (LOW):** `migrate-to-kv` only on loopback or with TLS (today the allowlist travels in plaintext
when `--nats-url` is remote without `--ca`).
- **N3-DoS (MEDIUM, doc):** document that the nonce/control plane at **R1 is an auth SPOF** (its
failure rejects all authenticated requests); R3 (quorum 2/3) is the condition for real HA.
Do not sell R1 as HA.
## 0006g - prepare the cluster deploy material (3 nodes)
The three cluster nodes are decided: **magnus + homer + datardos** (3 OVH VPS →
real R3 quorum, tolerates the loss of one). Data: homer `141.94.69.66`; datardos `ssh dd`
`51.91.100.142` (WG `datardos-wg` 10.21.0.x); magnus in `pass` (`MAGNUS_ovh_ssh_ROOT`).
**Prepare (do NOT execute on the VPSes; that is 0003f, done by the human):** leave in
`deploy/cluster/` the material parameterized per node:
- `generate-cluster-certs.sh`: the cluster's own CA (separate from the clients' CA, see 0006f)
+ one server cert per node with SAN = its public IP + its WG IP + hostname.
- a systemd unit template per node (`membershipd@.service` or three units) with
`--bind 0.0.0.0 --bus-auth enforce --tls-cert … --cluster-name unibus --routes
nats://…@<other-2-nodes> --store kv` and `Restart=always`, cluster secret via file/env.
- `deploy-cluster.sh` (linux cross-build + rsync per node + staggered startup plan).
- a `README.md` with the runbook: startup order, admin seed, `migrate-to-kv` (loopback/TLS),
scaling replicas to R3 (`nats stream update --replicas 3`), quorum verification and chaos
test (kill a node). Clearly mark which steps the human performs.
**DoD:** the material exists and is coherent (the certs cover the 3 nodes; the units reference
the correct routes); a `bash -n` of the scripts passes; the README describes the deploy end-to-end.
NO VPS is touched from the agent.
# Out of scope (other issues / operational)
- **H8** (CA generated/held on om) → operational, part of deploy 0003f.
- **H9/H10/H11** → issue 0002 / future.
- **Object Store (blobs) via `$JS.API.>`**: report 0008 marks it as "probably the same
class as N2, not verified" (minor impact: blobs are E2E ciphertext). The 0006b fix
(deny `OBJ_*`) covers it; verify.
- **Real-network chaos test** (kill 1/3, 2/3, split-brain) → 0003f (requires 3 VPS).
# Global Definition of Done
- `TestAttack0008_N3` → cross-node replay **401**; `TestAttack0008_N2` → eve cannot read the KV buckets;
`TestAttack0008_N1` → weak node rejected/flagged. Ported as regressions.
- With `decentralized: on`: control plane over replicated KV, fail-closed verified, state
consistent across nodes. With the flag off: identical baseline.
- Clients operate under `enforce`+ACL without manual intervention (RefreshSession resolved).
- `CGO_ENABLED=0 go build ./... && go vet ./... && go test ./...` green + `govulncheck` 0.
- Verdict re-evaluated: the DECENTRALIZED bus moves from "NO" to "yes, with conditions" (3 nodes
R3 for real HA, homogeneous posture, CA on om).
+10 -9
@@ -1,26 +1,28 @@
 module github.com/enmanuel/unibus
-go 1.25.0
+go 1.26.4
 replace fn-registry => ../../../../
 require (
 	fn-registry v0.0.0-00010101000000-000000000000
-	github.com/nats-io/nats-server/v2 v2.10.22
-	github.com/nats-io/nats.go v1.37.0
-	github.com/nats-io/nkeys v0.4.7
+	github.com/nats-io/nats-server/v2 v2.11.15
+	github.com/nats-io/nats.go v1.49.0
+	github.com/nats-io/nkeys v0.4.15
 	github.com/oklog/ulid/v2 v2.1.0
-	golang.org/x/time v0.7.0
+	golang.org/x/time v0.15.0
 	modernc.org/sqlite v1.47.0
 )
 require (
+	github.com/antithesishq/antithesis-sdk-go v0.6.0-default-no-op // indirect
 	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/google/go-tpm v0.9.8 // indirect
 	github.com/google/uuid v1.6.0 // indirect
-	github.com/klauspost/compress v1.18.3 // indirect
+	github.com/klauspost/compress v1.18.4 // indirect
 	github.com/mattn/go-isatty v0.0.20 // indirect
-	github.com/minio/highwayhash v1.0.3 // indirect
-	github.com/nats-io/jwt/v2 v2.5.8 // indirect
+	github.com/minio/highwayhash v1.0.4-0.20251030100505-070ab1a87a76 // indirect
+	github.com/nats-io/jwt/v2 v2.8.1 // indirect
 	github.com/nats-io/nuid v1.0.1 // indirect
 	github.com/ncruces/go-strftime v1.0.0 // indirect
 	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
@@ -29,7 +31,6 @@ require (
 	golang.org/x/mod v0.36.0 // indirect
 	golang.org/x/sync v0.20.0 // indirect
 	golang.org/x/sys v0.44.0 // indirect
-	golang.org/x/text v0.37.0 // indirect
 	golang.org/x/tools v0.45.0 // indirect
 	modernc.org/libc v1.70.0 // indirect
 	modernc.org/mathutil v1.7.1 // indirect
+18 -16
@@ -1,27 +1,31 @@
+github.com/antithesishq/antithesis-sdk-go v0.6.0-default-no-op h1:kpBdlEPbRvff0mDD1gk7o9BhI16b9p5yYAXRlidpqJE=
+github.com/antithesishq/antithesis-sdk-go v0.6.0-default-no-op/go.mod h1:IUpT2DPAKh6i/YhSbt6Gl3v2yvUZjmKncl7U91fup7E=
 github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
 github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
 github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
 github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
+github.com/google/go-tpm v0.9.8 h1:slArAR9Ft+1ybZu0lBwpSmpwhRXaa85hWtMinMyRAWo=
+github.com/google/go-tpm v0.9.8/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY=
 github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
 github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
 github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
 github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
 github.com/hashicorp/golang-lru/v2 v2.0.7 h1:a+bsQ5rvGLjzHuww6tVxozPZFVghXaHOwFs4luLUK2k=
 github.com/hashicorp/golang-lru/v2 v2.0.7/go.mod h1:QeFd9opnmA6QUJc5vARoKUSoFhyfM2/ZepoAG6RGpeM=
-github.com/klauspost/compress v1.18.3 h1:9PJRvfbmTabkOX8moIpXPbMMbYN60bWImDDU7L+/6zw=
-github.com/klauspost/compress v1.18.3/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
+github.com/klauspost/compress v1.18.4 h1:RPhnKRAQ4Fh8zU2FY/6ZFDwTVTxgJ/EMydqSTzE9a2c=
+github.com/klauspost/compress v1.18.4/go.mod h1:R0h/fSBs8DE4ENlcrlib3PsXS61voFxhIs2DeRhCvJ4=
 github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
 github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
-github.com/minio/highwayhash v1.0.3 h1:kbnuUMoHYyVl7szWjSxJnxw11k2U709jqFPPmIUyD6Q=
-github.com/minio/highwayhash v1.0.3/go.mod h1:GGYsuwP/fPD6Y9hMiXuapVvlIUEhFhMTh0rxU3ik1LQ=
-github.com/nats-io/jwt/v2 v2.5.8 h1:uvdSzwWiEGWGXf+0Q+70qv6AQdvcvxrv9hPM0RiPamE=
-github.com/nats-io/jwt/v2 v2.5.8/go.mod h1:ZdWS1nZa6WMZfFwwgpEaqBV8EPGVgOTDHN/wTbz0Y5A=
-github.com/nats-io/nats-server/v2 v2.10.22 h1:Yt63BGu2c3DdMoBZNcR6pjGQwk/asrKU7VX846ibxDA=
-github.com/nats-io/nats-server/v2 v2.10.22/go.mod h1:X/m1ye9NYansUXYFrbcDwUi/blHkrgHh2rgCJaakonk=
-github.com/nats-io/nats.go v1.37.0 h1:07rauXbVnnJvv1gfIyghFEo6lUcYRY0WXc3x7x0vUxE=
-github.com/nats-io/nats.go v1.37.0/go.mod h1:Ubdu4Nh9exXdSz0RVWRFBbRfrbSxOYd26oF0wkWclB8=
-github.com/nats-io/nkeys v0.4.7 h1:RwNJbbIdYCoClSDNY7QVKZlyb/wfT6ugvFCiKy6vDvI=
-github.com/nats-io/nkeys v0.4.7/go.mod h1:kqXRgRDPlGy7nGaEDMuYzmiJCIAAWDK0IMBtDmGD0nc=
+github.com/minio/highwayhash v1.0.4-0.20251030100505-070ab1a87a76 h1:KGuD/pM2JpL9FAYvBrnBBeENKZNh6eNtjqytV6TYjnk=
+github.com/minio/highwayhash v1.0.4-0.20251030100505-070ab1a87a76/go.mod h1:GGYsuwP/fPD6Y9hMiXuapVvlIUEhFhMTh0rxU3ik1LQ=
+github.com/nats-io/jwt/v2 v2.8.1 h1:V0xpGuD/N8Mi+fQNDynXohVvp7ZztevW5io8CUWlPmU=
+github.com/nats-io/jwt/v2 v2.8.1/go.mod h1:nWnOEEiVMiKHQpnAy4eXlizVEtSfzacZ1Q43LIRavZg=
+github.com/nats-io/nats-server/v2 v2.11.15 h1:StSf9TINInaZtr4oww2+kXmfwa9SkN//g/LwS19/UJ0=
+github.com/nats-io/nats-server/v2 v2.11.15/go.mod h1:zwhv8Y0PE3KHyKgznJc/9Xoai638SaJd83zzJ5GJn74=
+github.com/nats-io/nats.go v1.49.0 h1:yh/WvY59gXqYpgl33ZI+XoVPKyut/IcEaqtsiuTJpoE=
+github.com/nats-io/nats.go v1.49.0/go.mod h1:fDCn3mN5cY8HooHwE2ukiLb4p4G4ImmzvXyJt+tGwdw=
+github.com/nats-io/nkeys v0.4.15 h1:JACV5jRVO9V856KOapQ7x+EY8Jo3qw1vJt/9Jpwzkk4=
+github.com/nats-io/nkeys v0.4.15/go.mod h1:CpMchTXC9fxA5zrMo4KpySxNjiDVvr8ANOSZdiNfUrs=
 github.com/nats-io/nuid v1.0.1 h1:5iA8DT8V7q8WK2EScv2padNa/rTESc1KdnPw4TC2paw=
 github.com/nats-io/nuid v1.0.1/go.mod h1:19wcPz3Ph3q0Jbyiqsd0kePYG7A95tJPxeL+1OSON2c=
 github.com/ncruces/go-strftime v1.0.0 h1:HMFp8mLCTPp341M/ZnA4qaf7ZlsbTc+miZjCLOFAw7w=
@@ -43,10 +47,8 @@ golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
 golang.org/x/sys v0.21.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
 golang.org/x/sys v0.44.0 h1:ildZl3J4uzeKP07r2F++Op7E9B29JRUy+a27EibtBTQ=
 golang.org/x/sys v0.44.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
-golang.org/x/text v0.37.0 h1:Cqjiwd9eSg8e0QAkyCaQTNHFIIzWtidPahFWR83rTrc=
-golang.org/x/text v0.37.0/go.mod h1:a5sjxXGs9hsn/AJVwuElvCAo9v8QYLzvavO5z2PiM38=
-golang.org/x/time v0.7.0 h1:ntUhktv3OPE6TgYxXWv9vKvUSJyIFJlyohwbkEwPrKQ=
-golang.org/x/time v0.7.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
+golang.org/x/time v0.15.0 h1:bbrp8t3bGUeFOx08pvsMYRTCVSMk89u4tKbNOZbp88U=
+golang.org/x/time v0.15.0/go.mod h1:Y4YMaQmXwGQZoFaVFk4YpCt4FLQMYKZe9oeV/f4MSno=
 golang.org/x/tools v0.45.0 h1:18qN3FAooORvApf5XjCXgsuayZOEtXf6JK18I3+ONa8=
 golang.org/x/tools v0.45.0/go.mod h1:LuUGqqaXcXMEFEruIVJVm5mgDD8vww/z/SR1gQ4uE/0=
 golang.org/x/tools/go/expect v0.1.1-deprecated h1:jpBZDwmgPhXsKZC6WhL20P4b/wmnpsEAGHaNy0n/rJM=
-165
@@ -1,165 +0,0 @@
// Package mobile exposes a flat, gomobile-friendly API over the unibus client
// so an Android app can join rooms, publish, and receive messages with the same
// end-to-end encryption as any native Go peer.
//
// gomobile only supports a limited set of types across the binding boundary
// (string, []byte, int, bool, error, named structs, and interfaces). This layer
// translates the richer client API into those primitives and delivers incoming
// frames through a Java/Kotlin-implemented FrameListener callback. No protocol
// or cryptography is reimplemented here: every call delegates to pkg/client,
// which is the single source of truth shared with every other peer on the bus.
package mobile
import (
"encoding/base64"
"encoding/json"
"fmt"
"time"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/room"
)
// FrameListener receives decrypted messages for a subscribed room. The Android
// side implements this interface. Its methods are invoked from a NATS delivery
// goroutine, so implementations must hop back to the UI thread (for example via
// a coroutine on the main dispatcher) before touching Android views.
type FrameListener interface {
OnFrame(roomID string, sender string, msgID string, text string)
}
// Session is a connected unibus peer. Create it with NewSession and close it
// with Close when the app stops.
type Session struct {
c *client.Client
}
// GenerateIdentity creates (or loads) the long-term keypair stored at path.
// Call it once on first launch. The resulting file holds the peer's private
// Ed25519 and X25519 keys and must be kept private to the app sandbox.
func GenerateIdentity(path string) error {
_, err := client.LoadOrCreateIdentity(path)
return err
}
// NewSession loads the identity at idPath and connects to the bus. natsURL is
// the data plane (for example tls://host:4250) and ctrlURL is the control plane
// HTTP endpoint (for example http://host:8470). caPath is the path to the bus
// CA certificate (ca.crt) bundled with the app: when set, the session connects
// securely (TLS pinned to that CA + nkey authentication on the data plane),
// matching a bus running with auth + TLS. Pass an empty caPath to connect in
// plaintext to an unsecured (dev) bus.
func NewSession(idPath, natsURL, ctrlURL, caPath string) (*Session, error) {
id, err := client.LoadOrCreateIdentity(idPath)
if err != nil {
return nil, err
}
c, err := client.Connect(natsURL, ctrlURL, id, caPath)
if err != nil {
return nil, err
}
return &Session{c: c}, nil
}
// EndpointID returns this peer's stable endpoint identifier, derived from its
// signing public key. It is the value that appears as the sender of frames.
func (s *Session) EndpointID() string {
return s.c.Endpoint().ID
}
// CreateRoom opens a room on the given subject. mode is "matrix" for the
// encrypted, persisted and signed policy, or "nats" for plain cleartext. It
// returns the room id used by Join, Publish and Subscribe.
func (s *Session) CreateRoom(subject, mode string) (string, error) {
p := room.ModeNATS
if mode == "matrix" {
p = room.ModeMatrix
}
return s.c.CreateRoom(subject, p)
}
// Join fetches the room key when the room is encrypted and prepares the session
// to publish to and receive from the room.
func (s *Session) Join(roomID string) error {
return s.c.Join(roomID)
}
// Publish sends a UTF-8 text message to the room.
func (s *Session) Publish(roomID, text string) error {
return s.c.Publish(roomID, []byte(text))
}
// Subscribe streams decrypted messages of the room to the listener until the
// session is closed.
func (s *Session) Subscribe(roomID string, l FrameListener) error {
_, err := s.c.Subscribe(roomID, func(f frame.Frame, plaintext []byte) {
l.OnFrame(roomID, f.Sender, f.MsgID, string(plaintext))
})
return err
}
// cardJSON is the portable, copy-pasteable public identity a peer shares so a
// room owner can invite it to an encrypted room. It carries no secret: only the
// endpoint id and the two public keys (signing + key-exchange), base64-encoded
// for transport over text or a QR code.
type cardJSON struct {
ID string `json:"id"`
SignPub string `json:"sign_pub"` // base64 std of the Ed25519 public key
KexPub string `json:"kex_pub"` // base64 std of the X25519 public key
}
// Card returns this peer's public identity as a portable JSON string. Share it
// (paste, QR) with a room owner so they can Invite you to an encrypted room. It
// contains no private key and is safe to transmit in the clear.
func (s *Session) Card() string {
ep := s.c.Endpoint()
b, _ := json.Marshal(cardJSON{
ID: ep.ID,
SignPub: base64.StdEncoding.EncodeToString(ep.SignPub),
KexPub: base64.StdEncoding.EncodeToString(ep.KexPub),
})
return string(b)
}
// Invite adds the holder of peerCard to roomID. peerCard is the JSON string the
// invitee produced with Card(). For encrypted rooms this seals the current room
// key to the invitee's X25519 public key and signs the request; the caller must
// be the room owner.
func (s *Session) Invite(roomID, peerCard string) error {
var card cardJSON
if err := json.Unmarshal([]byte(peerCard), &card); err != nil {
return fmt.Errorf("mobile: bad peer card: %w", err)
}
signPub, err := base64.StdEncoding.DecodeString(card.SignPub)
if err != nil {
return fmt.Errorf("mobile: bad sign_pub in card: %w", err)
}
kexPub, err := base64.StdEncoding.DecodeString(card.KexPub)
if err != nil {
return fmt.Errorf("mobile: bad kex_pub in card: %w", err)
}
return s.c.Invite(roomID, client.Endpoint{ID: card.ID, SignPub: signPub, KexPub: kexPub})
}
// Kick removes endpointID from roomID and, for encrypted rooms, rotates the room
// key to a new epoch so the removed peer cannot decrypt messages published after
// the kick (forward secrecy). The caller must be the room owner.
func (s *Session) Kick(roomID, endpointID string) error {
return s.c.Kick(roomID, endpointID)
}
// Request performs an RPC request/reply against subject and returns the reply
// payload as text. timeoutMs bounds the wait in milliseconds.
func (s *Session) Request(subject, text string, timeoutMs int) (string, error) {
out, err := s.c.Request(subject, []byte(text), time.Duration(timeoutMs)*time.Millisecond)
if err != nil {
return "", err
}
return string(out), nil
}
// Close disconnects the peer from the bus.
func (s *Session) Close() error {
return s.c.Close()
}
+27 -10
@@ -1,9 +1,15 @@
// Package blobstore is a content-addressed object store on local disk. // Package blobstore is a content-addressed object store for media ciphertext.
// //
// The bus transports messages, not blobs. Media (images, files, large payloads) // The bus transports messages, not blobs. Media (images, files, large payloads)
// is encrypted by the client BEFORE being stored here, so the store only ever // is encrypted by the client BEFORE being stored here, so the store only ever
// sees ciphertext. Objects are addressed by the sha256 hex of their (encrypted) // sees ciphertext. Objects are addressed by the sha256 hex of their (encrypted)
// bytes, which makes Put idempotent and deduplicating. // bytes, which makes Put idempotent and deduplicating.
//
// Store is an interface (branch-by-abstraction, issue 0003d) with two backends:
// diskStore (the default, local filesystem) and objectStore (NATS Object Store
// on JetStream, replicated across the cluster so blobs survive a node loss and
// are reachable from any node). The wire contract (sha256-hex addressing) is
// identical, so a client cannot tell which backend a membershipd uses.
package blobstore package blobstore
import ( import (
@@ -14,27 +20,38 @@ import (
"path/filepath" "path/filepath"
) )
// Store is a directory-backed content-addressed blob store. // Store is a content-addressed blob store: Put returns the sha256-hex address of
type Store struct { // the stored bytes, Get fetches by that address, Has reports presence.
type Store interface {
Put(data []byte) (string, error)
Get(hash string) ([]byte, error)
Has(hash string) bool
}
// diskStore is a directory-backed content-addressed blob store (the default,
// single-node backend).
type diskStore struct {
dir string dir string
} }
// New creates a Store rooted at dir, creating the directory if needed. // New creates a disk-backed Store rooted at dir, creating the directory if
func New(dir string) (*Store, error) { // needed. It remains the default backend; the replicated NATS Object Store is
// constructed separately (NewObjectStore) when decentralization is enabled.
func New(dir string) (Store, error) {
if err := os.MkdirAll(dir, 0o755); err != nil { if err := os.MkdirAll(dir, 0o755); err != nil {
return nil, fmt.Errorf("blobstore: mkdir %q: %w", dir, err) return nil, fmt.Errorf("blobstore: mkdir %q: %w", dir, err)
} }
return &Store{dir: dir}, nil return &diskStore{dir: dir}, nil
} }
// path returns the on-disk path for a given content hash. // path returns the on-disk path for a given content hash.
func (s *Store) path(hash string) string { func (s *diskStore) path(hash string) string {
return filepath.Join(s.dir, hash) return filepath.Join(s.dir, hash)
} }
// Put writes data to the store and returns its sha256 hex hash. If an object // Put writes data to the store and returns its sha256 hex hash. If an object
// with the same content already exists, Put is a no-op and returns the hash. // with the same content already exists, Put is a no-op and returns the hash.
func (s *Store) Put(data []byte) (string, error) { func (s *diskStore) Put(data []byte) (string, error) {
sum := sha256.Sum256(data) sum := sha256.Sum256(data)
hash := hex.EncodeToString(sum[:]) hash := hex.EncodeToString(sum[:])
p := s.path(hash) p := s.path(hash)
@@ -66,7 +83,7 @@ func (s *Store) Put(data []byte) (string, error) {
} }
// Get reads the object with the given hash. // Get reads the object with the given hash.
func (s *Store) Get(hash string) ([]byte, error) { func (s *diskStore) Get(hash string) ([]byte, error) {
data, err := os.ReadFile(s.path(hash)) data, err := os.ReadFile(s.path(hash))
if err != nil { if err != nil {
return nil, fmt.Errorf("blobstore: get %q: %w", hash, err) return nil, fmt.Errorf("blobstore: get %q: %w", hash, err)
@@ -75,7 +92,7 @@ func (s *Store) Get(hash string) ([]byte, error) {
} }
// Has reports whether an object with the given hash exists. // Has reports whether an object with the given hash exists.
func (s *Store) Has(hash string) bool { func (s *diskStore) Has(hash string) bool {
_, err := os.Stat(s.path(hash)) _, err := os.Stat(s.path(hash))
return err == nil return err == nil
} }
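Since Store is now an interface, any backend that honours the sha256-hex contract should be interchangeable with the disk and Object Store implementations. As a minimal sketch, a hypothetical in-memory backend could look like the block below. The names memStore and newMemStore are illustrative and not part of this commit; the Store interface is copied from the diff so the block compiles on its own.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// Store mirrors the blobstore.Store interface introduced in this diff.
type Store interface {
	Put(data []byte) (string, error)
	Get(hash string) ([]byte, error)
	Has(hash string) bool
}

// memStore is a HYPOTHETICAL in-memory backend (not in the commit), keyed by
// the sha256-hex address of the stored bytes.
type memStore struct {
	mu    sync.RWMutex
	blobs map[string][]byte
}

func newMemStore() Store { return &memStore{blobs: map[string][]byte{}} }

// Put stores data under its sha256-hex address; re-putting identical bytes
// keeps the existing entry and returns the same address.
func (s *memStore) Put(data []byte) (string, error) {
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.blobs[hash]; !ok {
		// Copy so later mutation of the caller's slice cannot alter the blob.
		s.blobs[hash] = append([]byte(nil), data...)
	}
	return hash, nil
}

// Get returns a copy of the blob stored at hash, or an error if absent.
func (s *memStore) Get(hash string) ([]byte, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.blobs[hash]
	if !ok {
		return nil, fmt.Errorf("memstore: get %q: not found", hash)
	}
	return append([]byte(nil), data...), nil
}

// Has reports whether a blob is stored at hash.
func (s *memStore) Has(hash string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.blobs[hash]
	return ok
}

func main() {
	s := newMemStore()
	hash, err := s.Put([]byte("encrypted-media-ciphertext-payload"))
	if err != nil {
		panic(err)
	}
	fmt.Println(hash, s.Has(hash))
}
```

A backend of this shape would probably be most useful as a test double for handlers that take a Store, since it needs neither a temp directory nor an embedded NATS server.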
+102
@@ -0,0 +1,102 @@
package blobstore
// objectStore is the NATS Object Store implementation of Store (issue 0003d):
// media ciphertext lives in a JetStream Object Store bucket replicated across
// the cluster, so a blob uploaded to one node is durable against the loss of a
// node and readable from any node. It is selected when decentralization is on;
// diskStore stays the single-node default. The content-addressing (sha256-hex)
// is identical to the disk backend, so the wire contract does not change.
import (
"context"
"crypto/sha256"
"encoding/hex"
"fmt"
"time"
"github.com/nats-io/nats.go/jetstream"
)
const (
defaultObjectBucket = "UNIBUS_blobs"
defaultObjOpTime = 10 * time.Second
)
// ObjectStoreConfig configures the replicated Object Store backend.
type ObjectStoreConfig struct {
// Bucket is the object store bucket name; empty uses UNIBUS_blobs.
Bucket string
// Replicas is the replication factor (R1..R5), matching the KV store's
// R1->R3 rollout.
Replicas int
// OpTimeout bounds each object operation; zero uses defaultObjOpTime.
OpTimeout time.Duration
}
type objectStore struct {
os jetstream.ObjectStore
opTimeout time.Duration
}
// NewObjectStore creates (or opens) the replicated Object Store bucket on js and
// returns it as a Store. The JetStream context belongs to the caller.
func NewObjectStore(js jetstream.JetStream, cfg ObjectStoreConfig) (Store, error) {
if cfg.Bucket == "" {
cfg.Bucket = defaultObjectBucket
}
if cfg.Replicas <= 0 {
cfg.Replicas = 1
}
opTimeout := cfg.OpTimeout
if opTimeout <= 0 {
opTimeout = defaultObjOpTime
}
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
obj, err := js.CreateOrUpdateObjectStore(ctx, jetstream.ObjectStoreConfig{
Bucket: cfg.Bucket,
Replicas: cfg.Replicas,
Storage: jetstream.FileStorage,
})
if err != nil {
return nil, fmt.Errorf("blobstore: open object store %q (replicas=%d): %w", cfg.Bucket, cfg.Replicas, err)
}
return &objectStore{os: obj, opTimeout: opTimeout}, nil
}
func (s *objectStore) ctx() (context.Context, context.CancelFunc) {
return context.WithTimeout(context.Background(), s.opTimeout)
}
// Put stores data under its sha256-hex address. Re-putting identical bytes is a
// harmless overwrite (same address, same content), preserving the idempotent,
// deduplicating semantics of the disk backend.
func (s *objectStore) Put(data []byte) (string, error) {
sum := sha256.Sum256(data)
hash := hex.EncodeToString(sum[:])
ctx, cancel := s.ctx()
defer cancel()
if _, err := s.os.PutBytes(ctx, hash, data); err != nil {
return "", fmt.Errorf("blobstore: put object %q: %w", hash, err)
}
return hash, nil
}
// Get fetches the object by its hash address.
func (s *objectStore) Get(hash string) ([]byte, error) {
ctx, cancel := s.ctx()
defer cancel()
data, err := s.os.GetBytes(ctx, hash)
if err != nil {
return nil, fmt.Errorf("blobstore: get object %q: %w", hash, err)
}
return data, nil
}
// Has reports whether an object with the given hash exists. Any error from the
// lookup, including a timeout, is reported as absent (false).
func (s *objectStore) Has(hash string) bool {
ctx, cancel := s.ctx()
defer cancel()
_, err := s.os.GetInfo(ctx, hash)
return err == nil
}
+132
@@ -0,0 +1,132 @@
package blobstore_test
import (
"bytes"
"crypto/sha256"
"encoding/hex"
"net"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
func objFreePort(t *testing.T) int {
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
// newObjectStore boots a single-node embedded NATS with JetStream and returns an
// Object Store backend over it (R1, i.e. a single replica).
func newObjectStore(t *testing.T) blobstore.Store {
t.Helper()
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(),
Host: "127.0.0.1",
Port: objFreePort(t),
})
if err != nil {
t.Fatalf("embedded nats: %v", err)
}
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
ns.Shutdown()
t.Fatalf("nats connect: %v", err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
ns.Shutdown()
t.Fatalf("jetstream: %v", err)
}
st, err := blobstore.NewObjectStore(js, blobstore.ObjectStoreConfig{Replicas: 1, OpTimeout: 5 * time.Second})
if err != nil {
nc.Close()
ns.Shutdown()
t.Fatalf("new object store: %v", err)
}
t.Cleanup(func() { nc.Close(); ns.Shutdown(); ns.WaitForShutdown() })
return st
}
// TestObjectStoreRoundTrip is the golden path: put ciphertext, get it back by
// its hash, Has reports presence, and re-putting identical bytes returns the
// same address (content-addressed dedup).
func TestObjectStoreRoundTrip(t *testing.T) {
s := newObjectStore(t)
data := []byte("encrypted-media-ciphertext-payload")
hash, err := s.Put(data)
if err != nil {
t.Fatalf("Put: %v", err)
}
want := hex.EncodeToString(sha256Sum(data))
if hash != want {
t.Fatalf("hash = %q, want sha256 hex %q", hash, want)
}
got, err := s.Get(hash)
if err != nil {
t.Fatalf("Get: %v", err)
}
if !bytes.Equal(got, data) {
t.Fatalf("Get returned %q, want %q", got, data)
}
if !s.Has(hash) {
t.Fatalf("Has should be true for a stored blob")
}
// Re-put identical bytes: same address, no error.
hash2, err := s.Put(data)
if err != nil || hash2 != hash {
t.Fatalf("re-Put: hash2=%q err=%v (want %q)", hash2, err, hash)
}
}
// TestObjectStoreMissing is the edge/error path: a hash that was never stored
// is absent and unreadable.
func TestObjectStoreMissing(t *testing.T) {
s := newObjectStore(t)
missing := hex.EncodeToString(sha256Sum([]byte("never stored")))
if s.Has(missing) {
t.Fatalf("Has should be false for an unknown hash")
}
if _, err := s.Get(missing); err == nil {
t.Fatalf("Get of an unknown hash should error")
}
}
// TestObjectStoreAddressMatchesDisk is the contract test: the Object Store and
// the disk backend address identical bytes to the IDENTICAL hash, so a client
// cannot tell which backend a node uses and a blob ref is portable across them.
func TestObjectStoreAddressMatchesDisk(t *testing.T) {
obj := newObjectStore(t)
disk, err := blobstore.New(t.TempDir())
if err != nil {
t.Fatalf("disk store: %v", err)
}
for _, payload := range [][]byte{[]byte("a"), []byte("longer ciphertext blob \x00\x01\x02"), {}} {
oh, err := obj.Put(payload)
if err != nil {
t.Fatalf("object Put: %v", err)
}
dh, err := disk.Put(payload)
if err != nil {
t.Fatalf("disk Put: %v", err)
}
if oh != dh {
t.Fatalf("address mismatch for %q: object=%q disk=%q", payload, oh, dh)
}
}
}
func sha256Sum(b []byte) []byte {
sum := sha256.Sum256(b)
return sum[:]
}
+106 -9
@@ -27,31 +27,128 @@ func NewNkeyAuthenticator(isAuthorized func(signPubHex string) bool) server.Auth
// Check verifies the client's nkey signature against the nonce the server // Check verifies the client's nkey signature against the nonce the server
// presented, then maps the nkey to its allowlist key and checks authorization. // presented, then maps the nkey to its allowlist key and checks authorization.
// Any malformed input or failed verification yields false (fail closed). The // Any malformed input or failed verification yields false (fail closed).
// signature decoding mirrors nats-server's own (raw-url base64, then std base64
// fallback) so genuine clients using nats.Nkey are accepted unchanged.
func (a *nkeyAuthenticator) Check(c server.ClientAuthentication) bool { func (a *nkeyAuthenticator) Check(c server.ClientAuthentication) bool {
signPubHex, ok := verifyNkey(c)
if !ok {
return false
}
return a.isAuthorized(signPubHex)
}
// verifyNkey performs the shared nkey verification: it checks the client's
// signature against the server-presented nonce and returns the lowercase-hex
// Ed25519 public key behind the nkey. ok is false on any malformed input or
// failed verification (fail closed). The signature decoding mirrors
// nats-server's own (raw-url base64, then std base64 fallback) so genuine
// clients using nats.Nkey are accepted unchanged.
func verifyNkey(c server.ClientAuthentication) (signPubHex string, ok bool) {
opts := c.GetOpts() opts := c.GetOpts()
if opts.Nkey == "" { if opts.Nkey == "" {
return false return "", false
} }
sig, err := base64.RawURLEncoding.DecodeString(opts.Sig) sig, err := base64.RawURLEncoding.DecodeString(opts.Sig)
if err != nil { if err != nil {
sig, err = base64.StdEncoding.DecodeString(opts.Sig) sig, err = base64.StdEncoding.DecodeString(opts.Sig)
if err != nil { if err != nil {
return false return "", false
} }
} }
pub, err := nkeys.FromPublicKey(opts.Nkey) pub, err := nkeys.FromPublicKey(opts.Nkey)
if err != nil { if err != nil {
return false return "", false
} }
if err := pub.Verify(c.GetNonce(), sig); err != nil { if err := pub.Verify(c.GetNonce(), sig); err != nil {
return false return "", false
} }
signPubHex, err := SignPubHexFromNkey(opts.Nkey) signPubHex, err = SignPubHexFromNkey(opts.Nkey)
if err != nil { if err != nil {
return "", false
}
return signPubHex, true
}
// PermissionsFunc maps a connecting identity (lowercase-hex Ed25519 signing key)
// to the NATS permissions it should be granted for this connection. Returning an
// error denies the connection (fail closed). It is how the data plane enforces
// per-subject access from room membership (issue 0003e, audit H4 residual).
type PermissionsFunc func(signPubHex string) (*server.Permissions, error)
// nkeyAuthenticatorACL is the nkey authenticator that ALSO scopes the connection
// to per-subject permissions derived from room membership. NATS evaluates
// permissions once, at connect time, so a peer that joins a room after
// connecting must reconnect (client.RefreshSession) to gain that room's subject
// — the dynamic-membership reconnection model the audit deferred to this issue.
type nkeyAuthenticatorACL struct {
isAuthorized func(signPubHex string) bool
perms PermissionsFunc
// internalPubHex is the lowercase-hex Ed25519 public key of membershipd's own
// ephemeral internal service identity. A connection that proves that key is
// granted full permissions WITHOUT consulting the allowlist, so the service can
// bootstrap and manage JetStream (the replicated nonce bucket and, when
// decentralized, the control-plane KV buckets) against its own embedded server
// even while the data plane confines every client to its rooms. Empty disables
// the internal-identity path entirely (behavior identical to a plain ACL
// authenticator).
internalPubHex string
}
// NewNkeyAuthenticatorACL builds an authenticator that authorizes by the bus
// allowlist AND registers per-subject permissions from perms. A registered but
// permission-less peer can no longer subscribe to or publish on arbitrary
// subjects: it is confined to the subjects of the rooms it belongs to (plus the
// client infrastructure subjects perms includes). This is the per-subject ACL
// the 0004 hardening left as a residual.
func NewNkeyAuthenticatorACL(isAuthorized func(signPubHex string) bool, perms PermissionsFunc) server.Authentication {
return &nkeyAuthenticatorACL{isAuthorized: isAuthorized, perms: perms}
}
// NewNkeyAuthenticatorACLInternal is NewNkeyAuthenticatorACL that also recognizes
// membershipd's internal service identity (internalPubHex, the lowercase hex of
// its ephemeral Ed25519 public key): a connection proving that key is granted
// full permissions without an allowlist lookup, so the service can create and
// manage JetStream against its own embedded server under enforce (issue 0006a/c —
// the replicated nonce bucket and the control-plane KV). Every other identity
// goes through the allowlist + per-subject ACL unchanged. An empty internalPubHex
// is identical to NewNkeyAuthenticatorACL, so this is a superset and safe to use
// everywhere the plain constructor was used.
func NewNkeyAuthenticatorACLInternal(isAuthorized func(signPubHex string) bool, perms PermissionsFunc, internalPubHex string) server.Authentication {
return &nkeyAuthenticatorACL{isAuthorized: isAuthorized, perms: perms, internalPubHex: internalPubHex}
}
// fullPermissions grants publish and subscribe on every subject (">"). It is the
// permission set for membershipd's own internal service connection, which must
// manage the JetStream control plane (nonce bucket + KV buckets) over NATS. It is
// NEVER granted to a bus user — only to the process's own ephemeral internal
// identity, recognized by exact public-key match in Check.
func fullPermissions() *server.Permissions {
sp := &server.SubjectPermission{Allow: []string{">"}}
return &server.Permissions{Publish: sp, Subscribe: sp}
}
// Check verifies the nkey, authorizes against the allowlist, then derives and
// registers the connection's subject permissions. A permissions-derivation
// error denies the connection (fail closed) rather than granting open access.
func (a *nkeyAuthenticatorACL) Check(c server.ClientAuthentication) bool {
signPubHex, ok := verifyNkey(c)
if !ok {
return false return false
} }
return a.isAuthorized(signPubHex) // membershipd's own internal service identity bypasses the allowlist and is
// granted full permissions so the service can bootstrap JetStream under
// enforce. The key is matched exactly against the cryptographically verified
// connecting key, so no other identity can claim these permissions.
if a.internalPubHex != "" && signPubHex == a.internalPubHex {
c.RegisterUser(&server.User{Permissions: fullPermissions()})
return true
}
if !a.isAuthorized(signPubHex) {
return false
}
perms, err := a.perms(signPubHex)
if err != nil {
return false // fail closed: never grant open access on a derivation error
}
c.RegisterUser(&server.User{Permissions: perms})
return true
} }
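The order of checks in nkeyAuthenticatorACL.Check can be read as a small decision function once the nkey signature has been verified. The sketch below restates that order with stand-in types so it runs without nats-server: permissions replaces server.Permissions, and aclPolicy, decide, newDemoPolicy, the key strings and the subject "room.r1" are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// permissions is a stand-in for server.Permissions (assumption: only the
// allow list matters for this illustration).
type permissions struct {
	Allow []string
}

// aclPolicy carries the same three inputs as nkeyAuthenticatorACL.
type aclPolicy struct {
	isAuthorized   func(signPubHex string) bool
	perms          func(signPubHex string) (*permissions, error)
	internalPubHex string
}

// decide mirrors the order of checks in Check for an already verified key:
// internal identity first, then the allowlist, then permission derivation.
// Every failure path denies the connection (fail closed).
func (a aclPolicy) decide(signPubHex string) (*permissions, bool) {
	if a.internalPubHex != "" && signPubHex == a.internalPubHex {
		return &permissions{Allow: []string{">"}}, true
	}
	if !a.isAuthorized(signPubHex) {
		return nil, false
	}
	p, err := a.perms(signPubHex)
	if err != nil {
		return nil, false // never grant open access on a derivation error
	}
	return p, true
}

// newDemoPolicy builds a policy with made-up identities: "member" is on the
// allowlist, "broken" is on the allowlist but its derivation fails, and
// "internal" is the service's own key.
func newDemoPolicy() aclPolicy {
	return aclPolicy{
		isAuthorized: func(k string) bool { return k == "member" || k == "broken" },
		perms: func(k string) (*permissions, error) {
			if k == "broken" {
				return nil, errors.New("membership store unavailable")
			}
			return &permissions{Allow: []string{"room.r1"}}, nil
		},
		internalPubHex: "internal",
	}
}

func main() {
	p := newDemoPolicy()
	for _, k := range []string{"internal", "member", "broken", "stranger"} {
		perms, ok := p.decide(k)
		fmt.Println(k, ok, perms)
	}
}
```

The point of the ordering is that the internal identity never touches the allowlist, while an allowlisted peer whose permissions cannot be derived is treated exactly like an unknown one.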
+26
@@ -0,0 +1,26 @@
package busauth
import server "github.com/nats-io/nats-server/v2/server"
// PermissionsFromSubjects adapts a subject-deriving function (e.g.
// membership.SubjectACLFor, which maps an identity to the subjects of the rooms
// it belongs to plus the client infrastructure subjects) into the PermissionsFunc
// the ACL authenticator expects. The derived subjects are granted as BOTH the
// publish and subscribe allow set, so a connection can only pub/sub on the
// subjects it is entitled to. A derivation error is propagated so the caller
// fails closed (denies the connection) rather than granting open access.
//
// This is the production wiring for the per-subject data-plane ACL (issue 0003e,
// audit H4): membershipd passes PermissionsFromSubjects(membership.SubjectACLFor(
// store)) to NewNkeyAuthenticatorACL. It lives in busauth (not membership) so the
// membership package stays free of the nats-server dependency.
func PermissionsFromSubjects(derive func(signPubHex string) ([]string, error)) PermissionsFunc {
return func(signPubHex string) (*server.Permissions, error) {
subjects, err := derive(signPubHex)
if err != nil {
return nil, err
}
sp := &server.SubjectPermission{Allow: subjects}
return &server.Permissions{Publish: sp, Subscribe: sp}, nil
}
}
+38
@@ -35,3 +35,41 @@ func ServerTLSConfig(certPEMPath, keyPEMPath string) (*tls.Config, error) {
} }
return &tls.Config{Certificates: []tls.Certificate{cert}, MinVersion: tls.VersionTLS12}, nil return &tls.Config{Certificates: []tls.Certificate{cert}, MinVersion: tls.VersionTLS12}, nil
} }
// RouteTLSConfig builds the mutual-TLS config for the NATS CLUSTER route layer
// (issue 0003a). Unlike the client data plane, where the server presents a cert
// and only the client verifies it, routes are server-to-server: each node both
// presents its own node certificate AND verifies the connecting node's
// certificate against the bus CA. So this single config carries:
//
// - Certificates: this node's CA-signed certificate (presented in both the
// server and the client role of a route handshake),
// - RootCAs: the bus CA, to verify the certificate of a node we dial out to,
// - ClientCAs + ClientAuth=RequireAndVerifyClientCert: the bus CA, to verify
// the certificate of a node dialing in.
//
// The effect: a node that lacks a certificate signed by the bus CA cannot
// establish a route in either direction, even if it knows the cluster password.
// Reuse the same CA as the client data plane (deploy/tls) but a per-node cert
// whose SAN covers that node's route address.
func RouteTLSConfig(certPEMPath, keyPEMPath, caPEMPath string) (*tls.Config, error) {
cert, err := tls.LoadX509KeyPair(certPEMPath, keyPEMPath)
if err != nil {
return nil, fmt.Errorf("busauth: load route keypair: %w", err)
}
pem, err := os.ReadFile(caPEMPath)
if err != nil {
return nil, fmt.Errorf("busauth: read route CA %q: %w", caPEMPath, err)
}
pool := x509.NewCertPool()
if !pool.AppendCertsFromPEM(pem) {
return nil, fmt.Errorf("busauth: route CA %q contains no valid PEM certificate", caPEMPath)
}
return &tls.Config{
Certificates: []tls.Certificate{cert},
RootCAs: pool,
ClientCAs: pool,
ClientAuth: tls.RequireAndVerifyClientCert,
MinVersion: tls.VersionTLS12,
}, nil
}
+266 -77
@@ -51,9 +51,14 @@ type Client struct {
endpoint string endpoint string
nc *nats.Conn nc *nats.Conn
js jetstream.JetStream // durable plane for rooms with Policy.Persist js jetstream.JetStream // durable plane for rooms with Policy.Persist
ctrlURL string ctrlURLs []string // control-plane HTTP endpoints, tried in order (failover)
http *http.Client http *http.Client
// natsServers + natsOpts are retained so RefreshSession can rebuild the
// data-plane connection (re-triggering the server's subject-ACL evaluation).
natsServers []string
natsOpts []nats.Option
mu sync.RWMutex mu sync.RWMutex
keyCache map[string]map[int][]byte // roomID -> epoch -> K keyCache map[string]map[int][]byte // roomID -> epoch -> K
signCache map[string][]byte // sender endpoint -> sign pub (for verification) signCache map[string][]byte // sender endpoint -> sign pub (for verification)
@@ -77,6 +82,33 @@ type Options struct {
// secured independently (a test may TLS one and not the other); production // secured independently (a test may TLS one and not the other); production
// sets both to the same CA via Connect. Nil keeps the control plane plaintext. // sets both to the same CA via Connect. Nil keeps the control plane plaintext.
CtrlTLS *tls.Config CtrlTLS *tls.Config
// NatsServers are ADDITIONAL NATS seed URLs for cluster failover (issue
// 0003e), beyond the primary natsURL passed to the constructor. With more
// than one server nats.go reconnects to a surviving node automatically when
// the one a client is attached to dies, so a node loss is transparent.
NatsServers []string
// CtrlURLs are ADDITIONAL control-plane HTTP endpoints (one per node) beyond
// the primary ctrlURL. Each request is tried against them in order until one
// answers, so the control plane survives a node loss too. With the
// decentralized KV store every node serves the same state, so any of them
// can answer any request.
CtrlURLs []string
}
// dedupNonEmpty returns the input with empty strings dropped and duplicates
// removed, preserving order. Used to build the NATS seed list and control-plane
// list from a primary URL plus optional extras without a redundant entry.
func dedupNonEmpty(in []string) []string {
seen := map[string]bool{}
var out []string
for _, s := range in {
if s == "" || seen[s] {
continue
}
seen[s] = true
out = append(out, s)
}
return out
} }
// New connects to NATS and records the control-plane URL with default Options // New connects to NATS and records the control-plane URL with default Options
@@ -116,7 +148,20 @@ func Connect(natsURL, ctrlURL string, id cs.Identity, caPath string) (*Client, e
// so every peer (worker, chat, mobile, gateway) gets identical behavior by // so every peer (worker, chat, mobile, gateway) gets identical behavior by
// passing the same Options. // passing the same Options.
func NewWithOptions(natsURL, ctrlURL string, id cs.Identity, opts Options) (*Client, error) { func NewWithOptions(natsURL, ctrlURL string, id cs.Identity, opts Options) (*Client, error) {
natsOpts := []nats.Option{nats.Name("unibus-client")} // Seed list = primary + extras. With more than one seed, nats.go fails over
// to a surviving node on disconnect; MaxReconnects(-1) keeps it retrying
// indefinitely so a node coming back is rejoined rather than given up on.
natsServers := dedupNonEmpty(append([]string{natsURL}, opts.NatsServers...))
natsOpts := []nats.Option{
nats.Name("unibus-client"),
nats.MaxReconnects(-1),
nats.ReconnectWait(250 * time.Millisecond),
}
if len(natsServers) > 1 {
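// nats.go already tries every seed on the initial connect; with
// RetryOnFailedConnect it also keeps retrying in the background when none
// of them answers, so startup tolerates seeds being temporarily down.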
natsOpts = append(natsOpts, nats.RetryOnFailedConnect(true))
}
if opts.UseNkey { if opts.UseNkey {
nkeyPub, nkeySign, err := busauth.ClientNkey(id.SignPriv) nkeyPub, nkeySign, err := busauth.ClientNkey(id.SignPriv)
if err != nil { if err != nil {
@@ -127,9 +172,9 @@ func NewWithOptions(natsURL, ctrlURL string, id cs.Identity, opts Options) (*Cli
if opts.TLS != nil { if opts.TLS != nil {
natsOpts = append(natsOpts, nats.Secure(opts.TLS)) natsOpts = append(natsOpts, nats.Secure(opts.TLS))
} }
nc, err := nats.Connect(natsURL, natsOpts...) nc, err := nats.Connect(strings.Join(natsServers, ","), natsOpts...)
if err != nil { if err != nil {
return nil, fmt.Errorf("client: connect nats %q: %w", natsURL, err) return nil, fmt.Errorf("client: connect nats %v: %w", natsServers, err)
} }
// JetStream context for the durable plane. Obtaining it does not require any // JetStream context for the durable plane. Obtaining it does not require any
// stream to exist yet and has no effect on cleartext/ephemeral rooms — those // stream to exist yet and has no effect on cleartext/ephemeral rooms — those
@@ -147,17 +192,50 @@ func NewWithOptions(natsURL, ctrlURL string, id cs.Identity, opts Options) (*Cli
httpClient.Transport = &http.Transport{TLSClientConfig: opts.CtrlTLS.Clone()} httpClient.Transport = &http.Transport{TLSClientConfig: opts.CtrlTLS.Clone()}
} }
return &Client{ return &Client{
id: id, id: id,
endpoint: frame.EndpointID(id.SignPub), endpoint: frame.EndpointID(id.SignPub),
nc: nc, nc: nc,
js: js, js: js,
ctrlURL: ctrlURL, ctrlURLs: dedupNonEmpty(append([]string{ctrlURL}, opts.CtrlURLs...)),
http: httpClient, http: httpClient,
keyCache: map[string]map[int][]byte{}, natsServers: natsServers,
signCache: map[string][]byte{}, natsOpts: natsOpts,
keyCache: map[string]map[int][]byte{},
signCache: map[string][]byte{},
}, nil }, nil
} }
// RefreshSession rebuilds the data-plane NATS connection so the server's
// subject-ACL authenticator re-evaluates this peer's room membership (issue
// 0003e, audit H4 residual). Call it after a membership change — a room you
// created, were invited to, or joined — when the bus enforces per-subject
// permissions, so the new room's subject becomes publishable and subscribable
// (NATS freezes permissions at connect time, so the prior connection cannot see
// the new room).
//
// It opens a fresh connection with the same seeds/options and swaps it in.
// IMPORTANT: active subscriptions from the previous connection are dropped —
// re-subscribe (client.Subscribe) to your rooms after calling this. The key and
// signer caches are preserved. On a non-ACL bus this is a safe but unnecessary
// reconnect.
func (c *Client) RefreshSession() error {
nc, err := nats.Connect(strings.Join(c.natsServers, ","), c.natsOpts...)
if err != nil {
return fmt.Errorf("client: refresh session: reconnect nats: %w", err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
return fmt.Errorf("client: refresh session: init jetstream: %w", err)
}
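// Read and swap the old connection under the lock, so concurrent
// RefreshSession calls each close a distinct previous connection.
c.mu.Lock()
old := c.nc
c.nc = nc
c.js = js
c.mu.Unlock()
old.Close()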
return nil
}
// Endpoint returns this client's public identity. // Endpoint returns this client's public identity.
func (c *Client) Endpoint() Endpoint { func (c *Client) Endpoint() Endpoint {
return Endpoint{ID: c.endpoint, SignPub: c.id.SignPub, KexPub: c.id.KexPub} return Endpoint{ID: c.endpoint, SignPub: c.id.SignPub, KexPub: c.id.KexPub}
@@ -169,6 +247,15 @@ func (c *Client) Close() error {
return nil return nil
} }
// ConnectedServer returns the URL of the NATS node this client is currently
// attached to (empty when disconnected). It is observability for cluster
// failover: after a node dies, this reports the surviving node nats.go
// reconnected to.
func (c *Client) ConnectedServer() string { return c.nc.ConnectedUrl() }
// IsConnected reports whether the NATS data-plane connection is currently up.
func (c *Client) IsConnected() bool { return c.nc.IsConnected() }
// ---- key cache ------------------------------------------------------------ // ---- key cache ------------------------------------------------------------
func (c *Client) cacheKey(roomID string, epoch int, k []byte) { func (c *Client) cacheKey(roomID string, epoch int, k []byte) {
@@ -203,36 +290,45 @@ func (c *Client) doJSON(method, path string, body, out any) error {
} }
bodyBytes = b bodyBytes = b
} }
req, err := c.newSignedRequest(method, path, bodyBytes) // Try each control-plane endpoint in order. A transport error (a dead node)
if err != nil { // falls over to the next; an HTTP response (any status) is authoritative and
return err // returned, since every node serves the same state. Each attempt is freshly
} // signed (new nonce), so a failed-over retry is never seen as a replay.
if body != nil { var lastErr error
req.Header.Set("Content-Type", "application/json") for _, base := range c.ctrlURLs {
} req, err := c.newSignedRequestTo(base, method, path, bodyBytes)
resp, err := c.http.Do(req) if err != nil {
if err != nil { return err
return fmt.Errorf("client: do %s %s: %w", method, path, err)
}
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
if resp.StatusCode >= 300 {
// Surface the server's structured {"error": "..."} message when present,
// instead of leaking the raw HTTP envelope (method, path, status, JSON body).
var er struct {
Error string `json:"error"`
} }
if json.Unmarshal(respBody, &er) == nil && er.Error != "" { if body != nil {
return fmt.Errorf("%s (HTTP %d)", er.Error, resp.StatusCode) req.Header.Set("Content-Type", "application/json")
} }
return fmt.Errorf("client: %s %s -> %d: %s", method, path, resp.StatusCode, string(respBody)) resp, err := c.http.Do(req)
} if err != nil {
if out != nil { lastErr = err
if err := json.Unmarshal(respBody, out); err != nil { continue // dead node: try the next control plane
return fmt.Errorf("client: decode response: %w", err)
} }
defer resp.Body.Close()
respBody, _ := io.ReadAll(resp.Body)
if resp.StatusCode >= 300 {
// Surface the server's structured {"error": "..."} message when present,
// instead of leaking the raw HTTP envelope (method, path, status, body).
var er struct {
Error string `json:"error"`
}
if json.Unmarshal(respBody, &er) == nil && er.Error != "" {
return fmt.Errorf("%s (HTTP %d)", er.Error, resp.StatusCode)
}
return fmt.Errorf("client: %s %s -> %d: %s", method, path, resp.StatusCode, string(respBody))
}
if out != nil {
if err := json.Unmarshal(respBody, out); err != nil {
return fmt.Errorf("client: decode response: %w", err)
}
}
return nil
} }
return nil return fmt.Errorf("client: %s %s: all control planes failed: %w", method, path, lastErr)
} }
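The failover rule in doJSON (a transport error moves on to the next endpoint, any HTTP response is final) can be exercised without a network. The sketch below is a simplified, unsigned version of that loop: getWithFailover, fakeTransport, newFakeClient and the node-a/node-b hosts are hypothetical names for illustration, and the signing step of the real client is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// fakeTransport is a HYPOTHETICAL RoundTripper: hosts listed in dead fail at
// the transport level; every other host answers 200 with its own host name.
type fakeTransport struct{ dead map[string]bool }

func (f fakeTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	if f.dead[r.URL.Host] {
		return nil, errors.New("connection refused")
	}
	return &http.Response{
		StatusCode: http.StatusOK,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(r.URL.Host)),
		Request:    r,
	}, nil
}

// newFakeClient returns an http.Client whose listed hosts behave as dead nodes.
func newFakeClient(dead ...string) *http.Client {
	m := map[string]bool{}
	for _, h := range dead {
		m[h] = true
	}
	return &http.Client{Transport: fakeTransport{dead: m}}
}

// getWithFailover tries each base URL in order. A transport error falls over
// to the next base; an HTTP response of any status ends the loop.
func getWithFailover(hc *http.Client, bases []string, path string) (string, error) {
	if len(bases) == 0 {
		return "", errors.New("no control-plane endpoints configured")
	}
	var lastErr error
	for _, base := range bases {
		resp, err := hc.Get(base + path)
		if err != nil {
			lastErr = err
			continue // dead node: try the next endpoint
		}
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr != nil {
			return "", readErr
		}
		if resp.StatusCode >= 300 {
			return "", fmt.Errorf("GET %s -> %d", path, resp.StatusCode)
		}
		return string(body), nil
	}
	return "", fmt.Errorf("GET %s: all control planes failed: %w", path, lastErr)
}

func main() {
	bases := []string{"http://node-a:8080", "http://node-b:8080"}
	answeredBy, err := getWithFailover(newFakeClient("node-a:8080"), bases, "/users")
	fmt.Println(answeredBy, err)
}
```

The explicit check for an empty endpoint list is an addition in this sketch; it avoids wrapping a nil error in the final message when there is nothing to try.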
// signRequest signs the canonical bytes of req (req must already have its Sig // signRequest signs the canonical bytes of req (req must already have its Sig
@@ -246,22 +342,25 @@ func (c *Client) signRequest(req any) []byte {
return cs.SignEd25519(c.id.SignPriv, b) return cs.SignEd25519(c.id.SignPriv, b)
} }
// newSignedRequest builds an *http.Request to the control plane and attaches the // newSignedRequestTo builds an *http.Request to the control-plane endpoint
// transport authentication headers (X-Unibus-Pub/Ts/Nonce/Sig) signing the // `base` and attaches the transport authentication headers
// canonical request bytes with this peer's Ed25519 key. path is the request URI // (X-Unibus-Pub/Ts/Nonce/Sig) signing the canonical request bytes with this
// (path plus any query); body is the raw request body (nil for GET). The server // peer's Ed25519 key. path is the request URI (path plus any query); body is the
// (membership.authenticate) verifies these headers under the bus-auth flag. // raw request body (nil for GET). The server (membership.authenticate) verifies
// these headers under the bus-auth flag. The signature covers method+path+ts+
// nonce+sha256(body), NOT the host, so the same request can be addressed to any
// node — and each failover attempt mints a fresh nonce so it is never a replay.
// //
// Signing happens on every request — including GETs — so that under enforce the // Signing happens on every request — including GETs — so that under enforce the
// server can authenticate the caller and reject unregistered or revoked // server can authenticate the caller and reject unregistered or revoked
// identities uniformly. The canonical construction is the single source of truth // identities uniformly. The canonical construction is the single source of truth
// in membership.CanonicalRequest, shared by both sides. // in membership.CanonicalRequest, shared by both sides.
func (c *Client) newSignedRequest(method, path string, body []byte) (*http.Request, error) { func (c *Client) newSignedRequestTo(base, method, path string, body []byte) (*http.Request, error) {
var rdr io.Reader var rdr io.Reader
if body != nil { if body != nil {
rdr = bytes.NewReader(body) rdr = bytes.NewReader(body)
} }
req, err := http.NewRequest(method, c.ctrlURL+path, rdr) req, err := http.NewRequest(method, base+path, rdr)
if err != nil { if err != nil {
return nil, fmt.Errorf("client: new request: %w", err) return nil, fmt.Errorf("client: new request: %w", err)
} }
@@ -357,6 +456,23 @@ type memberRoomJSON struct {
Role string `json:"role"` Role string `json:"role"`
} }
// userJSON mirrors the server's wire type on the admin user-management endpoints.
type userJSON struct {
SignPub string `json:"sign_pub"`
Handle string `json:"handle"`
Role string `json:"role"`
Status string `json:"status"`
CreatedAt string `json:"created_at"`
RevokedAt string `json:"revoked_at,omitempty"`
}
// addUserReq is the POST /users body (mirror of the server type).
type addUserReq struct {
SignPub string `json:"sign_pub"`
Handle string `json:"handle"`
Role string `json:"role"`
}
// ---- room operations ------------------------------------------------------ // ---- room operations ------------------------------------------------------
// RoomRef is a room this peer belongs to, returned by ListMyRooms. It is the // RoomRef is a room this peer belongs to, returned by ListMyRooms. It is the
@@ -391,6 +507,59 @@ func (c *Client) ListMyRooms() ([]RoomRef, error) {
 	return out, nil
 }

+// ---- user administration (admin-only) ------------------------------------
+
+// UserInfo is a bus user as returned by the admin user-management endpoints. It
+// is a flat view (no nested types) for the admin panel: the signing key
+// (lowercase hex), handle, role ("admin"|"member"), status ("active"|"revoked"),
+// and timestamps. RevokedAt is empty for an active user.
+type UserInfo struct {
+	SignPub   string
+	Handle    string
+	Role      string
+	Status    string
+	CreatedAt string
+	RevokedAt string
+}
+
+// ListUsers returns the full bus allowlist, including revoked users. The caller
+// must be signing as an admin: a non-admin signer is rejected by the server with
+// 403, surfaced here as an error.
+func (c *Client) ListUsers() ([]UserInfo, error) {
+	var resp []userJSON
+	if err := c.doJSON("GET", "/users", nil, &resp); err != nil {
+		return nil, err
+	}
+	out := make([]UserInfo, 0, len(resp))
+	for _, u := range resp {
+		out = append(out, UserInfo{
+			SignPub:   u.SignPub,
+			Handle:    u.Handle,
+			Role:      u.Role,
+			Status:    u.Status,
+			CreatedAt: u.CreatedAt,
+			RevokedAt: u.RevokedAt,
+		})
+	}
+	return out, nil
+}
+
+// AddUser registers a bus user from their Ed25519 signing public key (64-hex).
+// role is "admin" or "member" (empty defaults to member, matching the server).
+// The caller must be signing as an admin. Re-adding an already-registered key
+// returns an error (the server replies 409 and leaves the existing row
+// untouched — no silent role/status change).
+func (c *Client) AddUser(signPub, handle, role string) error {
+	return c.doJSON("POST", "/users", addUserReq{SignPub: signPub, Handle: handle, Role: role}, nil)
+}
+
+// RevokeUser revokes a bus user by their signing public key (64-hex). Revocation
+// is a status flip (no hard delete): the identity stays auditable and is denied
+// on both planes immediately. The caller must be signing as an admin.
+func (c *Client) RevokeUser(signPub string) error {
+	return c.doJSON("POST", "/users/"+signPub+"/revoke", nil, nil)
+}
+
 // newRoomKey returns 32 random bytes for a symmetric room key.
 func newRoomKey() ([]byte, error) {
 	k := make([]byte, 32)
@@ -700,7 +869,17 @@ func (c *Client) processFrame(roomID string, info roomView, data []byte, handler
 	if err != nil {
 		return
 	}
-	if info.Policy.SignMsgs && f.Sig != nil {
+	// A room with SignMsgs REQUIRES a signature, so an unsigned frame is
+	// unauthenticated and must be dropped — not silently accepted. The previous
+	// `&& f.Sig != nil` guard verified the signature only when one was present, so
+	// an attacker with data-plane access could publish a frame with Sig==nil and a
+	// forged Sender and have the receiver accept it as authentic in a room that
+	// demands signatures (audit N3, report 0006). Requiring the signature first
+	// closes that spoof.
+	if info.Policy.SignMsgs {
+		if f.Sig == nil {
+			return // signature required by room policy but absent: drop
+		}
 		pub, err := c.signerPub(roomID, f.Sender)
 		if err != nil || !cs.VerifyEd25519(pub, f.SigningBytes(), f.Sig) {
 			return // unauthenticated frame: drop
@@ -887,40 +1066,50 @@ func (c *Client) FetchMedia(roomID string, f frame.Frame) ([]byte, error) {
 }

 func (c *Client) putBlob(ciphertext []byte) (string, error) {
-	req, err := c.newSignedRequest("POST", "/blobs", ciphertext)
-	if err != nil {
-		return "", err
+	var lastErr error
+	for _, base := range c.ctrlURLs {
+		req, err := c.newSignedRequestTo(base, "POST", "/blobs", ciphertext)
+		if err != nil {
+			return "", err
+		}
+		req.Header.Set("Content-Type", "application/octet-stream")
+		resp, err := c.http.Do(req)
+		if err != nil {
+			lastErr = err
+			continue // dead node: try the next control plane
+		}
+		defer resp.Body.Close()
+		body, _ := io.ReadAll(resp.Body)
+		if resp.StatusCode >= 300 {
+			return "", fmt.Errorf("client: put blob -> %d: %s", resp.StatusCode, string(body))
+		}
+		var r blobResp
+		if err := json.Unmarshal(body, &r); err != nil {
+			return "", fmt.Errorf("client: decode blob resp: %w", err)
+		}
+		return r.Hash, nil
 	}
-	req.Header.Set("Content-Type", "application/octet-stream")
-	resp, err := c.http.Do(req)
-	if err != nil {
-		return "", fmt.Errorf("client: put blob: %w", err)
-	}
-	defer resp.Body.Close()
-	body, _ := io.ReadAll(resp.Body)
-	if resp.StatusCode >= 300 {
-		return "", fmt.Errorf("client: put blob -> %d: %s", resp.StatusCode, string(body))
-	}
-	var r blobResp
-	if err := json.Unmarshal(body, &r); err != nil {
-		return "", fmt.Errorf("client: decode blob resp: %w", err)
-	}
-	return r.Hash, nil
+	return "", fmt.Errorf("client: put blob: all control planes failed: %w", lastErr)
 }

 func (c *Client) getBlob(hash string) ([]byte, error) {
-	req, err := c.newSignedRequest("GET", "/blobs/"+hash, nil)
-	if err != nil {
-		return nil, err
+	var lastErr error
+	for _, base := range c.ctrlURLs {
+		req, err := c.newSignedRequestTo(base, "GET", "/blobs/"+hash, nil)
+		if err != nil {
+			return nil, err
+		}
+		resp, err := c.http.Do(req)
+		if err != nil {
+			lastErr = err
+			continue // dead node: try the next control plane
+		}
+		defer resp.Body.Close()
+		if resp.StatusCode >= 300 {
+			body, _ := io.ReadAll(resp.Body)
+			return nil, fmt.Errorf("client: get blob -> %d: %s", resp.StatusCode, string(body))
+		}
+		return io.ReadAll(resp.Body)
 	}
-	resp, err := c.http.Do(req)
-	if err != nil {
-		return nil, fmt.Errorf("client: get blob: %w", err)
-	}
-	defer resp.Body.Close()
-	if resp.StatusCode >= 300 {
-		body, _ := io.ReadAll(resp.Body)
-		return nil, fmt.Errorf("client: get blob -> %d: %s", resp.StatusCode, string(body))
-	}
-	return io.ReadAll(resp.Body)
+	return nil, fmt.Errorf("client: get blob: all control planes failed: %w", lastErr)
 }
+1 -1
@@ -31,7 +31,7 @@ type testHarness struct {
 	ctrlURL string
 	ns      *server.Server
 	httpts  *httptest.Server
-	store   *membership.Store
+	store   membership.Store
 	srv     *membership.Server
 }
+185
@@ -0,0 +1,185 @@
package client_test
import (
"fmt"
"net/http/httptest"
"path/filepath"
"strconv"
"strings"
"sync"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/enmanuel/unibus/pkg/room"
server "github.com/nats-io/nats-server/v2/server"
)
// startClusterNode boots a clustered embedded NATS node (auth off, no route TLS:
// this test exercises client failover, not route security — that is covered in
// pkg/embeddednats).
func startClusterNode(t *testing.T, name string, clientPort, routePort int, peerRoutePorts []int) *server.Server {
t.Helper()
routes := make([]string, 0, len(peerRoutePorts))
for _, p := range peerRoutePorts {
routes = append(routes, fmt.Sprintf("nats://127.0.0.1:%d", p))
}
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(),
Host: "127.0.0.1",
Port: clientPort,
ServerName: name,
Cluster: &embeddednats.ClusterConfig{Name: "unibus-failover", Host: "127.0.0.1", Port: routePort, Routes: routes},
})
if err != nil {
t.Fatalf("start node %s: %v", name, err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
return ns
}
func waitClusterRoutes(t *testing.T, ns *server.Server) {
t.Helper()
deadline := time.Now().Add(8 * time.Second)
for time.Now().Before(deadline) {
if ns.NumRoutes() >= 1 {
return
}
time.Sleep(50 * time.Millisecond)
}
t.Fatalf("node %q never formed a route", ns.Name())
}
// portOf extracts the :port of a nats URL for matching ConnectedServer() (which
// may report a different host spelling than ClientURL()).
func portOf(natsURL string) string {
i := strings.LastIndex(natsURL, ":")
if i < 0 {
return ""
}
return natsURL[i+1:]
}
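Review note: portOf takes everything after the last colon, which works for URLs of the form `nats://host:port`. For a URL with no explicit port it returns the text after the scheme colon instead of an empty string. If that ever matters, the standard `net/url` parser is an alternative. The sketch below assumes the server URL carries a scheme; `portOfURL` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
)

// portOfURL returns the port of a scheme-qualified URL such as
// "nats://127.0.0.1:4222", or "" when the URL has no port or cannot be parsed.
func portOfURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Port()
}

func main() {
	for _, raw := range []string{
		"nats://127.0.0.1:4222",
		"nats://[::1]:4222", // bracketed IPv6 host
		"nats://localhost",  // no explicit port
	} {
		fmt.Printf("%q -> %q\n", raw, portOfURL(raw))
	}
}
```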
// TestClientFailoverAcrossNodes is the issue's edge case: a client connected to
// node A keeps its session when A is killed — nats.go reconnects it to node B
// and it keeps receiving messages published on the surviving node.
func TestClientFailoverAcrossNodes(t *testing.T) {
rp0, rp1 := freePort(t), freePort(t)
p0, p1 := freePort(t), freePort(t)
n0 := startClusterNode(t, "n0", p0, rp0, []int{rp1})
n1 := startClusterNode(t, "n1", p1, rp1, []int{rp0})
waitClusterRoutes(t, n0)
waitClusterRoutes(t, n1)
nodes := map[string]*server.Server{strconv.Itoa(p0): n0, strconv.Itoa(p1): n1}
// Control plane: one in-process membershipd (metadata only; the data plane is
// the NATS cluster). Auth off keeps the test focused on data-plane failover.
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
blobs, err := blobstore.New(filepath.Join(dir, "blobs"))
if err != nil {
t.Fatalf("blobs: %v", err)
}
ctrl := httptest.NewServer(membership.NewServer(store, blobs, membership.AuthOff))
t.Cleanup(ctrl.Close)
url0 := n0.ClientURL()
url1 := n1.ClientURL()
// Both clients seed BOTH nodes (failover list): A starts on n0 with n1 as
// fallback, B starts on n1 with n0 as fallback.
a, err := client.NewWithOptions(url0, ctrl.URL, mustIdentity(t), client.Options{NatsServers: []string{url1}})
if err != nil {
t.Fatalf("connect A: %v", err)
}
defer a.Close()
b, err := client.NewWithOptions(url1, ctrl.URL, mustIdentity(t), client.Options{NatsServers: []string{url0}})
if err != nil {
t.Fatalf("connect B: %v", err)
}
defer b.Close()
roomID, err := a.CreateRoom("room.failover", room.ModeNATS)
if err != nil {
t.Fatalf("A create room: %v", err)
}
var mu sync.Mutex
var got []string
sub, err := a.Subscribe(roomID, func(_ frame.Frame, plaintext []byte) {
mu.Lock()
got = append(got, string(plaintext))
mu.Unlock()
})
if err != nil {
t.Fatalf("A subscribe: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(200 * time.Millisecond)
// Pre-kill sanity: B publishes, A receives across the cluster.
if err := b.Publish(roomID, []byte("before-kill")); err != nil {
t.Fatalf("B publish 1: %v", err)
}
if !waitFor(&mu, &got, func(rs []string) bool { return contains(rs, "before-kill") }, 3*time.Second) {
t.Fatalf("A did not receive the pre-kill message; got %v", snapshot(&mu, &got))
}
// Identify and KILL the node A is attached to, forcing a reconnect.
attached := a.ConnectedServer()
killPort := portOf(attached)
victim, ok := nodes[killPort]
if !ok {
t.Fatalf("A is attached to an unknown node %q (port %q)", attached, killPort)
}
survivorURL := url1
if killPort == strconv.Itoa(p1) {
survivorURL = url0
}
victim.Shutdown()
victim.WaitForShutdown()
// A must reconnect to the surviving node.
deadline := time.Now().Add(8 * time.Second)
for time.Now().Before(deadline) {
if a.IsConnected() && portOf(a.ConnectedServer()) == portOf(survivorURL) {
break
}
time.Sleep(100 * time.Millisecond)
}
if !a.IsConnected() || portOf(a.ConnectedServer()) != portOf(survivorURL) {
t.Fatalf("A did not fail over to the surviving node (now on %q, want port %s)", a.ConnectedServer(), portOf(survivorURL))
}
// Make B publish from the surviving node and confirm A still receives —
// the session (its subscription) survived the failover.
if survivorURL == url0 {
// B's primary was n1 (killed); ensure B is on the survivor too.
deadline := time.Now().Add(8 * time.Second)
for time.Now().Before(deadline) && portOf(b.ConnectedServer()) != portOf(survivorURL) {
time.Sleep(100 * time.Millisecond)
}
}
if err := b.Publish(roomID, []byte("after-kill")); err != nil {
t.Fatalf("B publish 2: %v", err)
}
if !waitFor(&mu, &got, func(rs []string) bool { return contains(rs, "after-kill") }, 6*time.Second) {
t.Fatalf("A did not receive a message after failover; got %v", snapshot(&mu, &got))
}
}
func contains(rs []string, want string) bool {
for _, r := range rs {
if r == want {
return true
}
}
return false
}
+27 -11
@@ -33,20 +33,36 @@ type identityFile struct {
 	KexPriv string `json:"kex_priv"`
 }

+// LoadIdentity loads an existing identity from path. Unlike LoadOrCreateIdentity
+// it NEVER creates one: a missing or unreadable file is an error. It is for
+// callers that must consume a specific, pre-provisioned identity rather than mint
+// a fresh one — for example membershipd's persisted internal service identity,
+// which `membershipd user add --store kv` reads to present the privileged nkey
+// the cluster authenticator recognizes.
+func LoadIdentity(path string) (cs.Identity, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return cs.Identity{}, fmt.Errorf("client: read identity %q: %w", path, err)
+	}
+	var f identityFile
+	if err := json.Unmarshal(data, &f); err != nil {
+		return cs.Identity{}, fmt.Errorf("client: parse identity %q: %w", path, err)
+	}
+	id, err := f.toIdentity()
+	if err != nil {
+		return cs.Identity{}, fmt.Errorf("client: decode identity %q: %w", path, err)
+	}
+	return id, nil
+}
+
 // LoadOrCreateIdentity loads the identity at path, or generates and persists a
 // new one if the file does not exist. The file is written with 0600
-// permissions because it holds private keys.
+// permissions because it holds private keys. A file that exists but is
+// unreadable or corrupt is an error (NOT silently regenerated), so a damaged
+// identity surfaces instead of minting a new key that cannot decrypt old data.
 func LoadOrCreateIdentity(path string) (cs.Identity, error) {
-	if data, err := os.ReadFile(path); err == nil {
-		var f identityFile
-		if err := json.Unmarshal(data, &f); err != nil {
-			return cs.Identity{}, fmt.Errorf("client: parse identity %q: %w", path, err)
-		}
-		id, err := f.toIdentity()
-		if err != nil {
-			return cs.Identity{}, fmt.Errorf("client: decode identity %q: %w", path, err)
-		}
-		return id, nil
+	if _, statErr := os.Stat(path); statErr == nil {
+		return LoadIdentity(path)
 	}

 	id, err := cs.GenerateIdentity()
+154
@@ -0,0 +1,154 @@
package client_test
import (
"sync"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/room"
"github.com/nats-io/nats.go"
)
// TestReaudit_SigNilSpoof ports the re-auditor's N3 (High severity) finding: in a room
// that REQUIRES per-message signatures, an attacker with data-plane access
// publishes a raw frame with Sig==nil and a forged Sender. Before the fix
// processFrame verified the signature only when one was present
// (`SignMsgs && f.Sig != nil`), so the receiver accepted the unsigned, forged
// frame as authentic. The fix drops any unsigned frame in a SignMsgs room.
//
// Coverage:
// - golden: a properly signed frame from a real member IS delivered;
// - error : an unsigned frame with a forged Sender in a SignMsgs room is DROPPED;
// - edge : a room WITHOUT SignMsgs still delivers an unsigned frame (the drop
// is specific to signed rooms, not a blanket reject of unsigned frames).
func TestReaudit_SigNilSpoof(t *testing.T) {
h := newHarness(t)
waitHealth(t, h.ctrlURL)
alice, err := client.New(h.natsURL, h.ctrlURL, mustIdentity(t))
if err != nil {
t.Fatalf("connect alice: %v", err)
}
defer alice.Close()
bob, err := client.New(h.natsURL, h.ctrlURL, mustIdentity(t))
if err != nil {
t.Fatalf("connect bob: %v", err)
}
defer bob.Close()
// A signed-but-NOT-encrypted room: SignMsgs enforces authorship, and the lack
// of encryption is exactly the case the auditor flagged as High severity (any
// peer with the subject can forge a sender if signatures are not strictly required).
const subject = "room.signed.spoof"
signedPolicy := room.Policy{Encrypt: false, Persist: false, SignMsgs: true}
roomID, err := alice.CreateRoom(subject, signedPolicy)
if err != nil {
t.Fatalf("alice create signed room: %v", err)
}
if err := alice.Invite(roomID, bob.Endpoint()); err != nil {
t.Fatalf("alice invite bob: %v", err)
}
if err := bob.Join(roomID); err != nil {
t.Fatalf("bob join: %v", err)
}
var mu sync.Mutex
var got []string
sub, err := bob.Subscribe(roomID, func(_ frame.Frame, plaintext []byte) {
mu.Lock()
got = append(got, string(plaintext))
mu.Unlock()
})
if err != nil {
t.Fatalf("bob subscribe: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(150 * time.Millisecond)
// Attacker: a raw NATS connection (the dev harness leaves the data plane open),
// no identity, forged Sender, NO signature.
const spoofMsg = "I am totally the victim"
rawAtk, err := nats.Connect(h.natsURL)
if err != nil {
t.Fatalf("attacker raw connect: %v", err)
}
defer rawAtk.Close()
spoof := frame.Frame{
Type: frame.PUB,
Subject: subject,
Sender: "victim-forged-endpoint",
MsgID: "spoof-1",
Epoch: 1,
Payload: []byte(spoofMsg),
// Sig intentionally nil — this is the attack.
}
sb, err := spoof.Marshal()
if err != nil {
t.Fatalf("marshal spoof: %v", err)
}
if err := rawAtk.Publish(subject, sb); err != nil {
t.Fatalf("attacker publish: %v", err)
}
_ = rawAtk.Flush()
// Golden: alice's properly signed frame must be delivered.
const goodMsg = "authentic from alice"
if err := alice.Publish(roomID, []byte(goodMsg)); err != nil {
t.Fatalf("alice publish: %v", err)
}
if !waitFor(&mu, &got, func(rs []string) bool {
for _, r := range rs {
if r == goodMsg {
return true
}
}
return false
}, 2*time.Second) {
t.Fatalf("a properly signed frame should be delivered; got %v", snapshot(&mu, &got))
}
// Error path: the unsigned, forged frame must NEVER reach the handler.
for _, r := range snapshot(&mu, &got) {
if r == spoofMsg {
t.Fatalf("SIG-NIL SPOOF: receiver accepted an unsigned frame with a forged Sender in a SignMsgs room")
}
}
// Edge: a room WITHOUT SignMsgs still delivers an unsigned raw frame, proving
// the drop is scoped to signed rooms and did not break the plain-NATS path.
const subjectOpen = "room.open.nosig"
openRoom, err := alice.CreateRoom(subjectOpen, room.ModeNATS)
if err != nil {
t.Fatalf("alice create open room: %v", err)
}
openCol := subscribeCollect(t, alice, openRoom)
defer openCol.sub.Unsubscribe()
time.Sleep(150 * time.Millisecond)
const openMsg = "unsigned but allowed here"
openFrame := frame.Frame{
Type: frame.PUB,
Subject: subjectOpen,
Sender: "anyone",
MsgID: "open-1",
Payload: []byte(openMsg),
// no Sig — fine in a non-signed room
}
ob, err := openFrame.Marshal()
if err != nil {
t.Fatalf("marshal open frame: %v", err)
}
if err := rawAtk.Publish(subjectOpen, ob); err != nil {
t.Fatalf("publish open frame: %v", err)
}
_ = rawAtk.Flush()
if !waitFor(&openCol.mu, &openCol.msgs, func(rs []string) bool {
for _, r := range rs {
if r == openMsg {
return true
}
}
return false
}, 2*time.Second) {
t.Fatalf("an unsigned frame in a non-signed room should be delivered; got %v", snapshot(&openCol.mu, &openCol.msgs))
}
}
+99
@@ -0,0 +1,99 @@
package client_test
import (
"encoding/hex"
"strings"
"testing"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/membership"
)
// findUserInfo returns the row with the given signing key (case-insensitive).
func findUserInfo(users []client.UserInfo, signPub string) (client.UserInfo, bool) {
want := strings.ToLower(signPub)
for _, u := range users {
if strings.ToLower(u.SignPub) == want {
return u, true
}
}
return client.UserInfo{}, false
}
// TestClientUsersAdminAPI drives the admin user-management API through the real
// pkg/client methods against an in-process membershipd under enforce: an admin
// client adds a user, lists it, revokes it, and sees the status flip — and a
// non-admin client is denied. This is the path the admin panel uses, so it locks
// the client/server contract the panel depends on.
func TestClientUsersAdminAPI(t *testing.T) {
h := newHarnessMode(t, membership.AuthEnforce)
waitHealth(t, h.ctrlURL)
admin, err := client.New(h.natsURL, h.ctrlURL, mustIdentity(t))
if err != nil {
t.Fatalf("connect admin: %v", err)
}
defer admin.Close()
registerClient(t, h, admin, "admin", membership.RoleAdmin)
member, err := client.New(h.natsURL, h.ctrlURL, mustIdentity(t))
if err != nil {
t.Fatalf("connect member: %v", err)
}
defer member.Close()
registerClient(t, h, member, "member", membership.RoleMember)
// A brand-new identity the admin will register over HTTP.
carol := mustIdentity(t)
carolPub := hex.EncodeToString(carol.SignPub)
// Admin adds carol as a member.
if err := admin.AddUser(carolPub, "carol", membership.RoleMember); err != nil {
t.Fatalf("admin AddUser: %v", err)
}
// Admin lists: carol present and active.
users, err := admin.ListUsers()
if err != nil {
t.Fatalf("admin ListUsers: %v", err)
}
row, ok := findUserInfo(users, carolPub)
if !ok {
t.Fatalf("carol missing from list after add: %+v", users)
}
if row.Status != membership.StatusActive || row.Role != membership.RoleMember {
t.Fatalf("carol row wrong after add: %+v", row)
}
// Re-adding the same key is a conflict surfaced as an error (no silent upsert).
if err := admin.AddUser(carolPub, "carol-again", membership.RoleAdmin); err == nil {
t.Fatalf("re-adding carol should error (409), got nil")
}
// Admin revokes carol; list shows the status flip (no hard delete).
if err := admin.RevokeUser(carolPub); err != nil {
t.Fatalf("admin RevokeUser: %v", err)
}
users, err = admin.ListUsers()
if err != nil {
t.Fatalf("admin ListUsers after revoke: %v", err)
}
row, ok = findUserInfo(users, carolPub)
if !ok {
t.Fatalf("carol vanished after revoke (should be a status flip): %+v", users)
}
if row.Status != membership.StatusRevoked {
t.Fatalf("carol should be revoked, got status %q", row.Status)
}
// A non-admin (member) is denied on every user-management method.
if _, err := member.ListUsers(); err == nil {
t.Fatalf("non-admin ListUsers should error (403), got nil")
}
if err := member.AddUser(carolPub, "x", membership.RoleMember); err == nil {
t.Fatalf("non-admin AddUser should error (403), got nil")
}
if err := member.RevokeUser(carolPub); err == nil {
t.Fatalf("non-admin RevokeUser should error (403), got nil")
}
}
+344
@@ -0,0 +1,344 @@
package embeddednats_test
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/x509"
"crypto/x509/pkix"
"encoding/pem"
"fmt"
"math/big"
"net"
"os"
"path/filepath"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/nats-io/nats.go"
server "github.com/nats-io/nats-server/v2/server"
)
// freePort returns an OS-assigned free TCP port on loopback.
func freePort(t *testing.T) int {
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
// startNode boots a clustered embedded NATS node. peerRoutePorts are the route
// ports of the OTHER nodes; user/pass gate the route layer (empty disables it);
// routeTLS, when non-nil, secures the routes with mutual TLS.
func startNode(t *testing.T, name string, clientPort, routePort int, peerRoutePorts []int, user, pass string, routeTLS *clusterTLS) *server.Server {
t.Helper()
routes := make([]string, 0, len(peerRoutePorts))
for _, p := range peerRoutePorts {
// Carry the cluster credentials in the route URL so this node
// authenticates outbound to its peers' route listeners.
if user != "" {
routes = append(routes, fmt.Sprintf("nats://%s:%s@127.0.0.1:%d", user, pass, p))
} else {
routes = append(routes, fmt.Sprintf("nats://127.0.0.1:%d", p))
}
}
cc := &embeddednats.ClusterConfig{
Name: "unibus-test",
Host: "127.0.0.1",
Port: routePort,
Routes: routes,
Username: user,
Password: pass,
}
if routeTLS != nil {
cfg, err := busauth.RouteTLSConfig(routeTLS.cert, routeTLS.key, routeTLS.ca)
if err != nil {
t.Fatalf("route TLS for %s: %v", name, err)
}
cc.TLS = cfg
}
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(),
Host: "127.0.0.1",
Port: clientPort,
ServerName: name,
Cluster: cc,
})
if err != nil {
t.Fatalf("start node %s: %v", name, err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
return ns
}
// waitRoutes waits until ns has at least want established routes, or fails.
func waitRoutes(t *testing.T, ns *server.Server, want int) {
t.Helper()
deadline := time.Now().Add(8 * time.Second)
for time.Now().Before(deadline) {
if ns.NumRoutes() >= want {
return
}
time.Sleep(50 * time.Millisecond)
}
t.Fatalf("node %q never reached %d routes (have %d)", ns.Name(), want, ns.NumRoutes())
}
// stableRouteCount waits for ns's route count to stop changing (the NATS route
// pool opens several connections per peer asynchronously) and returns it, so a
// test can use it as a baseline that an impostor must not increase.
func stableRouteCount(t *testing.T, ns *server.Server) int {
t.Helper()
prev := -1
stableSince := time.Now()
deadline := time.Now().Add(5 * time.Second)
for time.Now().Before(deadline) {
n := ns.NumRoutes()
if n != prev {
prev = n
stableSince = time.Now()
} else if time.Since(stableSince) >= 750*time.Millisecond {
return n
}
time.Sleep(50 * time.Millisecond)
}
return prev
}
// pubSubAcrossNodes connects a subscriber to subURL and a publisher to pubURL,
// publishes one message on subject, and reports whether it arrived within 3s.
// This proves the cluster forwards client subjects between nodes.
func pubSubAcrossNodes(t *testing.T, subURL, pubURL, subject, payload string) bool {
t.Helper()
subConn, err := nats.Connect(subURL)
if err != nil {
t.Fatalf("subscriber connect %s: %v", subURL, err)
}
defer subConn.Close()
got := make(chan string, 1)
if _, err := subConn.Subscribe(subject, func(m *nats.Msg) {
select {
case got <- string(m.Data):
default:
}
}); err != nil {
t.Fatalf("subscribe: %v", err)
}
if err := subConn.Flush(); err != nil {
t.Fatalf("flush sub: %v", err)
}
pubConn, err := nats.Connect(pubURL)
if err != nil {
t.Fatalf("publisher connect %s: %v", pubURL, err)
}
defer pubConn.Close()
// Retry the publish for a moment: route interest propagation across the
// cluster is asynchronous, so the very first publish can race the gossip.
deadline := time.Now().Add(3 * time.Second)
for time.Now().Before(deadline) {
if err := pubConn.Publish(subject, []byte(payload)); err != nil {
t.Fatalf("publish: %v", err)
}
_ = pubConn.Flush()
select {
case v := <-got:
return v == payload
case <-time.After(100 * time.Millisecond):
}
}
return false
}
// --- golden: two-node cluster forwards client subjects across nodes ----------
func TestClusterForwardsAcrossNodes(t *testing.T) {
rp0, rp1 := freePort(t), freePort(t)
n0 := startNode(t, "n0", freePort(t), rp0, []int{rp1}, "clusteruser", "clusterpass", nil)
n1 := startNode(t, "n1", freePort(t), rp1, []int{rp0}, "clusteruser", "clusterpass", nil)
waitRoutes(t, n0, 1)
waitRoutes(t, n1, 1)
if !pubSubAcrossNodes(t, n0.ClientURL(), n1.ClientURL(), "test.cross", "hello-cluster") {
t.Fatalf("subject published on n1 did not reach subscriber on n0")
}
}
// --- edge: three-node cluster (HA shape) forwards between non-adjacent nodes --
func TestClusterThreeNodesForward(t *testing.T) {
rp0, rp1, rp2 := freePort(t), freePort(t), freePort(t)
n0 := startNode(t, "n0", freePort(t), rp0, []int{rp1, rp2}, "u", "p", nil)
n1 := startNode(t, "n1", freePort(t), rp1, []int{rp0, rp2}, "u", "p", nil)
n2 := startNode(t, "n2", freePort(t), rp2, []int{rp0, rp1}, "u", "p", nil)
waitRoutes(t, n0, 2)
waitRoutes(t, n1, 2)
waitRoutes(t, n2, 2)
// Publish on n2, subscribe on n0: a message must traverse the cluster.
if !pubSubAcrossNodes(t, n0.ClientURL(), n2.ClientURL(), "test.ha", "three-node") {
t.Fatalf("subject published on n2 did not reach subscriber on n0")
}
}
// --- error: a node with the wrong cluster password is rejected as a route -----
func TestClusterRejectsBadRouteAuth(t *testing.T) {
rp0, rp1 := freePort(t), freePort(t)
good := startNode(t, "good", freePort(t), rp0, []int{rp1}, "secret", "right", nil)
_ = startNode(t, "peer", freePort(t), rp1, []int{rp0}, "secret", "right", nil)
waitRoutes(t, good, 1)
// Let the route pool settle so the baseline count is stable (NATS opens a
// pool of route connections per peer, so NumRoutes counts connections, not
// distinct peers).
base := stableRouteCount(t, good)
// Impostor knows the addresses but not the cluster password. It tries to
// route to `good`; the route handshake must be rejected, so the impostor
// never establishes a route.
impostor := startNode(t, "impostor", freePort(t), freePort(t), []int{rp0}, "secret", "WRONG", nil)
// Give the route layer ample time to (fail to) connect, then assert it never
// formed: the impostor has zero routes, and `good`'s route count is unchanged
// (it did not accept a route from the impostor).
time.Sleep(2 * time.Second)
if n := impostor.NumRoutes(); n != 0 {
t.Fatalf("impostor with wrong cluster password formed %d routes, want 0", n)
}
if n := good.NumRoutes(); n != base {
t.Fatalf("legit node route count changed from %d to %d after impostor attempt (it accepted the impostor)", base, n)
}
}
// --- golden (TLS): mutual-TLS routes forward across nodes ---------------------
func TestClusterMutualTLSForwards(t *testing.T) {
ca, caKey := genCA(t)
dir := t.TempDir()
tlsA := writeNodeCert(t, dir, "a", ca, caKey)
tlsB := writeNodeCert(t, dir, "b", ca, caKey)
rp0, rp1 := freePort(t), freePort(t)
n0 := startNode(t, "n0", freePort(t), rp0, []int{rp1}, "u", "p", tlsA)
n1 := startNode(t, "n1", freePort(t), rp1, []int{rp0}, "u", "p", tlsB)
waitRoutes(t, n0, 1)
waitRoutes(t, n1, 1)
if !pubSubAcrossNodes(t, n0.ClientURL(), n1.ClientURL(), "test.tls", "mtls-ok") {
t.Fatalf("subject did not cross the mutual-TLS cluster")
}
}
// --- error (TLS): a node whose cert is not signed by the bus CA cannot join ---
func TestClusterRejectsUnsignedNode(t *testing.T) {
ca, caKey := genCA(t)
dir := t.TempDir()
tlsGood := writeNodeCert(t, dir, "good", ca, caKey)
tlsPeer := writeNodeCert(t, dir, "peer", ca, caKey)
// The impostor signs its node cert with a DIFFERENT CA, and pins only that
// CA. The legit nodes' RequireAndVerifyClientCert against the bus CA rejects
// it; the impostor likewise rejects the legit node's cert. No route forms.
otherCA, otherKey := genCA(t)
tlsImpostor := writeNodeCert(t, dir, "impostor", otherCA, otherKey)
rp0, rp1 := freePort(t), freePort(t)
good := startNode(t, "good", freePort(t), rp0, []int{rp1}, "u", "p", tlsGood)
_ = startNode(t, "peer", freePort(t), rp1, []int{rp0}, "u", "p", tlsPeer)
waitRoutes(t, good, 1)
base := stableRouteCount(t, good)
impostor := startNode(t, "impostor", freePort(t), freePort(t), []int{rp0}, "u", "p", tlsImpostor)
time.Sleep(2 * time.Second)
if n := impostor.NumRoutes(); n != 0 {
t.Fatalf("impostor with unsigned cert formed %d routes, want 0", n)
}
if n := good.NumRoutes(); n != base {
t.Fatalf("legit node route count changed from %d to %d after unsigned impostor attempt (it accepted the impostor)", base, n)
}
}
// --- cert helpers ------------------------------------------------------------
type clusterTLS struct{ cert, key, ca string } // PEM file paths
// genCA creates a self-signed ECDSA CA certificate and its key.
func genCA(t *testing.T) (*x509.Certificate, *ecdsa.PrivateKey) {
t.Helper()
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatalf("gen CA key: %v", err)
}
tmpl := &x509.Certificate{
SerialNumber: big.NewInt(1),
Subject: pkix.Name{CommonName: "unibus-test-CA"},
NotBefore: time.Now().Add(-time.Hour),
NotAfter: time.Now().Add(24 * time.Hour),
KeyUsage: x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature,
BasicConstraintsValid: true,
IsCA: true,
}
der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
if err != nil {
t.Fatalf("create CA cert: %v", err)
}
caCert, err := x509.ParseCertificate(der)
if err != nil {
t.Fatalf("parse CA cert: %v", err)
}
return caCert, key
}
// writeNodeCert issues a node certificate signed by ca (SAN 127.0.0.1/::1,
// usable as both server and client) and writes cert/key/ca PEM files, returning
// their paths for RouteTLSConfig.
func writeNodeCert(t *testing.T, dir, name string, ca *x509.Certificate, caKey *ecdsa.PrivateKey) *clusterTLS {
t.Helper()
key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
t.Fatalf("gen node key: %v", err)
}
tmpl := &x509.Certificate{
SerialNumber: big.NewInt(time.Now().UnixNano()),
Subject: pkix.Name{CommonName: name},
NotBefore: time.Now().Add(-time.Hour),
NotAfter: time.Now().Add(24 * time.Hour),
KeyUsage: x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
IPAddresses: []net.IP{net.ParseIP("127.0.0.1"), net.ParseIP("::1")},
DNSNames: []string{"localhost"},
}
der, err := x509.CreateCertificate(rand.Reader, tmpl, ca, &key.PublicKey, caKey)
if err != nil {
t.Fatalf("create node cert: %v", err)
}
certPath := filepath.Join(dir, name+".crt")
keyPath := filepath.Join(dir, name+".key")
caPath := filepath.Join(dir, name+"-ca.crt")
writePEM(t, certPath, "CERTIFICATE", der)
keyDER, err := x509.MarshalECPrivateKey(key)
if err != nil {
t.Fatalf("marshal node key: %v", err)
}
writePEM(t, keyPath, "EC PRIVATE KEY", keyDER)
writePEM(t, caPath, "CERTIFICATE", ca.Raw)
return &clusterTLS{cert: certPath, key: keyPath, ca: caPath}
}
func writePEM(t *testing.T, path, blockType string, der []byte) {
t.Helper()
b := pem.EncodeToMemory(&pem.Block{Type: blockType, Bytes: der})
if err := os.WriteFile(path, b, 0o600); err != nil {
t.Fatalf("write %s: %v", path, err)
}
}
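The impostor test above relies on the route TLS layer rejecting a certificate that does not chain to the bus CA. The following is a minimal, self-contained sketch of that check using only the standard library. The helper names (`issue`, `certTemplate`, `trustedBy`) are hypothetical and not part of the repository, and the real check happens inside the TLS handshake configured by `busauth.RouteTLSConfig`, not in a helper like this.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// certTemplate builds a short-lived template; isCA selects CA vs node usages.
func certTemplate(cn string, serial int64, isCA bool) *x509.Certificate {
	t := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	if isCA {
		t.KeyUsage = x509.KeyUsageCertSign | x509.KeyUsageDigitalSignature
		t.BasicConstraintsValid = true
		t.IsCA = true
	} else {
		t.KeyUsage = x509.KeyUsageDigitalSignature
		t.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth}
	}
	return t
}

// issue creates a certificate from tmpl. With a nil parent the certificate is
// self-signed; otherwise it is signed by parent/parentKey.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// trustedBy reports whether cert chains to ca for client authentication.
func trustedBy(cert, ca *x509.Certificate) bool {
	roots := x509.NewCertPool()
	roots.AddCert(ca)
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return err == nil
}

func main() {
	ca, caKey, err := issue(certTemplate("sketch-CA", 1, true), nil, nil)
	if err != nil {
		panic(err)
	}
	node, _, err := issue(certTemplate("node", 2, false), ca, caKey)
	if err != nil {
		panic(err)
	}
	impostor, _, err := issue(certTemplate("impostor", 3, false), nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println("CA-signed node trusted:", trustedBy(node, ca))
	fmt.Println("self-signed impostor trusted:", trustedBy(impostor, ca))
}
```

The self-signed certificate fails verification because no chain to the pool's single root can be built, which is the property the route test asserts indirectly through `NumRoutes`.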
+126 -4
@@ -8,25 +8,77 @@ package embeddednats
import (
"crypto/tls"
"fmt"
+"net/url"
+"os"
"time"
server "github.com/nats-io/nats-server/v2/server"
)
// ClusterConfig configures the route layer that links several embedded NATS
// servers into a single cluster (issue 0003a). It is the data-plane side of
// high availability: with a cluster, a client subject published on one node is
// forwarded to subscribers connected to any other node, and (with JetStream
// replicas > 1) streams/KV are RAFT-replicated across nodes so the loss of one
// node does not lose the bus.
//
// The route layer is a SEPARATE trust boundary from the client data plane: it
// carries server-to-server traffic, so it authenticates NODES, not bus users.
// Never reuse the nkey client authenticator here. Routes are secured with their
// own shared secret (Username/Password -> NATS Cluster.Authorization) and their
// own mutual TLS (TLS, built from the bus CA with busauth.RouteTLSConfig): a
// node without the cluster secret and a CA-signed node certificate cannot join
// the cluster nor inject messages into it.
type ClusterConfig struct {
// Name is the cluster name; it MUST be identical on every node or the
// servers refuse to gossip routes to each other.
Name string
// Host and Port are the route listener (server-to-server), distinct from the
// client Host/Port. Use a free, non-client port (e.g. 6250).
Host string
Port int
// Routes are the nats-route URLs of the OTHER nodes, e.g.
// "nats://user:pass@10.0.0.2:6250". When the route layer is password
// protected each URL must carry the same userinfo as the local Username /
// Password so this node authenticates outbound to its peers.
Routes []string
// Username and Password gate the route listener (NATS Cluster.Authorization).
// A peer (or impostor) that connects to this node's route port without these
// credentials is rejected, so it never becomes a route. Empty disables route
// auth (dev / trusted-network only).
Username string
Password string
// TLS, when non-nil, secures the route connections with mutual TLS. Build it
// with busauth.RouteTLSConfig(cert, key, ca): the server presents its node
// certificate AND requires+verifies the connecting node's certificate against
// the bus CA, so an unsigned impostor cannot establish a route even with the
// right password. Nil keeps routes plaintext (dev / WireGuard-only).
TLS *tls.Config
}
// ServerConfig is the full set of knobs for the embedded NATS server. The zero
// value (empty StoreDir aside) yields a dev-friendly server: JetStream on, bound
-// to all interfaces, no client auth, no TLS. Secured deployments set Auth and
-// TLS; tests set Host to loopback and a free Port.
+// to all interfaces, no client auth, no TLS, standalone (no cluster). Secured
+// deployments set Auth and TLS; HA deployments set ServerName + Cluster; tests
+// set Host to loopback and a free Port.
type ServerConfig struct {
StoreDir string // JetStream store directory
Host string // bind interface; "" = nats-server default ("0.0.0.0")
Port int // listen port
// ServerName is this node's unique name within the cluster. JetStream's RAFT
// layer requires a stable, unique name per node to form its meta-group; leave
// it empty for a standalone server (nats-server then auto-generates one).
ServerName string
// Auth, when non-nil, is installed as CustomClientAuthentication so the data
// plane only accepts approved clients (nkey signature + bus allowlist).
Auth server.Authentication
// TLS, when non-nil, makes the server present a certificate and require TLS
// on the data plane. Clients must trust the issuing CA (see busauth).
TLS *tls.Config
// Cluster, when non-nil, joins this server to a route cluster for high
// availability (issue 0003a). Nil keeps the server standalone (the legacy
// single-node behavior).
Cluster *ClusterConfig
}
// Start is a thin backward-compatible wrapper: embedded JetStream server on the
@@ -55,15 +107,32 @@ func StartHostAuth(storeDir, host string, port int, auth server.Authentication)
// blocks until the server is ready to accept connections (up to 5s) and returns
// the running server; the caller must Shutdown it.
func StartServer(cfg ServerConfig) (*server.Server, error) {
// Diagnostic toggle: UNIBUS_NATS_DEBUG=1 enables the embedded nats-server's own
// logger (route/RAFT/JetStream errors), which is otherwise silenced, and
// UNIBUS_NATS_DEBUG=2 additionally enables protocol tracing. Off by default so
// production behavior is unchanged; only set it when debugging the cluster
// route layer.
debugLevel := os.Getenv("UNIBUS_NATS_DEBUG")
debugNATS := debugLevel == "1" || debugLevel == "2"
traceNATS := debugLevel == "2"
opts := &server.Options{
JetStream: true,
StoreDir: cfg.StoreDir,
Host: cfg.Host,
Port: cfg.Port,
+ServerName: cfg.ServerName,
DontListen: false,
// Keep the embedded server quiet by default; the host app logs the URLs.
-NoLog: true,
-NoSigs: true,
+NoLog: !debugNATS,
+Debug: debugNATS,
+Trace: traceNATS,
+Logtime: true,
+NoSigs: true,
}
if debugNATS {
// Expose the nats-server monitoring endpoint (loopback) so the operator can
// inspect /jsz, /routez, /varz while debugging the cluster meta-group.
opts.HTTPHost = "127.0.0.1"
opts.HTTPPort = 8222
}
if cfg.Auth != nil {
opts.CustomClientAuthentication = cfg.Auth
@@ -78,11 +147,21 @@ func StartServer(cfg ServerConfig) (*server.Server, error) {
opts.TLS = true
}
if cfg.Cluster != nil {
if err := applyClusterOpts(opts, cfg.Cluster); err != nil {
return nil, err
}
}
ns, err := server.NewServer(opts)
if err != nil {
return nil, fmt.Errorf("embeddednats: new server: %w", err)
}
if debugNATS {
ns.ConfigureLogger()
}
go ns.Start()
if !ns.ReadyForConnections(5 * time.Second) {
@@ -93,6 +172,49 @@ func StartServer(cfg ServerConfig) (*server.Server, error) {
return ns, nil
}
// applyClusterOpts translates a ClusterConfig into the nats-server route options
// on opts: the cluster listener (name + host/port + shared-secret auth + mutual
// TLS) and the outbound routes to the other nodes. A malformed route URL is a
// configuration error and aborts startup rather than silently dropping a peer.
func applyClusterOpts(opts *server.Options, c *ClusterConfig) error {
opts.Cluster = server.ClusterOpts{
Name: c.Name,
Host: c.Host,
Port: c.Port,
Username: c.Username,
Password: c.Password,
// Disable route connection pooling (nats-server 2.10+ defaults to a pool of
// 3 connections per peer). On a small cluster the pool churns with
// "duplicate route"/"client closed" reconnects that interrupt the meta-group
// RAFT heartbeats, causing perpetual leader re-elections so the JetStream
// meta never becomes current and stream/KV creation hangs (issue 0006g).
// PoolSize=-1 forces the classic single route per peer, which is stable for
// the 3-node unibus cluster.
PoolSize: -1,
// NoAdvertise stops the server from gossiping its locally-discovered IPs to
// peers. The cluster nodes are Docker hosts, so without this NATS advertises
// the docker bridge addresses (172.x / 10.0.x) as reachable routes; peers
// then try to dial those private, mutually-unreachable IPs, churning the
// route layer and destabilizing the JetStream meta-group. With NoAdvertise
// the nodes use ONLY the explicit public-IP routes we configure (issue 0006g).
NoAdvertise: true,
}
if c.TLS != nil {
opts.Cluster.TLSConfig = c.TLS
// A generous handshake budget: route TLS does a mutual handshake and the
// peer may still be booting. The default 2s can flap on a cold cluster.
opts.Cluster.TLSTimeout = 5.0
}
for _, r := range c.Routes {
u, err := url.Parse(r)
if err != nil {
return fmt.Errorf("embeddednats: parse route %q: %w", r, err)
}
opts.Routes = append(opts.Routes, u)
}
return nil
}
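`applyClusterOpts` aborts startup on a route URL that `url.Parse` rejects. Note that `url.Parse` is fairly lenient, so a string with no host can still parse successfully. The sketch below shows the same loop with one extra check for a missing host or port; that extra check and the `parseRoutes` name are assumptions added for illustration and are not present in the diff.

```go
package main

import (
	"fmt"
	"net/url"
)

// parseRoutes mirrors the route loop in applyClusterOpts and additionally
// rejects URLs without a host or port (an assumption of this sketch).
func parseRoutes(routes []string) ([]*url.URL, error) {
	out := make([]*url.URL, 0, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r)
		if err != nil {
			return nil, fmt.Errorf("parse route %q: %w", r, err)
		}
		if u.Hostname() == "" || u.Port() == "" {
			return nil, fmt.Errorf("route %q: missing host or port", r)
		}
		out = append(out, u)
	}
	return out, nil
}

func main() {
	candidates := [][]string{
		{"nats://u:p@10.0.0.2:6250"}, // well-formed
		{"nats://10.0.0.2:bad"},      // rejected by url.Parse (invalid port)
		{"just-a-hostname"},          // parses as a path; rejected by the extra check
	}
	for _, c := range candidates {
		_, err := parseRoutes(c)
		fmt.Printf("%v -> err=%v\n", c, err)
	}
}
```

Failing on the first bad entry matches the stated intent of treating a malformed route as a configuration error rather than silently dropping a peer.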
// ClientURL returns a NATS connection URL for the running embedded server.
func ClientURL(ns *server.Server) string {
return ns.ClientURL()
+118
@@ -0,0 +1,118 @@
package membership
// Per-subject data-plane access control derived from room membership (issue
// 0003e, audit H4 residual; tightened in issue 0006b for audit 0008 N2). The
// control plane already authorizes metadata by membership; this is the matching
// restriction on the NATS data plane so a registered peer can only
// publish/subscribe on the subjects of the rooms it actually belongs to — and can
// only reach the JetStream API of ITS OWN rooms' streams, never the control-plane
// KV buckets.
import (
"encoding/hex"
"fmt"
"strings"
"github.com/enmanuel/unibus/pkg/frame"
)
// clientInfraSubjects are the subjects every authorized peer needs regardless of
// room membership, kept deliberately MINIMAL (issue 0006b, audit 0008 N2):
//
// - "_INBOX.>" — request/reply plus the JetStream pull-consumer delivery
// and publish-ack inboxes.
// - "$JS.API.INFO" — account-level JetStream info (limits/usage counters). It
// exposes NO room/user/key contents, so granting it leaks nothing.
//
// It NO LONGER contains "$JS.API.>". That broad grant was the N2 leak: it let any
// registered peer drive the whole JetStream API and read the control-plane KV
// buckets (KV_UNIBUS_users/rooms/members/room_keys) and the object store directly
// over NATS, bypassing the HTTP authorization (requireMember and the own-endpoint
// checks). JetStream API access is now granted PER ROOM, scoped to the stream of
// each room the peer belongs to (jsSubjectsFor). Because the control-plane KV
// streams (KV_UNIBUS_*) and the object store (OBJ_UNIBUS_*) are never a room
// stream, they fall outside the closed allow set and are denied by default.
var clientInfraSubjects = []string{"_INBOX.>", "$JS.API.INFO"}
// roomStreamName is the JetStream stream name a persisted room maps to. It MUST
// stay identical to pkg/client.streamName ("UNIBUS_" + sanitized roomID) so the
// per-room ACL grants exactly the subjects the client's JetStream calls use. Room
// ids are ULIDs (no '.'), so the sanitizing is a no-op in practice, but the rule
// is replicated defensively so the producer (client) and the authorizer (this
// ACL) never drift apart.
func roomStreamName(roomID string) string {
var b strings.Builder
b.Grow(len("UNIBUS_") + len(roomID))
b.WriteString("UNIBUS_")
for _, r := range roomID {
switch {
case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
b.WriteRune(r)
default:
b.WriteRune('_')
}
}
return b.String()
}
// jsSubjectsFor returns the MINIMAL JetStream API subjects a peer needs to use the
// durable stream of ONE persisted room: create/update/info the stream, manage and
// pull from its durable consumer, and ack deliveries. Every subject embeds this
// room's stream name, so the grant cannot reach another room's stream nor any
// control-plane stream (KV_UNIBUS_* / OBJ_UNIBUS_*). The wildcard layout matches
// the NATS JetStream API subject grammar (the stream name is the trailing token
// of single-verb requests and follows a two-token verb for MSG.GET / MSG.NEXT /
// DURABLE.CREATE):
//
// $JS.API.STREAM.<verb>.<stream> verb in {CREATE,UPDATE,INFO,DELETE,PURGE,...}
// $JS.API.STREAM.MSG.<op>.<stream> op in {GET,DELETE}
// $JS.API.CONSUMER.<verb>.<stream> verb in {LIST,NAMES,CREATE(ephemeral)}
// $JS.API.CONSUMER.<verb>.<stream>.<consumer>... verb in {CREATE,INFO,DELETE}
// $JS.API.CONSUMER.<v1>.<v2>.<stream>.<cons> {MSG.NEXT, DURABLE.CREATE}
// $JS.ACK.<stream>.> message acknowledgements
func jsSubjectsFor(roomID string) []string {
s := roomStreamName(roomID)
return []string{
"$JS.API.STREAM.*." + s,
"$JS.API.STREAM.*.*." + s,
"$JS.API.CONSUMER.*." + s,
"$JS.API.CONSUMER.*." + s + ".>",
"$JS.API.CONSUMER.*.*." + s + ".>",
"$JS.ACK." + s + ".>",
}
}
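The effect of these grants can be checked with a small subject matcher. The sketch below implements a simplified version of NATS wildcard matching (`*` for exactly one token, a trailing `>` for one or more tokens) and tests a few JetStream API subjects against the grants for one room. The matcher is written for this example and is not the server's implementation; the stream and consumer names are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// grantsFor copies the subject layout of jsSubjectsFor for a given stream name.
func grantsFor(stream string) []string {
	return []string{
		"$JS.API.STREAM.*." + stream,
		"$JS.API.STREAM.*.*." + stream,
		"$JS.API.CONSUMER.*." + stream,
		"$JS.API.CONSUMER.*." + stream + ".>",
		"$JS.API.CONSUMER.*.*." + stream + ".>",
		"$JS.ACK." + stream + ".>",
	}
}

// subjectMatches is a simplified NATS matcher: '*' matches exactly one token
// and a trailing '>' matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(p) == len(s)
}

// allowed reports whether any grant matches the subject.
func allowed(grants []string, subject string) bool {
	for _, g := range grants {
		if subjectMatches(g, subject) {
			return true
		}
	}
	return false
}

func main() {
	grants := grantsFor("UNIBUS_ROOMA")
	for _, subj := range []string{
		"$JS.API.STREAM.INFO.UNIBUS_ROOMA",
		"$JS.API.CONSUMER.MSG.NEXT.UNIBUS_ROOMA.dur",
		"$JS.API.STREAM.INFO.KV_UNIBUS_users",
		"$JS.API.STREAM.INFO.UNIBUS_ROOMB",
	} {
		fmt.Printf("%-45s allowed=%v\n", subj, allowed(grants, subj))
	}
}
```

Under this matcher the two subjects that name `UNIBUS_ROOMA` are allowed, while the control-plane KV stream and the other room's stream fall outside the allow set.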
// SubjectACLFor returns a function that maps a signing public key (lowercase hex)
// to the data-plane subjects that identity may publish and subscribe to: the
// subject of every room it belongs to, the per-room JetStream API subjects of
// those rooms (so persisted-room history keeps working), plus the minimal client
// infrastructure subjects. It reads the live membership store, so the permissions
// reflect the identity's rooms at the moment it connects. A decode error or a
// store failure is returned as an error so the caller can fail closed (deny the
// connection) rather than grant open access.
//
// Because NATS freezes permissions at connect time, a peer invited to a new room
// after connecting must reconnect (client.RefreshSession) to pick up the new
// room's subject. The bus is the authoritative directory of subjects, so an
// unlisted subject is simply absent from the allow set.
func SubjectACLFor(store Store) func(signPubHex string) ([]string, error) {
return func(signPubHex string) ([]string, error) {
pub, err := hex.DecodeString(signPubHex)
if err != nil || len(pub) != 32 {
return nil, fmt.Errorf("acl: malformed sign pub %q", signPubHex)
}
endpoint := frame.EndpointID(pub)
rooms, err := store.ListRoomsForEndpoint(endpoint)
if err != nil {
return nil, fmt.Errorf("acl: list rooms for %s: %w", endpoint, err)
}
// clientInfra + per room: the room subject + that room's JetStream API.
subjects := make([]string, 0, len(clientInfraSubjects)+len(rooms)*7)
subjects = append(subjects, clientInfraSubjects...)
for _, r := range rooms {
subjects = append(subjects, r.Subject)
subjects = append(subjects, jsSubjectsFor(r.RoomID)...)
}
return subjects, nil
}
}
+379
@@ -0,0 +1,379 @@
package membership_test
import (
"encoding/hex"
"net"
"net/http/httptest"
"path/filepath"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
server "github.com/nats-io/nats-server/v2/server"
)
func aclFreePort(t *testing.T) int {
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
func mustID(t *testing.T) cs.Identity {
t.Helper()
id, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("identity: %v", err)
}
return id
}
// aclPermsFunc builds the per-subject PermissionsFunc the ACL authenticator
// expects. It delegates to the SAME production wiring membershipd uses
// (busauth.PermissionsFromSubjects over membership.SubjectACLFor), so this test
// exercises the real path rather than a test-only reimplementation.
func aclPermsFunc(store membership.Store) busauth.PermissionsFunc {
return busauth.PermissionsFromSubjects(membership.SubjectACLFor(store))
}
// startACLNats boots an embedded NATS whose authenticator confines each peer to
// the subjects of the rooms it belongs to (audit H4 residual).
func startACLNats(t *testing.T, store membership.Store) *server.Server {
t.Helper()
auth := busauth.NewNkeyAuthenticatorACL(store.IsAuthorized, aclPermsFunc(store))
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: aclFreePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("acl nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
return ns
}
func nkeyConn(t *testing.T, natsURL string, id cs.Identity, errCh chan error) *nats.Conn {
t.Helper()
pub, sign, err := busauth.ClientNkey(id.SignPriv)
if err != nil {
t.Fatalf("nkey: %v", err)
}
nc, err := nats.Connect(natsURL,
nats.Nkey(pub, sign),
nats.ErrorHandler(func(_ *nats.Conn, _ *nats.Subscription, e error) {
select {
case errCh <- e:
default:
}
}),
)
if err != nil {
t.Fatalf("connect nkey: %v", err)
}
t.Cleanup(nc.Close)
return nc
}
func mustAddUser(t *testing.T, store membership.Store, id cs.Identity, handle string) {
t.Helper()
if err := store.AddUser(hex.EncodeToString(id.SignPub), handle, membership.RoleMember); err != nil {
t.Fatalf("add user %s: %v", handle, err)
}
}
func mustCreateRoom(t *testing.T, store membership.Store, roomID, subject, ownerEP string, owner cs.Identity) {
t.Helper()
info := membership.RoomInfo{RoomID: roomID, Subject: subject, OwnerEndpoint: ownerEP}
if err := store.CreateRoom(info, owner.SignPub, owner.KexPub, nil); err != nil {
t.Fatalf("create room %s: %v", roomID, err)
}
}
func newCtrl(t *testing.T, store membership.Store, blobs blobstore.Store) string {
t.Helper()
ts := httptest.NewServer(membership.NewServer(store, blobs, membership.AuthOff))
t.Cleanup(ts.Close)
return ts.URL
}
func waitErr(ch chan error, d time.Duration) error {
select {
case e := <-ch:
return e
case <-time.After(d):
return nil
}
}
func drain(ch chan error) {
for {
select {
case <-ch:
default:
return
}
}
}
// TestSubjectACLIsolation closes the audit H4 residual: a registered peer is
// confined to the subjects of the rooms it belongs to. alice (member of room.A)
// may sub/pub room.A but is DENIED sub/pub on room.B, and never reads what bob
// (member of room.B) publishes there.
func TestSubjectACLIsolation(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, bob := mustID(t), mustID(t)
aliceEP, bobEP := frame.EndpointID(alice.SignPub), frame.EndpointID(bob.SignPub)
mustAddUser(t, store, alice, "alice")
mustAddUser(t, store, bob, "bob")
const subjA, subjB = "room.acl.a", "room.acl.b"
mustCreateRoom(t, store, "ROOMA", subjA, aliceEP, alice)
mustCreateRoom(t, store, "ROOMB", subjB, bobEP, bob)
srv := startACLNats(t, store)
url := srv.ClientURL()
aliceErr := make(chan error, 4)
bobErr := make(chan error, 4)
aliceNC := nkeyConn(t, url, alice, aliceErr)
bobNC := nkeyConn(t, url, bob, bobErr)
// alice may subscribe to her own room (no error).
aliceGot := make(chan string, 4)
if _, err := aliceNC.Subscribe(subjA, func(m *nats.Msg) { aliceGot <- string(m.Data) }); err != nil {
t.Fatalf("alice sub A: %v", err)
}
_ = aliceNC.Flush()
if e := waitErr(aliceErr, 300*time.Millisecond); e != nil {
t.Fatalf("alice sub to her OWN room raised an error: %v", e)
}
// alice subscribing to bob's room is a permissions violation.
if _, err := aliceNC.Subscribe(subjB, func(m *nats.Msg) { aliceGot <- "LEAK:" + string(m.Data) }); err != nil {
t.Fatalf("alice sub B (queue): %v", err)
}
_ = aliceNC.Flush()
if e := waitErr(aliceErr, 1*time.Second); e == nil {
t.Fatalf("alice subscribing to bob's room should raise a permissions violation")
}
// bob publishes in his room; alice (denied) must not receive it.
bobGot := make(chan string, 4)
if _, err := bobNC.Subscribe(subjB, func(m *nats.Msg) { bobGot <- string(m.Data) }); err != nil {
t.Fatalf("bob sub B: %v", err)
}
_ = bobNC.Flush()
if err := bobNC.Publish(subjB, []byte("internal-bob")); err != nil {
t.Fatalf("bob pub B: %v", err)
}
_ = bobNC.Flush()
select {
case got := <-bobGot:
if got != "internal-bob" {
t.Fatalf("bob got %q", got)
}
case <-time.After(2 * time.Second):
t.Fatalf("bob did not receive his own message")
}
select {
case leak := <-aliceGot:
t.Fatalf("alice received bob's room traffic despite the ACL: %q", leak)
case <-time.After(500 * time.Millisecond):
// good: alice never got it
}
// alice publishing into bob's room is denied; bob must not receive it.
drain(aliceErr)
if err := aliceNC.Publish(subjB, []byte("intruder")); err != nil {
t.Fatalf("alice pub B (queue): %v", err)
}
_ = aliceNC.Flush()
if e := waitErr(aliceErr, 1*time.Second); e == nil {
t.Fatalf("alice publishing into bob's room should raise a permissions violation")
}
select {
case got := <-bobGot:
t.Fatalf("bob received alice's cross-room publish despite the ACL: %q", got)
case <-time.After(500 * time.Millisecond):
// good
}
}
// TestReaudit_H4_WildcardMetadataLeak ports the re-auditor's H4 vector. Before
// the per-subject ACL was WIRED into membershipd (it existed in pkg/membership and
// pkg/busauth but the binary used the plain NewNkeyAuthenticator), a registered
// NON-member could open a raw NATS connection, Subscribe(">"), and capture every
// room's subject plus JetStream stream/advisory activity — the payload stayed E2E
// ciphertext, but the metadata leaked. With NewNkeyAuthenticatorACL wired via the
// production path (busauth.PermissionsFromSubjects(membership.SubjectACLFor)), a
// non-member is confined to the client-infra subjects, so the wildcard and any
// foreign room subject are denied.
//
// Coverage:
// - error : a non-member's Subscribe(">") raises a permission violation;
// - edge : a non-member subscribing to another room's exact subject is denied;
// - golden: the member still pub/subs her own room, and the non-member never
// captures that traffic.
//
// Residual now CLOSED (issue 0006b, audit 0008 N2): the client-infra grant no
// longer includes "$JS.API.>". JetStream API access is granted per-room only
// (membership.jsSubjectsFor), so a peer can reach the API of its OWN rooms'
// streams but not the control-plane KV buckets (KV_UNIBUS_*) nor another room's
// stream. See TestAttack0008_N2 for the closed-leak regression.
func TestReaudit_H4_WildcardMetadataLeak(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, eve := mustID(t), mustID(t)
aliceEP := frame.EndpointID(alice.SignPub)
mustAddUser(t, store, alice, "alice")
mustAddUser(t, store, eve, "eve") // eve is REGISTERED but never a member of alice's room
const subject = "room.e2e.confidential"
mustCreateRoom(t, store, "ROOMA", subject, aliceEP, alice)
srv := startACLNats(t, store)
url := srv.ClientURL()
eveErr := make(chan error, 8)
eveNC := nkeyConn(t, url, eve, eveErr)
eveAll := make(chan *nats.Msg, 16)
// Error: eve's wildcard subscription is rejected. nats.go creates the local sub
// object and the server rejects it asynchronously (delivered to ErrorHandler).
if _, err := eveNC.Subscribe(">", func(m *nats.Msg) { eveAll <- m }); err != nil {
t.Fatalf("eve sub >: %v", err)
}
_ = eveNC.Flush()
if e := waitErr(eveErr, 1*time.Second); e == nil {
t.Fatalf("a non-member's Subscribe(\">\") must raise a permissions violation (wildcard metadata leak still open)")
}
// Edge: eve subscribing to the foreign room's EXACT subject is also denied.
drain(eveErr)
if _, err := eveNC.Subscribe(subject, func(m *nats.Msg) { eveAll <- m }); err != nil {
t.Fatalf("eve sub subject: %v", err)
}
_ = eveNC.Flush()
if e := waitErr(eveErr, 1*time.Second); e == nil {
t.Fatalf("a non-member subscribing to another room's subject must be denied")
}
// Golden: alice (the member) pub/subs her own room with no violation, and eve
// never captured the traffic despite her (rejected) wildcard.
aliceErr := make(chan error, 4)
aliceNC := nkeyConn(t, url, alice, aliceErr)
aliceGot := make(chan string, 4)
if _, err := aliceNC.Subscribe(subject, func(m *nats.Msg) { aliceGot <- string(m.Data) }); err != nil {
t.Fatalf("alice sub own room: %v", err)
}
_ = aliceNC.Flush()
if e := waitErr(aliceErr, 300*time.Millisecond); e != nil {
t.Fatalf("alice subscribing to her OWN room raised an error: %v", e)
}
if err := aliceNC.Publish(subject, []byte("members-only metadata")); err != nil {
t.Fatalf("alice publish: %v", err)
}
_ = aliceNC.Flush()
select {
case got := <-aliceGot:
if got != "members-only metadata" {
t.Fatalf("alice got %q", got)
}
case <-time.After(2 * time.Second):
t.Fatalf("alice did not receive her own room's message")
}
select {
case m := <-eveAll:
t.Fatalf("eve captured room traffic despite the ACL: subject=%q data=%q", m.Subject, m.Data)
case <-time.After(500 * time.Millisecond):
// good: eve captured nothing
}
}
// TestRefreshSessionGainsNewRoom is the "permissions refreshed on join" path:
// alice is not in room B, so her connection has no permission for its subject;
// after she is added to room B and calls RefreshSession, the reconnect
// re-derives her permissions and she gains the room's subject.
func TestRefreshSessionGainsNewRoom(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, bob := mustID(t), mustID(t)
aliceEP, bobEP := frame.EndpointID(alice.SignPub), frame.EndpointID(bob.SignPub)
mustAddUser(t, store, alice, "alice")
mustAddUser(t, store, bob, "bob")
const subjB = "room.refresh.b"
mustCreateRoom(t, store, "ROOMB", subjB, bobEP, bob)
srv := startACLNats(t, store)
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
ctrl := newCtrl(t, store, blobs)
aliceC, err := client.NewWithOptions(srv.ClientURL(), ctrl, alice, client.Options{UseNkey: true})
if err != nil {
t.Fatalf("connect alice: %v", err)
}
defer aliceC.Close()
// Add alice to room B (as if invited), then RefreshSession so the
// authenticator re-derives her permissions on reconnect.
if _, err := store.GetMember("ROOMB", aliceEP); err == nil {
t.Fatalf("alice should not be a member yet")
}
if err := store.AddMember("ROOMB", membership.Member{Endpoint: aliceEP, Role: "member", SignPub: alice.SignPub, KexPub: alice.KexPub}, 1, nil); err != nil {
t.Fatalf("add alice to room B: %v", err)
}
if err := aliceC.RefreshSession(); err != nil {
t.Fatalf("refresh session: %v", err)
}
bobErr := make(chan error, 2)
bobNC := nkeyConn(t, srv.ClientURL(), bob, bobErr)
got := make(chan string, 2)
sub, err := aliceC.Subscribe("ROOMB", func(_ frame.Frame, plaintext []byte) { got <- string(plaintext) })
if err != nil {
t.Fatalf("alice subscribe room B after refresh: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(200 * time.Millisecond)
// bob publishes a minimal cleartext frame on subjB.
f := frame.Frame{Type: frame.PUB, Subject: subjB, Sender: bobEP, MsgID: "m1", Payload: []byte("hello-after-join")}
b, _ := f.Marshal()
if err := bobNC.Publish(subjB, b); err != nil {
t.Fatalf("bob publish: %v", err)
}
_ = bobNC.Flush()
select {
case msg := <-got:
if msg != "hello-after-join" {
t.Fatalf("alice got %q", msg)
}
case <-time.After(3 * time.Second):
t.Fatalf("alice did not receive room B traffic after RefreshSession (permissions not refreshed)")
}
}
+19 -8
@@ -95,16 +95,27 @@ func CanonicalRequest(method, path, ts, nonce string, body []byte) []byte {
return []byte(method + "\n" + path + "\n" + ts + "\n" + nonce + "\n" + hex.EncodeToString(sum[:]))
}
-// nonceCache remembers recently-seen nonces to reject replays. It is an
-// in-memory store guarded by a mutex, sufficient for a single membershipd
-// process (the spec's chosen tradeoff over a server-issued nonce round-trip). A
-// distributed deployment would need a shared store (tracked for issue 0003).
+// nonceStore is the anti-replay backend: rememberOrReject records a nonce and
+// reports whether it was unseen (true -> accept) or already seen (false ->
+// reject the replay). It is an interface (issue 0003e) so the single-node
+// in-memory cache can be swapped for a replicated KV store: a per-process cache
+// is BROKEN under multi-node failover (a request captured and replayed to a
+// DIFFERENT node whose cache never saw the nonce would be accepted), so a
+// cluster MUST share the nonce state. Every implementation fails CLOSED: a
+// backend it cannot reach rejects rather than admits.
+type nonceStore interface {
+rememberOrReject(nonce string, now time.Time) bool
+}
+// memNonceCache remembers recently-seen nonces to reject replays. It is an
+// in-memory store guarded by a mutex, sufficient for a SINGLE membershipd
+// process. A clustered deployment uses kvNonceStore instead (issue 0003e).
//
// Pruning is O(expired), not O(n): because the TTL is constant, insertion order
// equals expiry order, so the oldest entries (front of `order`) are exactly the
// ones that expire first (audit H7: the previous full-map scan under the mutex
// was a CPU-amplification vector). A size cap bounds memory.
-type nonceCache struct {
+type memNonceCache struct {
mu sync.Mutex
seen map[string]time.Time // nonce -> expiry
order []string // nonces in insertion order == expiry order
@@ -112,13 +123,13 @@ type nonceCache struct {
cap int
}
-func newNonceCache(ttl time.Duration, capacity int) *nonceCache {
-return &nonceCache{seen: make(map[string]time.Time), ttl: ttl, cap: capacity}
+func newMemNonceCache(ttl time.Duration, capacity int) *memNonceCache {
+return &memNonceCache{seen: make(map[string]time.Time), ttl: ttl, cap: capacity}
}
// rememberOrReject records nonce and returns true if it was unseen, or false if
// it is a replay (still live in the cache).
-func (n *nonceCache) rememberOrReject(nonce string, now time.Time) bool {
+func (n *memNonceCache) rememberOrReject(nonce string, now time.Time) bool {
n.mu.Lock()
defer n.mu.Unlock()
+1 -1
@@ -22,7 +22,7 @@ import (
// with a fresh store + blob store, and seeds one active admin ("alice").
type authHarness struct {
ts *httptest.Server
-store *Store
+store Store
alice cs.Identity
alicePub string // hex
}
+148
@@ -0,0 +1,148 @@
package membership
import (
"net/http"
"net/http/httptest"
"os"
"runtime"
"strconv"
"strings"
"sync"
"sync/atomic"
"testing"
"time"
)
// readRSSkBRaw reads VmRSS (kB) from /proc without a *testing.T, so it is safe to
// call from a sampling goroutine (vmRSSkB calls t.Skip, which may only run on the
// test's own goroutine). Returns 0 when unavailable.
func readRSSkBRaw() int64 {
b, err := os.ReadFile("/proc/self/status")
if err != nil {
return 0
}
for _, line := range strings.Split(string(b), "\n") {
if strings.HasPrefix(line, "VmRSS:") {
f := strings.Fields(line)
if len(f) >= 2 {
v, _ := strconv.ParseInt(f[1], 10, 64)
return v
}
}
}
return 0
}
// TestReaudit_DoSConcurrency ports the re-auditor's N2 (Medium-High) finding: the
// per-request body ceiling and the per-IP rate limit do not bound the AGGREGATE
// memory of many concurrent uploads. The auditor drove RSS to ~1.42 GB with 40
// concurrent 16 MiB blob uploads. With the global in-flight byte limiter, the
// number of simultaneously-buffered uploads is capped, so the resident set stays
// bounded regardless of how many connections arrive at once.
//
// Coverage:
// - golden: a normal upload succeeds, and the server is still healthy after the
// storm (the limiter did not wedge it);
// - edge : concurrency right at the cap is admitted;
// - error : a concurrent flood far past the cap sheds the excess with 503
// (backpressure) instead of buffering it all, and the RSS spike stays bounded
// and does NOT scale with the number of requests.
func TestReaudit_DoSConcurrency(t *testing.T) {
if runtime.GOOS != "linux" {
t.Skip("RSS probe is Linux-only")
}
srv := dosServer(t, AuthOff)
// Force a small aggregate cap so the bound is observable in a unit test: with
// a 16 MiB blob ceiling, 48 MiB admits ~3 concurrent uploads. Production uses
// maxInflightBytes (128 MiB); the mechanism under test is identical.
const cap = int64(48) << 20
srv.inflight = newInflightLimiter(cap)
const blob = maxBlobBytes // 16 MiB, the per-request ceiling
const n = 40 // the auditor's figure
// A spike bound: with the cap admitting ~3 concurrent 16 MiB uploads and a
// ~2x copy factor (auth buffer + handler buffer) plus Go runtime slack, the
// delta should stay well under this. Without the limiter, 40 concurrent
// uploads admitted at once would add hundreds of MB (the auditor saw ~1.4 GB).
const maxSpikeKB = int64(256) << 10 // 256 MiB
runtime.GC()
before := readRSSkBRaw()
// Sample peak RSS while the storm runs.
var peak int64
atomic.StoreInt64(&peak, before)
stop := make(chan struct{})
var sampler sync.WaitGroup
sampler.Add(1)
go func() {
defer sampler.Done()
for {
select {
case <-stop:
return
default:
if v := readRSSkBRaw(); v > atomic.LoadInt64(&peak) {
atomic.StoreInt64(&peak, v)
}
time.Sleep(2 * time.Millisecond)
}
}
}()
var got503, got200 int64
var wg sync.WaitGroup
for i := 0; i < n; i++ {
i := i // per-iteration copy for the goroutine below (required before Go 1.22)
wg.Add(1)
go func() {
defer wg.Done()
req := httptest.NewRequest(http.MethodPost, "/blobs", &zeroReader{remaining: blob})
req.ContentLength = blob
// Distinct source IP per request: this is the multi-IP (botnet) shape the
// auditor flagged, where the per-IP rate limit gives no aggregate defense.
// The in-flight byte limiter is the global bound that must hold here.
req.RemoteAddr = "198.51.100." + strconv.Itoa(i%254+1) + ":1234"
rec := httptest.NewRecorder()
srv.ServeHTTP(rec, req)
switch rec.Code {
case http.StatusServiceUnavailable:
atomic.AddInt64(&got503, 1)
case http.StatusOK:
atomic.AddInt64(&got200, 1)
}
}()
}
wg.Wait()
close(stop)
sampler.Wait()
runtime.GC()
delta := atomic.LoadInt64(&peak) - before
// Error path: the flood must have hit the cap and shed the excess with 503.
if got503 == 0 {
t.Fatalf("a concurrent flood of %d uploads past the cap should shed some with 503; got 200=%d 503=%d", n, got200, got503)
}
// The aggregate memory must stay bounded — not scale with n.
if delta > maxSpikeKB {
t.Fatalf("aggregate RSS spiked %d kB under %d concurrent uploads (bound %d kB): in-flight limiter not bounding memory", delta, n, maxSpikeKB)
}
// All reservations released after the storm.
if f := srv.inflight.inFlight(); f != 0 {
t.Fatalf("after the storm inFlight = %d, want 0 (reservations leaked)", f)
}
// Golden: the server is still healthy and serves a normal upload (from a fresh
// IP so the per-IP rate limiter, untouched here, is not what we measure).
rec := httptest.NewRecorder()
gReq := httptest.NewRequest(http.MethodPost, "/blobs", strings.NewReader("hello after storm"))
gReq.RemoteAddr = "203.0.113.9:9999"
srv.ServeHTTP(rec, gReq)
if rec.Code != http.StatusOK {
t.Fatalf("a normal upload after the storm should be 200, got %d (%s)", rec.Code, rec.Body.String())
}
t.Logf("N2 bound: %d uploads -> 200=%d 503=%d, RSS delta %d kB (bound %d kB), cap %d MiB",
n, got200, got503, delta, maxSpikeKB, cap>>20)
}
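The test above streams its upload bodies from a zeroReader, which is defined elsewhere in the package and not shown in this diff. As a rough idea of what such a helper might look like, here is a minimal sketch; the name `zeroStream` and its exact behavior are assumptions, not the repository's implementation.

```go
package main

import (
	"fmt"
	"io"
)

// zeroStream is a hypothetical reader that yields `remaining` zero bytes and
// then io.EOF. The payload is never materialized, so the test process itself
// does not allocate the 16 MiB it sends.
type zeroStream struct{ remaining int64 }

func (z *zeroStream) Read(p []byte) (int, error) {
	if z.remaining <= 0 {
		return 0, io.EOF
	}
	n := int64(len(p))
	if n > z.remaining {
		n = z.remaining
	}
	for i := int64(0); i < n; i++ {
		p[i] = 0 // the caller's buffer may hold stale data
	}
	z.remaining -= n
	return int(n), nil
}

// countBytes drains a zeroStream of the given size and reports how many bytes
// came out of it.
func countBytes(total int64) int64 {
	n, _ := io.Copy(io.Discard, &zeroStream{remaining: total})
	return n
}

func main() {
	fmt.Println(countBytes(16 << 20)) // 16777216
}
```

Generating the body lazily keeps the client side of the storm out of the RSS measurement, so the delta mostly reflects what the server buffers.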
+85
@@ -0,0 +1,85 @@
package membership
import "sync/atomic"
// inflightLimiter is a non-blocking, byte-counting concurrency limiter: a global
// cap on how many bytes of request body the server will buffer simultaneously.
//
// The per-request body ceilings (maxControlBodyBytes / maxBlobBytes) bound a
// single request, and the per-IP rate limiter throttles a single source, but
// neither bounds the AGGREGATE memory across many concurrent uploads: the
// re-audit (report 0006, N2) showed 40 concurrent 16 MiB blob uploads driving
// RSS to ~1.42 GB, and a distributed (multi-IP) flood scales without a ceiling
// because the rate limiter is per-IP. This limiter is the missing aggregate
// bound: ServeHTTP reserves a request's worst-case buffered size before reading
// the body and releases it when the request finishes, so the total bytes in
// flight can never exceed max regardless of how many connections or source IPs
// arrive at once.
//
// It is intentionally NON-blocking: when a reservation does not fit, the caller
// sheds the request with backpressure (503) rather than parking a goroutine,
// which would let an attacker exhaust goroutines/connections instead of RAM. The
// counter is maintained with sync/atomic (a CAS loop), so it is safe for
// concurrent use without a mutex.
//
// Implementation note: this lives inside unibus rather than the fn-registry
// (where a generic concurrency primitive would normally belong) because the
// registry's functions/core package pulls in transitive dependencies that
// require CGO (mattn/go-sqlite3) and external modules, which are incompatible
// with unibus's CGO_ENABLED=0 build, and because this work is scoped to the
// unibus sub-repo.
type inflightLimiter struct {
max int64 // immutable after construction; <= 0 disables the limiter
used int64 // bytes currently reserved; accessed ONLY via sync/atomic
}
// newInflightLimiter builds a limiter with a cap of maxBytes bytes in flight.
// maxBytes <= 0 disables the cap (tryAcquire always grants), which is the
// loopback/dev posture where an aggregate memory ceiling is not wanted.
func newInflightLimiter(maxBytes int64) *inflightLimiter {
return &inflightLimiter{max: maxBytes}
}
// tryAcquire reserves n bytes without blocking. It returns true and reserves the
// bytes when they fit within the cap (used+n <= max), or false (reserving
// nothing) when they do not. n <= 0 is granted without reserving, and a disabled
// limiter (max <= 0) always grants. Safe for concurrent use.
func (l *inflightLimiter) tryAcquire(n int64) bool {
if l.max <= 0 || n <= 0 {
return true
}
for {
cur := atomic.LoadInt64(&l.used)
if cur+n > l.max {
return false
}
if atomic.CompareAndSwapInt64(&l.used, cur, cur+n) {
return true
}
}
}
// release returns n previously reserved bytes. It must be paired with a
// tryAcquire that granted. A disabled limiter or n <= 0 is a no-op. The counter
// never drops below zero (a defensive clamp against an accidental double release).
func (l *inflightLimiter) release(n int64) {
if l.max <= 0 || n <= 0 {
return
}
for {
cur := atomic.LoadInt64(&l.used)
nv := cur - n
if nv < 0 {
nv = 0
}
if atomic.CompareAndSwapInt64(&l.used, cur, nv) {
return
}
}
}
// inFlight returns the bytes currently reserved. It exists for observability in
// tests and metrics.
func (l *inflightLimiter) inFlight() int64 {
return atomic.LoadInt64(&l.used)
}
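The reserve-before-read, release-on-finish flow that the type comment attributes to ServeHTTP can be sketched as a small middleware. This is an illustration only: `byteLimiter`, `withInflightCap` and `statusWithHeld` are hypothetical names, the release here has no clamp at zero, and the real ServeHTTP wiring is not part of this diff.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// byteLimiter is a hypothetical stand-in for inflightLimiter: a non-blocking
// byte counter maintained with a compare-and-swap loop.
type byteLimiter struct {
	max  int64
	used int64
}

func (l *byteLimiter) tryAcquire(n int64) bool {
	for {
		cur := atomic.LoadInt64(&l.used)
		if cur+n > l.max {
			return false // does not fit: reserve nothing
		}
		if atomic.CompareAndSwapInt64(&l.used, cur, cur+n) {
			return true
		}
	}
}

func (l *byteLimiter) release(n int64) { atomic.AddInt64(&l.used, -n) }

// withInflightCap reserves perRequest bytes before next runs and releases them
// when the request finishes. A refused reservation is shed with 503 instead of
// parking the goroutine.
func withInflightCap(l *byteLimiter, perRequest int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire(perRequest) {
			http.Error(w, "server busy", http.StatusServiceUnavailable)
			return
		}
		defer l.release(perRequest)
		next.ServeHTTP(w, r)
	})
}

// statusWithHeld pre-reserves `held` bytes (simulating uploads already in
// flight) and returns the status of one more request.
func statusWithHeld(capBytes, perRequest, held int64) int {
	l := &byteLimiter{max: capBytes}
	l.tryAcquire(held)
	h := withInflightCap(l, perRequest, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/blobs", strings.NewReader("x")))
	return rec.Code
}

func main() {
	fmt.Println(statusWithHeld(48, 16, 32)) // 200: the reservation reaches the cap exactly
	fmt.Println(statusWithHeld(48, 16, 48)) // 503: the cap is already full
}
```

Reserving a fixed worst-case size per request, rather than the bytes actually read, keeps the bound valid even when a client lies about or omits Content-Length.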
+97
@@ -0,0 +1,97 @@
package membership
import (
"sync"
"testing"
)
// TestInflightLimiterBasics covers the limiter contract: granting within the cap
// (golden), the exact boundary (edge), refusal over the cap without mutating the
// counter (error), the disabled mode, and the defensive clamp on over-release.
func TestInflightLimiterBasics(t *testing.T) {
l := newInflightLimiter(100)
// Golden: a reservation within the cap is granted and reflected.
if !l.tryAcquire(60) {
t.Fatalf("acquire 60 within cap 100 should grant")
}
if l.inFlight() != 60 {
t.Fatalf("inFlight = %d, want 60", l.inFlight())
}
// Edge: exactly reaching the cap (60+40 == 100) is granted.
if !l.tryAcquire(40) {
t.Fatalf("acquire to the exact cap should grant")
}
if l.inFlight() != 100 {
t.Fatalf("inFlight = %d, want 100", l.inFlight())
}
// Error: one more byte over the full cap is refused, and the counter is left
// untouched (a refused reservation reserves nothing).
if l.tryAcquire(1) {
t.Fatalf("acquire over a full cap must be refused")
}
if l.inFlight() != 100 {
t.Fatalf("a refused acquire must not change inFlight; got %d", l.inFlight())
}
// Release frees capacity again.
l.release(100)
if l.inFlight() != 0 {
t.Fatalf("inFlight after full release = %d, want 0", l.inFlight())
}
// Defensive: an over-release never drives the counter negative.
l.release(50)
if l.inFlight() != 0 {
t.Fatalf("over-release must clamp at 0; got %d", l.inFlight())
}
}
// TestInflightLimiterDisabled verifies that a non-positive cap disables the
// limiter: every reservation is granted and nothing is tracked (the loopback/dev
// posture).
func TestInflightLimiterDisabled(t *testing.T) {
for _, max := range []int64{0, -1} {
l := newInflightLimiter(max)
if !l.tryAcquire(1 << 30) {
t.Fatalf("disabled limiter (max=%d) must always grant", max)
}
if l.inFlight() != 0 {
t.Fatalf("disabled limiter must not track usage; got %d", l.inFlight())
}
l.release(1 << 30) // no-op, must not panic
}
}
// TestInflightLimiterConcurrent hammers the limiter from many goroutines with
// equal-sized acquire/release pairs and asserts the invariant never breaks: the
// counter returns to 0 and never exceeds the cap. Run with -race for the memory
// model guarantee.
func TestInflightLimiterConcurrent(t *testing.T) {
const cap = 1000
const chunk = 7
l := newInflightLimiter(cap)
var wg sync.WaitGroup
for g := 0; g < 64; g++ {
wg.Add(1)
go func() {
defer wg.Done()
for i := 0; i < 2000; i++ {
if l.tryAcquire(chunk) {
if f := l.inFlight(); f > cap {
t.Errorf("inFlight %d exceeded cap %d", f, cap)
return
}
l.release(chunk)
}
}
}()
}
wg.Wait()
if l.inFlight() != 0 {
t.Fatalf("after all goroutines, inFlight = %d, want 0", l.inFlight())
}
}
+656
@@ -0,0 +1,656 @@
package membership
// jetstreamStore is the JetStream KV implementation of Store (issue 0003b): the
// control-plane state (rooms, members, sealed room keys, the user allowlist)
// lives in replicated JetStream Key/Value buckets instead of a process-local
// SQLite file. Any node in the cluster reads and writes the same buckets, and
// JetStream's RAFT layer keeps them consistent across replicas, so the HTTP
// control plane becomes effectively stateless: any membershipd can serve any
// request. It is selected only when the `decentralized` flag is on; sqliteStore
// stays the default.
//
// Key layout (every path segment is a single KV token — ULIDs, RawURL endpoint
// ids and lowercase-hex keys never contain a '.', so '.' is a safe separator and
// a "<prefix>.*" watch enumerates exactly one trailing token):
//
// rooms roomID -> RoomInfo (JSON)
// members roomID.endpoint -> Member (JSON, carries Role)
// rooms_by_member endpoint.roomID -> role (reverse index for ListRoomsForEndpoint)
// room_keys roomID.endpoint.epoch -> sealed_key bytes
// users signPubHex -> User (JSON)
//
// Consistency caveat: KV has no multi-key transaction, so a multi-write op
// (CreateRoom, AddMember) is a short sequence of single-key writes. The order is
// chosen so a partial failure leaves a recoverable state (the room/member row
// before its reverse index or sealed key), and writes are idempotent (Put
// overwrites), which is also what makes the SQLite->KV migration (0003c) safe to
// re-run.
//
// Fail-closed: every read uses a bounded context, and IsAuthorized/HasAdmin
// return false on ANY backend error (a KV quorum loss or timeout denies access
// rather than admitting it), mirroring the SQLite store's behavior.
import (
"context"
"encoding/json"
"errors"
"fmt"
"sort"
"strconv"
"strings"
"time"
"github.com/nats-io/nats.go/jetstream"
)
// Bucket names (alphanumeric/dash/underscore only — no dots, per KV rules).
const (
bucketRooms = "UNIBUS_rooms"
bucketMembers = "UNIBUS_members"
bucketByMember = "UNIBUS_rooms_by_member"
bucketRoomKeys = "UNIBUS_room_keys"
bucketUsers = "UNIBUS_users"
defaultKVOpTime = 5 * time.Second
)
// JetStreamConfig configures the KV-backed store.
type JetStreamConfig struct {
// Replicas is the per-bucket replication factor (R1..R5). Use 1 for a single
// node or a 1-2 node rollout, 3 for real HA (quorum 2/3). Scaling R1->R3 in
// place is an operational step (nats kv update) done when the third node
// joins; it does not require reopening the store.
Replicas int
// OpTimeout bounds every KV operation so a stalled backend fails closed
// instead of hanging a request. Zero uses defaultKVOpTime.
OpTimeout time.Duration
}
type jetstreamStore struct {
rooms jetstream.KeyValue
members jetstream.KeyValue
byMember jetstream.KeyValue
keys jetstream.KeyValue
users jetstream.KeyValue
opTimeout time.Duration
}
// OpenJetStream creates (or opens) the five KV buckets on js with the configured
// replication factor and returns a Store backed by them. The JetStream context
// belongs to the caller (it owns the NATS connection); Close is a no-op.
func OpenJetStream(js jetstream.JetStream, cfg JetStreamConfig) (Store, error) {
if cfg.Replicas <= 0 {
cfg.Replicas = 1
}
opTimeout := cfg.OpTimeout
if opTimeout <= 0 {
opTimeout = defaultKVOpTime
}
// Bootstrap budget for creating/opening the buckets. On a single node JetStream
// is ready the instant the server starts, so the first attempt succeeds. On a
// COLD multi-node cluster the JetStream meta-group must first elect a leader and
// each node must establish contact with it before its $JS.API responds. A KV
// op is a NATS request/reply: if it is published before the node's JetStream is
// ready the request is dropped (not queued), and a single long-context call then
// just blocks until it times out (issue 0006g). So we RETRY each bucket op with
// short per-attempt contexts until it succeeds or the overall bootstrap budget
// is exhausted; once the cluster is ready the next retry lands and the buckets
// are created, after which they persist and every node opens them quickly.
bootstrapBudget := 120 * time.Second
deadline := time.Now().Add(bootstrapBudget)
s := &jetstreamStore{opTimeout: opTimeout}
for _, b := range []struct {
name string
dst *jetstream.KeyValue
}{
{bucketRooms, &s.rooms},
{bucketMembers, &s.members},
{bucketByMember, &s.byMember},
{bucketRoomKeys, &s.keys},
{bucketUsers, &s.users},
} {
var kv jetstream.KeyValue
var lastErr error
for {
opCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
kv, lastErr = js.CreateOrUpdateKeyValue(opCtx, jetstream.KeyValueConfig{
Bucket: b.name,
Replicas: cfg.Replicas,
History: 1,
Storage: jetstream.FileStorage,
})
cancel()
if lastErr == nil {
break
}
if time.Now().After(deadline) {
return nil, fmt.Errorf("membership: open KV bucket %q (replicas=%d) after %s: %w", b.name, cfg.Replicas, bootstrapBudget, lastErr)
}
// JetStream not ready yet (no meta leader / request dropped). Wait and
// re-publish the op; in a cluster cold start this lands once the meta
// group settles.
time.Sleep(1 * time.Second)
}
*b.dst = kv
}
return s, nil
}
// Close releases nothing: the JetStream context and NATS connection are owned by
// the caller, which closes them on shutdown.
func (s *jetstreamStore) Close() error { return nil }
func (s *jetstreamStore) ctx() (context.Context, context.CancelFunc) {
return context.WithTimeout(context.Background(), s.opTimeout)
}
// ---- key helpers ----------------------------------------------------------
func memberKey(roomID, endpoint string) string { return roomID + "." + endpoint }
func byMemberKey(endpoint, roomID string) string { return endpoint + "." + roomID }
func sealedKey(roomID, endpoint string, e int) string {
return roomID + "." + endpoint + "." + strconv.Itoa(e)
}
// watchEntries collects every current entry whose key matches pattern (a KV
// watch with a "<prefix>.*" wildcard), draining the watcher until the nil marker
// that signals "all initial values delivered". Tombstones are skipped.
func (s *jetstreamStore) watchEntries(kv jetstream.KeyValue, pattern string) ([]jetstream.KeyValueEntry, error) {
ctx, cancel := s.ctx()
defer cancel()
w, err := kv.Watch(ctx, pattern, jetstream.IgnoreDeletes())
if err != nil {
return nil, err
}
defer w.Stop()
var out []jetstream.KeyValueEntry
for {
select {
case e := <-w.Updates():
if e == nil {
return out, nil // initial snapshot complete
}
out = append(out, e)
case <-ctx.Done():
return nil, ctx.Err()
}
}
}
// ---- rooms / members / keys ----------------------------------------------
func (s *jetstreamStore) CreateRoom(info RoomInfo, ownerSignPub, ownerKexPub, ownerSealedKey []byte) error {
ctx, cancel := s.ctx()
defer cancel()
info.Epoch = 1
roomJSON, err := json.Marshal(info)
if err != nil {
return fmt.Errorf("membership: marshal room: %w", err)
}
// Create (not Put) so a duplicate room id is rejected, matching SQLite's
// PRIMARY KEY behavior.
if _, err := s.rooms.Create(ctx, info.RoomID, roomJSON); err != nil {
if errors.Is(err, jetstream.ErrKeyExists) {
return fmt.Errorf("membership: room %q already exists", info.RoomID)
}
return fmt.Errorf("membership: create room: %w", err)
}
owner := Member{Endpoint: info.OwnerEndpoint, Role: "owner", SignPub: ownerSignPub, KexPub: ownerKexPub}
if err := s.putMember(ctx, info.RoomID, owner); err != nil {
return err
}
if info.Encrypt {
if _, err := s.keys.Put(ctx, sealedKey(info.RoomID, info.OwnerEndpoint, 1), ownerSealedKey); err != nil {
return fmt.Errorf("membership: put owner key: %w", err)
}
}
return nil
}
// putMember writes the member row and then its reverse index (two single-key
// writes, not an atomic pair; the row goes first so a partial failure is recoverable).
func (s *jetstreamStore) putMember(ctx context.Context, roomID string, m Member) error {
mb, err := json.Marshal(m)
if err != nil {
return fmt.Errorf("membership: marshal member: %w", err)
}
if _, err := s.members.Put(ctx, memberKey(roomID, m.Endpoint), mb); err != nil {
return fmt.Errorf("membership: put member: %w", err)
}
if _, err := s.byMember.Put(ctx, byMemberKey(m.Endpoint, roomID), []byte(m.Role)); err != nil {
return fmt.Errorf("membership: put member index: %w", err)
}
return nil
}
func (s *jetstreamStore) GetRoom(roomID string) (RoomInfo, error) {
ctx, cancel := s.ctx()
defer cancel()
e, err := s.rooms.Get(ctx, roomID)
if err != nil {
if errors.Is(err, jetstream.ErrKeyNotFound) {
return RoomInfo{}, fmt.Errorf("membership: get room %q: %w", roomID, ErrNotFound)
}
return RoomInfo{}, fmt.Errorf("membership: get room %q: %w", roomID, err)
}
var info RoomInfo
if err := json.Unmarshal(e.Value(), &info); err != nil {
return RoomInfo{}, fmt.Errorf("membership: unmarshal room %q: %w", roomID, err)
}
return info, nil
}
func (s *jetstreamStore) AddMember(roomID string, m Member, epoch int, sealedKeyBytes []byte) error {
ctx, cancel := s.ctx()
defer cancel()
if err := s.putMember(ctx, roomID, m); err != nil {
return err
}
if len(sealedKeyBytes) > 0 {
if _, err := s.keys.Put(ctx, sealedKey(roomID, m.Endpoint, epoch), sealedKeyBytes); err != nil {
return fmt.Errorf("membership: put member key: %w", err)
}
}
return nil
}
func (s *jetstreamStore) GetMember(roomID, endpoint string) (Member, error) {
ctx, cancel := s.ctx()
defer cancel()
e, err := s.members.Get(ctx, memberKey(roomID, endpoint))
if err != nil {
if errors.Is(err, jetstream.ErrKeyNotFound) {
return Member{}, fmt.Errorf("membership: get member %q/%q: %w", roomID, endpoint, ErrNotFound)
}
return Member{}, fmt.Errorf("membership: get member %q/%q: %w", roomID, endpoint, err)
}
var m Member
if err := json.Unmarshal(e.Value(), &m); err != nil {
return Member{}, fmt.Errorf("membership: unmarshal member: %w", err)
}
return m, nil
}
func (s *jetstreamStore) ListMembers(roomID string) ([]Member, error) {
entries, err := s.watchEntries(s.members, roomID+".*")
if err != nil {
return nil, fmt.Errorf("membership: list members %q: %w", roomID, err)
}
out := make([]Member, 0, len(entries))
for _, e := range entries {
var m Member
if err := json.Unmarshal(e.Value(), &m); err != nil {
return nil, fmt.Errorf("membership: unmarshal member: %w", err)
}
out = append(out, m)
}
sort.Slice(out, func(i, j int) bool { return out[i].Endpoint < out[j].Endpoint })
return out, nil
}
func (s *jetstreamStore) ListRoomsForEndpoint(endpoint string) ([]RoomMembership, error) {
entries, err := s.watchEntries(s.byMember, endpoint+".*")
if err != nil {
return nil, fmt.Errorf("membership: list rooms for endpoint %q: %w", endpoint, err)
}
out := make([]RoomMembership, 0, len(entries))
for _, e := range entries {
// Key is "<endpoint>.<roomID>"; the roomID is everything after the dot.
roomID := e.Key()[len(endpoint)+1:]
info, err := s.GetRoom(roomID)
if err != nil {
if errors.Is(err, ErrNotFound) {
continue // index points at a removed room: skip, stay consistent
}
return nil, err
}
out = append(out, RoomMembership{RoomInfo: info, Role: string(e.Value())})
}
sort.Slice(out, func(i, j int) bool { return out[i].RoomID < out[j].RoomID })
return out, nil
}
func (s *jetstreamStore) GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, error) {
if epoch > 0 {
ctx, cancel := s.ctx()
defer cancel()
e, err := s.keys.Get(ctx, sealedKey(roomID, endpoint, epoch))
if err != nil {
if errors.Is(err, jetstream.ErrKeyNotFound) {
return 0, nil, fmt.Errorf("membership: get sealed key %q/%q@%d: %w", roomID, endpoint, epoch, ErrNotFound)
}
return 0, nil, fmt.Errorf("membership: get sealed key %q/%q@%d: %w", roomID, endpoint, epoch, err)
}
return epoch, e.Value(), nil
}
// epoch <= 0: latest. Enumerate "<roomID>.<endpoint>.*" and take the max.
entries, err := s.watchEntries(s.keys, roomID+"."+endpoint+".*")
if err != nil {
return 0, nil, fmt.Errorf("membership: get latest sealed key %q/%q: %w", roomID, endpoint, err)
}
bestEpoch, bestVal := -1, []byte(nil)
for _, e := range entries {
k := e.Key()
ep, perr := strconv.Atoi(k[len(roomID)+1+len(endpoint)+1:])
if perr != nil {
continue
}
if ep > bestEpoch {
bestEpoch, bestVal = ep, e.Value()
}
}
if bestEpoch < 0 {
return 0, nil, fmt.Errorf("membership: get latest sealed key %q/%q: %w", roomID, endpoint, ErrNotFound)
}
return bestEpoch, bestVal, nil
}
func (s *jetstreamStore) PutSealedKeys(roomID string, epoch int, keys map[string][]byte) error {
ctx, cancel := s.ctx()
defer cancel()
for endpoint, sealed := range keys {
if _, err := s.keys.Put(ctx, sealedKey(roomID, endpoint, epoch), sealed); err != nil {
return fmt.Errorf("membership: put sealed key for %q: %w", endpoint, err)
}
}
return nil
}
func (s *jetstreamStore) BumpEpoch(roomID string, newEpoch int) error {
// Read-modify-write the room's epoch. The control plane serializes rekeys per
// room (owner-signed), so the lost-update window is not exercised in practice.
info, err := s.GetRoom(roomID)
if err != nil {
return fmt.Errorf("membership: bump epoch %q->%d: %w", roomID, newEpoch, err)
}
info.Epoch = newEpoch
b, err := json.Marshal(info)
if err != nil {
return fmt.Errorf("membership: marshal room: %w", err)
}
ctx, cancel := s.ctx()
defer cancel()
if _, err := s.rooms.Put(ctx, roomID, b); err != nil {
return fmt.Errorf("membership: bump epoch %q->%d: %w", roomID, newEpoch, err)
}
return nil
}
func (s *jetstreamStore) RemoveMember(roomID, endpoint string) error {
ctx, cancel := s.ctx()
defer cancel()
// Drop the member row and its reverse index. Past-epoch sealed keys are left
// intact (they only decrypt data the member could already read), matching the
// SQLite store.
if err := s.members.Delete(ctx, memberKey(roomID, endpoint)); err != nil && !errors.Is(err, jetstream.ErrKeyNotFound) {
return fmt.Errorf("membership: remove member %q/%q: %w", roomID, endpoint, err)
}
if err := s.byMember.Delete(ctx, byMemberKey(endpoint, roomID)); err != nil && !errors.Is(err, jetstream.ErrKeyNotFound) {
return fmt.Errorf("membership: remove member index %q/%q: %w", roomID, endpoint, err)
}
return nil
}
// ---- users (the bus allowlist) -------------------------------------------
func (s *jetstreamStore) AddUser(signPub, handle, role string) error {
signPub = normalizeSignPub(signPub)
if signPub == "" || handle == "" {
return fmt.Errorf("membership: AddUser: sign_pub and handle required")
}
if role == "" {
role = RoleMember
}
if role != RoleAdmin && role != RoleMember {
return fmt.Errorf("membership: AddUser: invalid role %q (want %q or %q)", role, RoleAdmin, RoleMember)
}
u := User{SignPub: signPub, Handle: handle, Role: role, Status: StatusActive, CreatedAt: nowRFC3339()}
b, err := json.Marshal(u)
if err != nil {
return fmt.Errorf("membership: marshal user: %w", err)
}
ctx, cancel := s.ctx()
defer cancel()
if _, err := s.users.Create(ctx, signPub, b); err != nil {
if errors.Is(err, jetstream.ErrKeyExists) {
return ErrUserExists
}
return fmt.Errorf("membership: insert user: %w", err)
}
return nil
}
func (s *jetstreamStore) GetUser(signPub string) (User, error) {
signPub = normalizeSignPub(signPub)
ctx, cancel := s.ctx()
defer cancel()
e, err := s.users.Get(ctx, signPub)
if err != nil {
if errors.Is(err, jetstream.ErrKeyNotFound) {
return User{}, fmt.Errorf("membership: get user %q: %w", signPub, ErrNotFound)
}
return User{}, fmt.Errorf("membership: get user %q: %w", signPub, err)
}
var u User
if err := json.Unmarshal(e.Value(), &u); err != nil {
return User{}, fmt.Errorf("membership: unmarshal user: %w", err)
}
return u, nil
}
func (s *jetstreamStore) ListUsers() ([]User, error) {
ctx, cancel := s.ctx()
defer cancel()
w, err := s.users.WatchAll(ctx, jetstream.IgnoreDeletes())
if err != nil {
return nil, fmt.Errorf("membership: list users: %w", err)
}
defer w.Stop()
var out []User
for {
select {
case e := <-w.Updates():
if e == nil {
sort.Slice(out, func(i, j int) bool {
if out[i].Handle != out[j].Handle {
return out[i].Handle < out[j].Handle
}
return out[i].SignPub < out[j].SignPub
})
return out, nil
}
var u User
if err := json.Unmarshal(e.Value(), &u); err != nil {
return nil, fmt.Errorf("membership: unmarshal user: %w", err)
}
out = append(out, u)
case <-ctx.Done():
return nil, ctx.Err()
}
}
}
func (s *jetstreamStore) RevokeUser(signPub string) error {
signPub = normalizeSignPub(signPub)
u, err := s.GetUser(signPub)
if err != nil {
if errors.Is(err, ErrNotFound) {
return fmt.Errorf("membership: revoke user %q: no active user with that key", signPub)
}
return fmt.Errorf("membership: revoke user %q: %w", signPub, err)
}
if u.Status != StatusActive {
return fmt.Errorf("membership: revoke user %q: no active user with that key", signPub)
}
u.Status = StatusRevoked
u.RevokedAt = nowRFC3339()
b, err := json.Marshal(u)
if err != nil {
return fmt.Errorf("membership: marshal user: %w", err)
}
ctx, cancel := s.ctx()
defer cancel()
if _, err := s.users.Put(ctx, signPub, b); err != nil {
return fmt.Errorf("membership: revoke user %q: %w", signPub, err)
}
return nil
}
// IsAuthorized reports whether signPub is an active bus user. Any backend error
// (including a KV quorum loss or timeout) yields false: fail closed.
func (s *jetstreamStore) IsAuthorized(signPub string) bool {
signPub = normalizeSignPub(signPub)
if signPub == "" {
return false
}
ctx, cancel := s.ctx()
defer cancel()
e, err := s.users.Get(ctx, signPub)
if err != nil {
return false
}
var u User
if err := json.Unmarshal(e.Value(), &u); err != nil {
return false
}
return u.Status == StatusActive
}
// HasAdmin reports whether at least one active admin exists. On any backend
// error it returns false, keeping the admin-gated endpoints closed (conservative).
func (s *jetstreamStore) HasAdmin() bool {
users, err := s.ListUsers()
if err != nil {
return false
}
for _, u := range users {
if u.Role == RoleAdmin && u.Status == StatusActive {
return true
}
}
return false
}
// ---- snapshot import / export (issue 0003c migration) ---------------------
// importSnapshot writes a full Snapshot into the KV buckets, preserving each
// room's epoch and each user's status (Put, not CreateRoom/AddUser, so the exact
// state is reproduced rather than reset to defaults). Idempotent: every write is
// an overwrite, so re-running the migration converges.
func (s *jetstreamStore) importSnapshot(snap *Snapshot) error {
ctx, cancel := s.ctx()
defer cancel()
for _, r := range snap.Rooms {
b, err := json.Marshal(r)
if err != nil {
return fmt.Errorf("import: marshal room %q: %w", r.RoomID, err)
}
if _, err := s.rooms.Put(ctx, r.RoomID, b); err != nil {
return fmt.Errorf("import: put room %q: %w", r.RoomID, err)
}
}
for roomID, members := range snap.Members {
for _, m := range members {
if err := s.putMember(ctx, roomID, m); err != nil {
return fmt.Errorf("import: %w", err)
}
}
}
for _, rec := range snap.Keys {
if _, err := s.keys.Put(ctx, sealedKey(rec.RoomID, rec.Endpoint, rec.Epoch), rec.Sealed); err != nil {
return fmt.Errorf("import: put key %q/%q@%d: %w", rec.RoomID, rec.Endpoint, rec.Epoch, err)
}
}
for _, u := range snap.Users {
b, err := json.Marshal(u)
if err != nil {
return fmt.Errorf("import: marshal user %q: %w", u.SignPub, err)
}
if _, err := s.users.Put(ctx, normalizeSignPub(u.SignPub), b); err != nil {
return fmt.Errorf("import: put user %q: %w", u.SignPub, err)
}
}
return nil
}
// ExportSnapshot reads the entire KV control-plane state back into a Snapshot,
// so the migration's parity test can compare it against the SQLite source.
func (s *jetstreamStore) ExportSnapshot() (*Snapshot, error) {
snap := &Snapshot{Members: map[string][]Member{}}
roomEntries, err := s.watchAll(s.rooms)
if err != nil {
return nil, fmt.Errorf("export kv: rooms: %w", err)
}
for _, e := range roomEntries {
var r RoomInfo
if err := json.Unmarshal(e.Value(), &r); err != nil {
return nil, fmt.Errorf("export kv: unmarshal room: %w", err)
}
snap.Rooms = append(snap.Rooms, r)
}
memberEntries, err := s.watchAll(s.members)
if err != nil {
return nil, fmt.Errorf("export kv: members: %w", err)
}
for _, e := range memberEntries {
// Key is "<roomID>.<endpoint>"; neither segment contains a dot.
roomID := strings.SplitN(e.Key(), ".", 2)[0]
var m Member
if err := json.Unmarshal(e.Value(), &m); err != nil {
return nil, fmt.Errorf("export kv: unmarshal member: %w", err)
}
snap.Members[roomID] = append(snap.Members[roomID], m)
}
keyEntries, err := s.watchAll(s.keys)
if err != nil {
return nil, fmt.Errorf("export kv: keys: %w", err)
}
for _, e := range keyEntries {
// Key is "<roomID>.<endpoint>.<epoch>".
parts := strings.Split(e.Key(), ".")
if len(parts) != 3 {
continue
}
epoch, err := strconv.Atoi(parts[2])
if err != nil {
continue
}
snap.Keys = append(snap.Keys, SealedKeyRecord{RoomID: parts[0], Endpoint: parts[1], Epoch: epoch, Sealed: e.Value()})
}
users, err := s.ListUsers()
if err != nil {
return nil, fmt.Errorf("export kv: users: %w", err)
}
snap.Users = users
return snap, nil
}
// watchAll collects every current entry of a bucket (no key filter), draining
// the watcher to its initial-snapshot nil marker.
func (s *jetstreamStore) watchAll(kv jetstream.KeyValue) ([]jetstream.KeyValueEntry, error) {
ctx, cancel := s.ctx()
defer cancel()
w, err := kv.WatchAll(ctx, jetstream.IgnoreDeletes())
if err != nil {
return nil, err
}
defer w.Stop()
var out []jetstream.KeyValueEntry
for {
select {
case e := <-w.Updates():
if e == nil {
return out, nil
}
out = append(out, e)
case <-ctx.Done():
return nil, ctx.Err()
}
}
}
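The drain loop in watchAll can be illustrated without a NATS server. The following is a minimal sketch, assuming a plain channel in place of the watcher's Updates() channel; the `entry` type and the function names are hypothetical stand-ins, not part of the package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// entry is a hypothetical stand-in for jetstream.KeyValueEntry.
type entry struct{ key string }

// drainInitial collects values until the nil marker that ends the initial
// snapshot, or returns the context error if the marker never arrives.
func drainInitial(ctx context.Context, updates <-chan *entry) ([]*entry, error) {
	var out []*entry
	for {
		select {
		case e := <-updates:
			if e == nil {
				return out, nil
			}
			out = append(out, e)
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}

// drainDemo feeds two entries plus the nil marker and reports how many
// entries were collected.
func drainDemo() (int, error) {
	updates := make(chan *entry, 3)
	updates <- &entry{key: "a"}
	updates <- &entry{key: "b"}
	updates <- nil
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	got, err := drainInitial(ctx, updates)
	return len(got), err
}

// stalledDemo never sends the marker, so the deadline ends the wait.
func stalledDemo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	_, err := drainInitial(ctx, make(chan *entry))
	return err
}

func main() {
	n, err := drainDemo()
	fmt.Println(n, err)
	fmt.Println(stalledDemo())
}
```

The deadline branch is what keeps a caller from blocking forever when the backend stops delivering updates before the marker.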
+275
@@ -0,0 +1,275 @@
package membership
import (
"bytes"
"errors"
"net"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
server "github.com/nats-io/nats-server/v2/server"
)
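// kvFreePort asks the kernel for a free loopback TCP port and releases it
// immediately. Another process could claim the port before the embedded server
// binds it; that small race is accepted for tests.
func kvFreePort(t *testing.T) int {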
t.Helper()
l, err := net.Listen("tcp", "127.0.0.1:0")
if err != nil {
t.Fatalf("free port: %v", err)
}
defer l.Close()
return l.Addr().(*net.TCPAddr).Port
}
// newKVStore boots a single-node embedded NATS with JetStream and opens a
// jetstreamStore (R1) over it, returning the store plus the server and
// connection so a test can shut the backend down to exercise fail-closed paths.
func newKVStore(t *testing.T) (*jetstreamStore, *server.Server, *nats.Conn) {
t.Helper()
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(),
Host: "127.0.0.1",
Port: kvFreePort(t),
})
if err != nil {
t.Fatalf("embedded nats: %v", err)
}
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
ns.Shutdown()
t.Fatalf("nats connect: %v", err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
ns.Shutdown()
t.Fatalf("jetstream: %v", err)
}
st, err := OpenJetStream(js, JetStreamConfig{Replicas: 1, OpTimeout: 2 * time.Second})
if err != nil {
nc.Close()
ns.Shutdown()
t.Fatalf("open jetstream store: %v", err)
}
t.Cleanup(func() {
nc.Close()
ns.Shutdown()
ns.WaitForShutdown()
})
return st.(*jetstreamStore), ns, nc
}
// TestJetStreamStoreRoomsCRUD is the golden path: an encrypted room with an owner
// and an invited member round-trips through every room/member/key method.
func TestJetStreamStoreRoomsCRUD(t *testing.T) {
s, _, _ := newKVStore(t)
roomID := newULID()
owner := "owner-ep-1"
info := RoomInfo{RoomID: roomID, Subject: "room.kv", Encrypt: true, Persist: true, SignMsgs: true, OwnerEndpoint: owner}
ownerSealed := []byte("sealed-owner-epoch1")
if err := s.CreateRoom(info, []byte("owner-sign"), []byte("owner-kex"), ownerSealed); err != nil {
t.Fatalf("CreateRoom: %v", err)
}
// GetRoom returns epoch 1 and the policy.
got, err := s.GetRoom(roomID)
if err != nil {
t.Fatalf("GetRoom: %v", err)
}
if got.Epoch != 1 || got.Subject != "room.kv" || !got.Encrypt || got.OwnerEndpoint != owner {
t.Fatalf("GetRoom mismatch: %+v", got)
}
// Owner is a member with role "owner".
om, err := s.GetMember(roomID, owner)
if err != nil {
t.Fatalf("GetMember owner: %v", err)
}
if om.Role != "owner" || !bytes.Equal(om.SignPub, []byte("owner-sign")) {
t.Fatalf("owner member mismatch: %+v", om)
}
// Owner's sealed key at epoch 1.
ep, sealed, err := s.GetSealedKey(roomID, owner, 1)
if err != nil || ep != 1 || !bytes.Equal(sealed, ownerSealed) {
t.Fatalf("GetSealedKey owner: ep=%d sealed=%q err=%v", ep, sealed, err)
}
// Invite a member with a sealed key at epoch 1.
bob := "member-ep-bob"
bobSealed := []byte("sealed-bob-epoch1")
if err := s.AddMember(roomID, Member{Endpoint: bob, Role: "member", SignPub: []byte("bob-sign"), KexPub: []byte("bob-kex")}, 1, bobSealed); err != nil {
t.Fatalf("AddMember: %v", err)
}
// ListMembers returns both, sorted by endpoint.
members, err := s.ListMembers(roomID)
if err != nil {
t.Fatalf("ListMembers: %v", err)
}
if len(members) != 2 {
t.Fatalf("ListMembers want 2, got %d (%+v)", len(members), members)
}
// Bob can find the room via the reverse index.
rooms, err := s.ListRoomsForEndpoint(bob)
if err != nil {
t.Fatalf("ListRoomsForEndpoint: %v", err)
}
if len(rooms) != 1 || rooms[0].RoomID != roomID || rooms[0].Role != "member" {
t.Fatalf("ListRoomsForEndpoint mismatch: %+v", rooms)
}
// Latest sealed key (epoch <= 0) resolves to epoch 1 for bob.
lep, lsealed, err := s.GetSealedKey(roomID, bob, 0)
if err != nil || lep != 1 || !bytes.Equal(lsealed, bobSealed) {
t.Fatalf("GetSealedKey latest bob: ep=%d err=%v", lep, err)
}
// Rekey to epoch 2 (bump + new sealed keys), then latest resolves to 2.
if err := s.BumpEpoch(roomID, 2); err != nil {
t.Fatalf("BumpEpoch: %v", err)
}
if err := s.PutSealedKeys(roomID, 2, map[string][]byte{owner: []byte("owner-epoch2")}); err != nil {
t.Fatalf("PutSealedKeys: %v", err)
}
got2, err := s.GetRoom(roomID)
if err != nil {
t.Fatalf("GetRoom after BumpEpoch: %v", err)
}
if got2.Epoch != 2 {
t.Fatalf("after BumpEpoch want epoch 2, got %d", got2.Epoch)
}
lep2, _, err := s.GetSealedKey(roomID, owner, 0)
if err != nil || lep2 != 2 {
t.Fatalf("latest owner key after rekey: ep=%d err=%v", lep2, err)
}
// Remove bob; he disappears from members and his reverse index.
if err := s.RemoveMember(roomID, bob); err != nil {
t.Fatalf("RemoveMember: %v", err)
}
if _, err := s.GetMember(roomID, bob); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetMember after remove want ErrNotFound, got %v", err)
}
rooms2, err := s.ListRoomsForEndpoint(bob)
if err != nil {
t.Fatalf("ListRoomsForEndpoint after remove: %v", err)
}
if len(rooms2) != 0 {
t.Fatalf("ListRoomsForEndpoint after remove want 0, got %d", len(rooms2))
}
}
// TestJetStreamStoreUsers exercises the allowlist: add, lookup, authorize,
// revoke (which flips IsAuthorized), and the admin gate.
func TestJetStreamStoreUsers(t *testing.T) {
s, _, _ := newKVStore(t)
const aliceHex = "aa11"
if s.HasAdmin() {
t.Fatalf("fresh store should have no admin")
}
if err := s.AddUser(aliceHex, "alice", RoleAdmin); err != nil {
t.Fatalf("AddUser: %v", err)
}
if !s.HasAdmin() {
t.Fatalf("HasAdmin should be true after adding an admin")
}
if !s.IsAuthorized(aliceHex) {
t.Fatalf("alice should be authorized")
}
// Case-insensitive lookup (keys are normalized lowercase).
if !s.IsAuthorized("AA11") {
t.Fatalf("uppercase hex should normalize and authorize")
}
u, err := s.GetUser(aliceHex)
if err != nil || u.Handle != "alice" || u.Role != RoleAdmin || u.Status != StatusActive {
t.Fatalf("GetUser mismatch: %+v err=%v", u, err)
}
// Duplicate add is rejected with ErrUserExists.
if err := s.AddUser(aliceHex, "alice2", RoleMember); !errors.Is(err, ErrUserExists) {
t.Fatalf("duplicate AddUser want ErrUserExists, got %v", err)
}
if err := s.AddUser("bb22", "bob", RoleMember); err != nil {
t.Fatalf("AddUser bob: %v", err)
}
users, err := s.ListUsers()
if err != nil || len(users) != 2 {
t.Fatalf("ListUsers want 2, got %d err=%v", len(users), err)
}
// Revoke alice: authorization flips off immediately.
if err := s.RevokeUser(aliceHex); err != nil {
t.Fatalf("RevokeUser: %v", err)
}
if s.IsAuthorized(aliceHex) {
t.Fatalf("revoked user must not be authorized")
}
if s.HasAdmin() {
t.Fatalf("after revoking the only admin, HasAdmin must be false")
}
// Revoking again is an error (no active user).
if err := s.RevokeUser(aliceHex); err == nil {
t.Fatalf("re-revoke should error")
}
}
// TestJetStreamStoreNotFound checks the ErrNotFound mapping for misses.
func TestJetStreamStoreNotFound(t *testing.T) {
s, _, _ := newKVStore(t)
if _, err := s.GetRoom("nope"); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetRoom miss want ErrNotFound, got %v", err)
}
if _, err := s.GetMember("nope", "x"); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetMember miss want ErrNotFound, got %v", err)
}
if _, _, err := s.GetSealedKey("nope", "x", 1); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetSealedKey miss want ErrNotFound, got %v", err)
}
if _, _, err := s.GetSealedKey("nope", "x", 0); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetSealedKey latest miss want ErrNotFound, got %v", err)
}
if _, err := s.GetUser("ffff"); !errors.Is(err, ErrNotFound) {
t.Fatalf("GetUser miss want ErrNotFound, got %v", err)
}
}
// TestJetStreamStoreIsAuthorizedFailClosed is the error path mandated by the
// issue: when the KV backend is unavailable (here the NATS server is shut down),
// IsAuthorized must DENY, never admit. A previously-authorized identity flips to
// unauthorized once the backend cannot be reached.
func TestJetStreamStoreIsAuthorizedFailClosed(t *testing.T) {
s, ns, nc := newKVStore(t)
const aliceHex = "abcd"
if err := s.AddUser(aliceHex, "alice", RoleAdmin); err != nil {
t.Fatalf("AddUser: %v", err)
}
if !s.IsAuthorized(aliceHex) {
t.Fatalf("alice should be authorized while the backend is up")
}
// Take the KV backend away: close the client and stop the server. Every
// subsequent KV Get fails, and the store must fail closed.
nc.Close()
ns.Shutdown()
ns.WaitForShutdown()
// Bound the assertion: IsAuthorized internally caps each op at OpTimeout, so
// this returns well before the test deadline.
done := make(chan bool, 1)
go func() { done <- s.IsAuthorized(aliceHex) }()
select {
case authorized := <-done:
if authorized {
t.Fatalf("KV backend down but IsAuthorized returned true: NOT fail-closed")
}
case <-time.After(10 * time.Second):
t.Fatalf("IsAuthorized hung when the backend was down (no bounded timeout)")
}
// HasAdmin is likewise conservative: backend down -> false (gates stay closed).
if s.HasAdmin() {
t.Fatalf("KV backend down but HasAdmin returned true: NOT fail-closed")
}
}
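The fail-closed rule exercised by TestJetStreamStoreIsAuthorizedFailClosed can be reduced to a few lines. This is a sketch under stated assumptions: `lookupFunc` and the three fake backends are hypothetical, and the real store's lookup is not shown in this diff.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// lookupFunc is a hypothetical backend call: it reports whether the key
// belongs to an active user, or an error when the backend cannot answer.
type lookupFunc func(ctx context.Context, signPub string) (bool, error)

// authorizeFailClosed admits only on a positive answer within the timeout.
// Any error, including a deadline, is treated as a denial.
func authorizeFailClosed(lookup lookupFunc, signPub string, timeout time.Duration) bool {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ok, err := lookup(ctx, signPub)
	if err != nil {
		return false
	}
	return ok
}

// healthy answers normally.
func healthy(_ context.Context, signPub string) (bool, error) {
	return signPub == "abcd", nil
}

// down returns true together with an error to show that the error wins.
func down(_ context.Context, _ string) (bool, error) {
	return true, errors.New("backend unavailable")
}

// hung blocks until the per-operation deadline fires.
func hung(ctx context.Context, _ string) (bool, error) {
	<-ctx.Done()
	return false, ctx.Err()
}

// failClosedDemo runs the same identity against the three backends.
func failClosedDemo() (up, outage, stalled bool) {
	const timeout = 20 * time.Millisecond
	return authorizeFailClosed(healthy, "abcd", timeout),
		authorizeFailClosed(down, "abcd", timeout),
		authorizeFailClosed(hung, "abcd", timeout)
}

func main() {
	fmt.Println(failClosedDemo())
}
```

Bounding each call with its own timeout is what lets a gate like this return a denial instead of hanging when the backend is gone.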
+152
@@ -0,0 +1,152 @@
package membership_test
// Regression for audit report 0008, vector N2: with the broad "$JS.API.>" grant
// removed (issue 0006b), a registered peer that belongs to no room can no longer
// read the control-plane KV buckets over NATS, while the per-room JetStream API of
// a peer's OWN rooms keeps working. The auditor's ephemeral attack populated the
// KV control plane and had a registered non-member harvest the allowlist, the room
// graph and the sealed-key metadata directly through "$JS.API.>".
import (
"context"
"encoding/hex"
"path/filepath"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/busauth"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
server "github.com/nats-io/nats-server/v2/server"
)
// startACLNatsInternal is startACLNats plus a recognized internal service identity
// (so the test can seed the KV control plane with full permissions, exactly as the
// decentralized membershipd does at bootstrap).
func startACLNatsInternal(t *testing.T, store membership.Store, internalPubHex string) *server.Server {
t.Helper()
auth := busauth.NewNkeyAuthenticatorACLInternal(store.IsAuthorized, aclPermsFunc(store), internalPubHex)
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: aclFreePort(t), Auth: auth,
})
if err != nil {
t.Fatalf("acl nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
return ns
}
// TestAttack0008_N2 reproduces the control-plane KV leak and proves it is closed.
//
// error : eve (registered, member of no room) cannot read the KV buckets — the
// JetStream KV API and the raw $KV subject space are both denied.
// golden: the owner of a persisted room can still drive the JetStream API of HER
// OWN room's stream (so persisted-room history keeps working).
// edge : eve cannot reach another room's stream API either (cross-room JS deny).
func TestAttack0008_N2(t *testing.T) {
dir := t.TempDir()
// The HTTP control-plane store stays SQLite; the KV buckets below stand in for
// the decentralized control plane the attack targets.
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
ceo, eve, internalID := mustID(t), mustID(t), mustID(t)
ceoEP := frame.EndpointID(ceo.SignPub)
mustAddUser(t, store, ceo, "ceo-root-admin")
mustAddUser(t, store, eve, "eve") // registered, member of nothing
// A persisted room owned by ceo: ceo is a member, so her per-room JS is allowed.
if err := store.CreateRoom(
membership.RoomInfo{RoomID: "PRIVROOM", Subject: "room.board.ma-deal", Encrypt: true, Persist: true, OwnerEndpoint: ceoEP},
ceo.SignPub, ceo.KexPub, []byte("sealed-self"),
); err != nil {
t.Fatalf("create room: %v", err)
}
internalPubHex := hex.EncodeToString(internalID.SignPub)
ns := startACLNatsInternal(t, store, internalPubHex)
url := ns.ClientURL()
// Seed the KV control plane with the privileged internal identity (full perms),
// simulating the decentralized buckets the attack reads.
intErr := make(chan error, 4)
intNC := nkeyConn(t, url, internalID, intErr)
intJS, err := jetstream.New(intNC)
if err != nil {
t.Fatalf("internal jetstream: %v", err)
}
kvStore, err := membership.OpenJetStream(intJS, membership.JetStreamConfig{Replicas: 1, OpTimeout: 3 * time.Second})
if err != nil {
t.Fatalf("open kv buckets: %v", err)
}
if err := kvStore.AddUser(hex.EncodeToString(ceo.SignPub), "ceo-root-admin", membership.RoleAdmin); err != nil {
t.Fatalf("seed kv user: %v", err)
}
// Each JetStream op gets its own short context: a DENIED request never gets a
// reply, so it blocks until its own deadline — a shared context would be
// exhausted by the first denied call and starve the rest.
freshCtx := func(d time.Duration) (context.Context, context.CancelFunc) {
return context.WithTimeout(context.Background(), d)
}
// --- error: eve cannot read the control-plane KV buckets ------------------
eveErr := make(chan error, 8)
eveNC := nkeyConn(t, url, eve, eveErr)
eveJS, err := jetstream.New(eveNC)
if err != nil {
t.Fatalf("eve jetstream: %v", err)
}
// (a) The KV API: binding the bucket requires STREAM.INFO.KV_UNIBUS_users, which
// eve has no permission for, so this must fail (no leak of users).
kvCtx, kvCancel := freshCtx(2 * time.Second)
if kv, err := eveJS.KeyValue(kvCtx, "UNIBUS_users"); err == nil {
if e, gerr := kv.Get(kvCtx, hex.EncodeToString(ceo.SignPub)); gerr == nil {
kvCancel()
t.Fatalf("eve read the control-plane KV users bucket: %q (N2 leak still open)", string(e.Value()))
}
kvCancel()
t.Fatalf("eve was able to BIND the KV users bucket (N2 leak still open)")
}
kvCancel()
// (b) The raw KV subject space: a direct subscribe must be a permissions
// violation (delivered async to the error handler).
drain(eveErr)
if _, err := eveNC.Subscribe("$KV.UNIBUS_users.>", func(*nats.Msg) {}); err != nil {
t.Fatalf("eve sub $KV: %v", err)
}
_ = eveNC.Flush()
if e := waitErr(eveErr, 1*time.Second); e == nil {
t.Fatalf("eve subscribing to $KV.UNIBUS_users.> must raise a permissions violation")
}
// --- edge: eve cannot reach another room's stream API ---------------------
edgeCtx, edgeCancel := freshCtx(2 * time.Second)
if _, err := eveJS.Stream(edgeCtx, "UNIBUS_PRIVROOM"); err == nil {
edgeCancel()
t.Fatalf("eve reached the foreign room stream API (cross-room JS not isolated)")
}
edgeCancel()
// --- golden: ceo can drive the JetStream API of HER OWN room's stream ------
ceoErr := make(chan error, 4)
ceoNC := nkeyConn(t, url, ceo, ceoErr)
ceoJS, err := jetstream.New(ceoNC)
if err != nil {
t.Fatalf("ceo jetstream: %v", err)
}
goldenCtx, goldenCancel := freshCtx(5 * time.Second)
defer goldenCancel()
if _, err := ceoJS.CreateOrUpdateStream(goldenCtx, jetstream.StreamConfig{
Name: "UNIBUS_PRIVROOM",
Subjects: []string{"room.board.ma-deal"},
Storage: jetstream.FileStorage,
}); err != nil {
t.Fatalf("ceo could not manage her OWN room stream (per-room JS broken): %v", err)
}
}
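Why removing the broad "$JS.API.>" grant closes the leak follows from NATS subject wildcards: `*` matches exactly one token and a trailing `>` matches one or more remaining tokens. The matcher below is a simplified illustration of those semantics, not the server's implementation and not this project's ACL code; the per-room grant shown is a hypothetical example.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches reports whether subject is covered by pattern, where "*"
// matches one token and a final ">" matches one or more remaining tokens.
func subjectMatches(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" is only valid as the last token and needs at least one token.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false
		}
		if tok != "*" && tok != s[i] {
			return false
		}
	}
	return len(p) == len(s)
}

// allowed reports whether any grant covers the subject.
func allowed(grants []string, subject string) bool {
	for _, g := range grants {
		if subjectMatches(g, subject) {
			return true
		}
	}
	return false
}

func main() {
	const usersInfo = "$JS.API.STREAM.INFO.KV_UNIBUS_users"
	broad := []string{"$JS.API.>"}
	perRoom := []string{"$JS.API.STREAM.INFO.UNIBUS_PRIVROOM"} // hypothetical grant
	fmt.Println(allowed(broad, usersInfo))
	fmt.Println(allowed(perRoom, usersInfo))
	fmt.Println(allowed(perRoom, "$JS.API.STREAM.INFO.UNIBUS_PRIVROOM"))
}
```

With the broad grant, every JetStream API subject is reachable, including the one needed to bind the users bucket; with a narrow per-stream grant, only the named stream is.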
+176
@@ -0,0 +1,176 @@
package membership
// Migration from the local SQLite control plane to replicated JetStream KV
// (issue 0003c). It is the one-time, idempotent data move that decentralization
// needs: read the entire SQLite state, write it into the KV buckets. Re-running
// it is safe (every KV write is an overwrite), so a partial/interrupted run is
// recovered by running again, and a parity test can assert the two stores hold
// the same state before and after.
import (
"database/sql"
"fmt"
"strings"
"time"
"github.com/nats-io/nats.go/jetstream"
)
// SealedKeyRecord is one row of room_keys: the sealed room key for an endpoint
// at a given epoch. It is the unit the snapshot carries so a backend can be
// imported with the exact epoch history (CreateRoom/AddMember alone could not
// reproduce a multi-epoch room).
type SealedKeyRecord struct {
RoomID string
Endpoint string
Epoch int
Sealed []byte
}
// Snapshot is the complete control-plane state, backend-agnostic. It is what
// ExportSnapshot produces and importSnapshot consumes, so the SQLite->KV
// migration and the parity test both work in terms of it.
type Snapshot struct {
Rooms []RoomInfo
Members map[string][]Member // roomID -> members
Keys []SealedKeyRecord
Users []User
}
// MigrateReport summarizes what a migration moved, for the operator log.
type MigrateReport struct {
BackupPath string
Rooms int
Members int
Keys int
Users int
}
// MigrateSQLiteToKV reads the SQLite store at sqlitePath and writes its entire
// state into the JetStream KV buckets on js (created with cfg.Replicas). It is
// idempotent: re-running converges to the same state. The caller is responsible
// for backing up the SQLite file first (BackupSQLite) — this function only
// reads it.
func MigrateSQLiteToKV(sqlitePath string, js jetstream.JetStream, cfg JetStreamConfig) (*MigrateReport, error) {
src, err := openSQLite(sqlitePath)
if err != nil {
return nil, fmt.Errorf("migrate: open sqlite %q: %w", sqlitePath, err)
}
defer src.Close()
snap, err := src.ExportSnapshot()
if err != nil {
return nil, fmt.Errorf("migrate: export sqlite: %w", err)
}
dst, err := OpenJetStream(js, cfg)
if err != nil {
return nil, fmt.Errorf("migrate: open kv: %w", err)
}
kv := dst.(*jetstreamStore)
if err := kv.importSnapshot(snap); err != nil {
return nil, fmt.Errorf("migrate: import to kv: %w", err)
}
members := 0
for _, ms := range snap.Members {
members += len(ms)
}
return &MigrateReport{
Rooms: len(snap.Rooms),
Members: members,
Keys: len(snap.Keys),
Users: len(snap.Users),
}, nil
}
// BackupSQLite makes a consistent copy of the SQLite database next to it,
// named "<path>.bak.<unixnano>", using SQLite's own VACUUM INTO (which writes a
// transactionally-consistent snapshot even with a live WAL). It returns the
// backup path. Always call this before MigrateSQLiteToKV so a botched migration
// can be undone.
func BackupSQLite(path string) (string, error) {
dst := fmt.Sprintf("%s.bak.%d", path, time.Now().UnixNano())
db, err := sql.Open("sqlite", "file:"+path+"?_pragma=busy_timeout(5000)")
if err != nil {
return "", fmt.Errorf("backup: open %q: %w", path, err)
}
defer db.Close()
if err := db.Ping(); err != nil {
return "", fmt.Errorf("backup: ping %q: %w", path, err)
}
// VACUUM INTO writes a fresh, consistent database file; the literal path is
// safely single-quoted (it is operator-supplied, never network input).
if _, err := db.Exec("VACUUM INTO '" + strings.ReplaceAll(dst, "'", "''") + "'"); err != nil {
return "", fmt.Errorf("backup: VACUUM INTO %q: %w", dst, err)
}
return dst, nil
}
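The quoting that BackupSQLite applies to the destination path can be checked in isolation. A minimal sketch follows; `sqliteStringLiteral` and `backupStatement` are hypothetical helper names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sqliteStringLiteral wraps s in single quotes and doubles any embedded
// single quote, which is how SQL string literals escape that character.
func sqliteStringLiteral(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// backupStatement builds the VACUUM INTO statement for a backup named
// "<src>.bak.<unixNano>".
func backupStatement(src string, unixNano int64) string {
	dst := fmt.Sprintf("%s.bak.%d", src, unixNano)
	return "VACUUM INTO " + sqliteStringLiteral(dst)
}

func main() {
	fmt.Println(sqliteStringLiteral("o'brien.db"))
	fmt.Println(backupStatement("/tmp/unibus.db", 42))
}
```

Doubling the quote keeps a path with an apostrophe from terminating the literal early; it is adequate here only because the path comes from the operator rather than from network input.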
// ---- SQLite export --------------------------------------------------------
// ExportSnapshot reads the entire SQLite control-plane state into a Snapshot.
func (s *sqliteStore) ExportSnapshot() (*Snapshot, error) {
snap := &Snapshot{Members: map[string][]Member{}}
rows, err := s.db.Query(`SELECT room_id, subject, key_epoch, encrypt, persist, sign_msgs, owner_endpoint FROM rooms ORDER BY room_id`)
if err != nil {
return nil, fmt.Errorf("export: query rooms: %w", err)
}
for rows.Next() {
var r RoomInfo
var enc, per, sgn int
if err := rows.Scan(&r.RoomID, &r.Subject, &r.Epoch, &enc, &per, &sgn, &r.OwnerEndpoint); err != nil {
rows.Close()
return nil, fmt.Errorf("export: scan room: %w", err)
}
r.Encrypt, r.Persist, r.SignMsgs = enc != 0, per != 0, sgn != 0
snap.Rooms = append(snap.Rooms, r)
}
rows.Close()
if err := rows.Err(); err != nil {
return nil, fmt.Errorf("export: iterate rooms: %w", err)
}
mrows, err := s.db.Query(`SELECT room_id, endpoint, role, sign_pub, kex_pub FROM members ORDER BY room_id, endpoint`)
if err != nil {
return nil, fmt.Errorf("export: query members: %w", err)
}
for mrows.Next() {
var roomID string
var m Member
if err := mrows.Scan(&roomID, &m.Endpoint, &m.Role, &m.SignPub, &m.KexPub); err != nil {
mrows.Close()
return nil, fmt.Errorf("export: scan member: %w", err)
}
snap.Members[roomID] = append(snap.Members[roomID], m)
}
mrows.Close()
if err := mrows.Err(); err != nil {
return nil, fmt.Errorf("export: iterate members: %w", err)
}
krows, err := s.db.Query(`SELECT room_id, epoch, endpoint, sealed_key FROM room_keys ORDER BY room_id, endpoint, epoch`)
if err != nil {
return nil, fmt.Errorf("export: query room_keys: %w", err)
}
for krows.Next() {
var rec SealedKeyRecord
if err := krows.Scan(&rec.RoomID, &rec.Epoch, &rec.Endpoint, &rec.Sealed); err != nil {
krows.Close()
return nil, fmt.Errorf("export: scan room_key: %w", err)
}
snap.Keys = append(snap.Keys, rec)
}
krows.Close()
if err := krows.Err(); err != nil {
return nil, fmt.Errorf("export: iterate room_keys: %w", err)
}
users, err := s.ListUsers()
if err != nil {
return nil, fmt.Errorf("export: list users: %w", err)
}
snap.Users = users
return snap, nil
}
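The idempotency claim for the migration rests on every write being an overwrite keyed by a normalized identifier. The sketch below shows that property with an in-memory map standing in for a KV bucket; `memKV` and `importUsers` are hypothetical, and lowercasing is assumed as the key normalization.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// memKV is a hypothetical in-memory stand-in for one KV bucket.
type memKV map[string]string

// put overwrites unconditionally, like a KV Put.
func (kv memKV) put(key, value string) { kv[key] = value }

// importUsers writes every user under its normalized (lowercase) key.
func importUsers(kv memKV, users map[string]string) {
	for signPub, handle := range users {
		kv.put(strings.ToLower(signPub), handle)
	}
}

// copyOf returns an independent copy so two states can be compared.
func copyOf(kv memKV) memKV {
	out := memKV{}
	for k, v := range kv {
		out[k] = v
	}
	return out
}

// importTwiceConverges reports whether a second import leaves the state
// unchanged.
func importTwiceConverges() bool {
	users := map[string]string{"AA11": "alice", "bb22": "bob"}
	kv := memKV{}
	importUsers(kv, users)
	first := copyOf(kv)
	importUsers(kv, users)
	return reflect.DeepEqual(first, kv)
}

func main() {
	fmt.Println(importTwiceConverges())
}
```

Because no write depends on what is already stored, an interrupted run can simply be repeated from the start.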
+195
@@ -0,0 +1,195 @@
package membership
import (
"path/filepath"
"reflect"
"sort"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
// seedSQLite populates a SQLite store with a representative control plane: two
// rooms (one rekeyed to epoch 2 with a removed member's keys left behind), a few
// members and sealed keys, and a user allowlist with one revoked entry. It
// returns the populated *sqliteStore and its file path.
func seedSQLite(t *testing.T) (*sqliteStore, string) {
t.Helper()
path := filepath.Join(t.TempDir(), "seed.db")
s, err := openSQLite(path)
if err != nil {
t.Fatalf("openSQLite: %v", err)
}
r1 := RoomInfo{RoomID: newULID(), Subject: "room.alpha", Encrypt: true, Persist: true, SignMsgs: true, OwnerEndpoint: "ep-owner1"}
if err := s.CreateRoom(r1, []byte("o1-sign"), []byte("o1-kex"), []byte("o1-sealed-e1")); err != nil {
t.Fatalf("create r1: %v", err)
}
if err := s.AddMember(r1.RoomID, Member{Endpoint: "ep-bob", Role: "member", SignPub: []byte("bob-sign"), KexPub: []byte("bob-kex")}, 1, []byte("bob-sealed-e1")); err != nil {
t.Fatalf("add bob: %v", err)
}
// Rekey r1 to epoch 2 (owner keeps a key at the new epoch).
if err := s.BumpEpoch(r1.RoomID, 2); err != nil {
t.Fatalf("bump: %v", err)
}
if err := s.PutSealedKeys(r1.RoomID, 2, map[string][]byte{"ep-owner1": []byte("o1-sealed-e2")}); err != nil {
t.Fatalf("put keys e2: %v", err)
}
r2 := RoomInfo{RoomID: newULID(), Subject: "room.beta", Encrypt: false, Persist: false, SignMsgs: false, OwnerEndpoint: "ep-owner2"}
if err := s.CreateRoom(r2, []byte("o2-sign"), []byte("o2-kex"), nil); err != nil {
t.Fatalf("create r2: %v", err)
}
if err := s.AddUser("aa11", "alice", RoleAdmin); err != nil {
t.Fatalf("add alice: %v", err)
}
if err := s.AddUser("bb22", "bob", RoleMember); err != nil {
t.Fatalf("add bob user: %v", err)
}
if err := s.AddUser("cc33", "carol", RoleMember); err != nil {
t.Fatalf("add carol: %v", err)
}
if err := s.RevokeUser("cc33"); err != nil {
t.Fatalf("revoke carol: %v", err)
}
return s, path
}
// normalizeSnapshot sorts every slice in a Snapshot so two snapshots from
// different backends can be compared regardless of enumeration order.
func normalizeSnapshot(snap *Snapshot) {
sort.Slice(snap.Rooms, func(i, j int) bool { return snap.Rooms[i].RoomID < snap.Rooms[j].RoomID })
for _, ms := range snap.Members {
sort.Slice(ms, func(i, j int) bool { return ms[i].Endpoint < ms[j].Endpoint })
}
sort.Slice(snap.Keys, func(i, j int) bool {
a, b := snap.Keys[i], snap.Keys[j]
if a.RoomID != b.RoomID {
return a.RoomID < b.RoomID
}
if a.Endpoint != b.Endpoint {
return a.Endpoint < b.Endpoint
}
return a.Epoch < b.Epoch
})
sort.Slice(snap.Users, func(i, j int) bool { return snap.Users[i].SignPub < snap.Users[j].SignPub })
}
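// newJS boots a single-node embedded NATS server with JetStream and returns a
// JetStream handle; the connection and server are torn down at test cleanup.
func newJS(t *testing.T) jetstream.JetStream {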
t.Helper()
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(),
Host: "127.0.0.1",
Port: kvFreePort(t),
})
if err != nil {
t.Fatalf("embedded nats: %v", err)
}
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
ns.Shutdown()
t.Fatalf("nats connect: %v", err)
}
js, err := jetstream.New(nc)
if err != nil {
nc.Close()
ns.Shutdown()
t.Fatalf("jetstream: %v", err)
}
t.Cleanup(func() { nc.Close(); ns.Shutdown(); ns.WaitForShutdown() })
return js
}
// TestMigrateSQLiteToKVParity is the parity test the issue mandates: after the
// migration, the KV store holds exactly the SQLite source's state.
func TestMigrateSQLiteToKVParity(t *testing.T) {
src, path := seedSQLite(t)
srcSnap, err := src.ExportSnapshot()
if err != nil {
t.Fatalf("export sqlite: %v", err)
}
src.Close() // release the file before the migration reopens it
js := newJS(t)
report, err := MigrateSQLiteToKV(path, js, JetStreamConfig{Replicas: 1, OpTimeout: 5 * time.Second})
if err != nil {
t.Fatalf("migrate: %v", err)
}
if report.Rooms != 2 || report.Users != 3 {
t.Fatalf("report mismatch: %+v", report)
}
kv, err := OpenJetStream(js, JetStreamConfig{Replicas: 1, OpTimeout: 5 * time.Second})
if err != nil {
t.Fatalf("open kv: %v", err)
}
kvSnap, err := kv.(*jetstreamStore).ExportSnapshot()
if err != nil {
t.Fatalf("export kv: %v", err)
}
normalizeSnapshot(srcSnap)
normalizeSnapshot(kvSnap)
if !reflect.DeepEqual(srcSnap, kvSnap) {
t.Fatalf("parity mismatch after migration:\n sqlite=%+v\n kv= %+v", srcSnap, kvSnap)
}
}
// TestMigrateSQLiteToKVIdempotent: running the migration twice converges to the
// same KV state (every write is an overwrite). A second run must not duplicate
// or corrupt anything.
func TestMigrateSQLiteToKVIdempotent(t *testing.T) {
src, path := seedSQLite(t)
srcSnap, err := src.ExportSnapshot()
if err != nil {
t.Fatalf("export sqlite: %v", err)
}
src.Close()
js := newJS(t)
if _, err := MigrateSQLiteToKV(path, js, JetStreamConfig{Replicas: 1}); err != nil {
t.Fatalf("migrate run 1: %v", err)
}
if _, err := MigrateSQLiteToKV(path, js, JetStreamConfig{Replicas: 1}); err != nil {
t.Fatalf("migrate run 2: %v", err)
}
kv, err := OpenJetStream(js, JetStreamConfig{Replicas: 1})
if err != nil {
t.Fatalf("open kv: %v", err)
}
kvSnap, err := kv.(*jetstreamStore).ExportSnapshot()
if err != nil {
t.Fatalf("export kv: %v", err)
}
normalizeSnapshot(srcSnap)
normalizeSnapshot(kvSnap)
if !reflect.DeepEqual(srcSnap, kvSnap) {
t.Fatalf("idempotency broken: a second migration changed the KV state\n sqlite=%+v\n kv= %+v", srcSnap, kvSnap)
}
}
// TestBackupSQLiteCreatesConsistentCopy verifies the pre-migration backup is a
// real, openable copy holding the same data.
func TestBackupSQLiteCreatesConsistentCopy(t *testing.T) {
src, path := seedSQLite(t)
srcSnap, err := src.ExportSnapshot()
if err != nil {
t.Fatalf("export sqlite: %v", err)
}
src.Close()
bak, err := BackupSQLite(path)
if err != nil {
t.Fatalf("backup: %v", err)
}
restored, err := openSQLite(bak)
if err != nil {
t.Fatalf("open backup: %v", err)
}
defer restored.Close()
bakSnap, err := restored.ExportSnapshot()
if err != nil {
t.Fatalf("export backup: %v", err)
}
normalizeSnapshot(srcSnap)
normalizeSnapshot(bakSnap)
if !reflect.DeepEqual(srcSnap, bakSnap) {
t.Fatalf("backup is not a faithful copy")
}
}
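The parity tests depend on sorting both snapshots before reflect.DeepEqual, because the two backends enumerate in different orders. A minimal sketch of that comparison follows, using a hypothetical `user` type; it also shows one caveat of DeepEqual that a cross-backend comparison has to keep in mind: a nil slice and an empty slice are not deeply equal.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// user is a hypothetical, reduced stand-in for the allowlist record.
type user struct{ SignPub, Handle string }

// normalize sorts users by key so enumeration order does not matter.
func normalize(us []user) {
	sort.Slice(us, func(i, j int) bool { return us[i].SignPub < us[j].SignPub })
}

// sameUsers compares two user lists after sorting both in place.
func sameUsers(a, b []user) bool {
	normalize(a)
	normalize(b)
	return reflect.DeepEqual(a, b)
}

// nilVersusEmpty reports whether DeepEqual treats a nil slice and an empty
// slice as equal.
func nilVersusEmpty() bool {
	var none []user
	return reflect.DeepEqual(none, []user{})
}

func main() {
	a := []user{{"bb22", "bob"}, {"aa11", "alice"}}
	b := []user{{"aa11", "alice"}, {"bb22", "bob"}}
	fmt.Println(sameUsers(a, b))
	fmt.Println(nilVersusEmpty())
}
```

If one backend returns nil for an empty collection and the other returns an empty slice, the comparison fails even though the data agree, so both exporters need to use the same convention.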
+2 -2
@@ -11,7 +11,7 @@ import (
// (error), and after the TTL the same nonce is accepted again because its entry
// was pruned (edge).
func TestNonceCacheRememberPrune(t *testing.T) {
-nc := newNonceCache(50*time.Millisecond, 1000)
+nc := newMemNonceCache(50*time.Millisecond, 1000)
base := time.Now()
if !nc.rememberOrReject("a", base) {
@@ -31,7 +31,7 @@ func TestNonceCacheRememberPrune(t *testing.T) {
// from the map.
func TestNonceCacheCapBounded(t *testing.T) {
const capacity = 100
-nc := newNonceCache(time.Hour, capacity)
+nc := newMemNonceCache(time.Hour, capacity)
base := time.Now()
for i := 0; i < 500; i++ {
nc.rememberOrReject("n"+strconv.Itoa(i), base)
+77
@@ -0,0 +1,77 @@
package membership
// kvNonceStore is the replicated anti-replay backend (issue 0003e): seen nonces
// live in a JetStream KV bucket shared by every node, with a per-key TTL so they
// expire on their own. This closes the multi-node replay hole the auditor
// flagged: the per-process memNonceCache let an attacker replay a captured
// request to a DIFFERENT node, whose local cache never saw the nonce. With the
// shared bucket the first node to see a nonce wins the atomic Create, and every
// other node rejects the replay.
import (
"context"
"crypto/sha256"
"encoding/hex"
"errors"
"fmt"
"time"
"github.com/nats-io/nats.go/jetstream"
)
const bucketNonces = "UNIBUS_nonces"
type kvNonceStore struct {
kv jetstream.KeyValue
opTimeout time.Duration
}
// newKVNonceStore creates (or opens) the replicated nonce bucket. ttl is the
// per-key expiry — it must be >= the request acceptance window (2*clockSkew) so
// a replay can never outlive its memory, exactly like the in-memory cache's TTL.
func newKVNonceStore(js jetstream.JetStream, ttl time.Duration, replicas int, opTimeout time.Duration) (*kvNonceStore, error) {
if replicas <= 0 {
replicas = 1
}
if opTimeout <= 0 {
opTimeout = defaultKVOpTime
}
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
kv, err := js.CreateOrUpdateKeyValue(ctx, jetstream.KeyValueConfig{
Bucket: bucketNonces,
TTL: ttl,
Replicas: replicas,
History: 1,
Storage: jetstream.FileStorage,
})
if err != nil {
return nil, fmt.Errorf("membership: open nonce KV bucket (replicas=%d): %w", replicas, err)
}
return &kvNonceStore{kv: kv, opTimeout: opTimeout}, nil
}
// nonceKVKey maps a raw nonce (std-base64, which contains '+' '/' '=' that KV
// keys forbid) to a KV-safe token: the hex of its sha256. Deterministic, so the
// same nonce always maps to the same key, and collision-free in practice.
func nonceKVKey(nonce string) string {
sum := sha256.Sum256([]byte(nonce))
return hex.EncodeToString(sum[:])
}
// rememberOrReject atomically claims the nonce: Create succeeds only if the key
// is absent, so the first sight returns true (accept) and any later sight (a
// replay, on this or any other node sharing the bucket) returns false. A backend
// error fails CLOSED — reject — so a KV outage never silently disables
// anti-replay. The TTL on the bucket expires the key, reopening the window.
func (s *kvNonceStore) rememberOrReject(nonce string, _ time.Time) bool {
ctx, cancel := context.WithTimeout(context.Background(), s.opTimeout)
defer cancel()
if _, err := s.kv.Create(ctx, nonceKVKey(nonce), nil); err != nil {
if errors.Is(err, jetstream.ErrKeyExists) {
return false // replay: already claimed
}
return false // backend unreachable: fail closed
}
return true // first sight: accept
}
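The claim-once behavior of rememberOrReject can be sketched with a mutex-guarded map in place of the KV bucket's atomic Create. The names `memKV`, `create` and `errKeyExists` are hypothetical stand-ins, and the sketch omits the TTL that expires keys in the real bucket.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// errKeyExists is a hypothetical stand-in for the backend's "key exists" error.
var errKeyExists = errors.New("key exists")

// memKV is an in-memory stand-in for a bucket with an atomic create.
type memKV struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newMemKV() *memKV { return &memKV{seen: map[string]struct{}{}} }

// create stores key only if it is absent; otherwise it returns errKeyExists.
func (kv *memKV) create(key string) error {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if _, ok := kv.seen[key]; ok {
		return errKeyExists
	}
	kv.seen[key] = struct{}{}
	return nil
}

// nonceKey maps an arbitrary nonce string to a 64-character hex key.
func nonceKey(nonce string) string {
	sum := sha256.Sum256([]byte(nonce))
	return hex.EncodeToString(sum[:])
}

// rememberOrReject accepts the first sight of a nonce and rejects every
// later one; any create failure counts as a rejection.
func rememberOrReject(kv *memKV, nonce string) bool {
	if err := kv.create(nonceKey(nonce)); err != nil {
		return false
	}
	return true
}

func main() {
	kv := newMemKV()
	fmt.Println(rememberOrReject(kv, "a+b/c="))
	fmt.Println(rememberOrReject(kv, "a+b/c="))
	fmt.Println(len(nonceKey("a+b/c=")))
}
```

Sharing one such store between nodes is what turns a per-process replay check into a cluster-wide one: whichever node creates the key first accepts the request, and all others see the key as taken.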
+117
@@ -0,0 +1,117 @@
package membership
import (
"crypto/rand"
"encoding/base64"
"encoding/hex"
"net/http"
"net/http/httptest"
"path/filepath"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/nats-io/nats.go"
"github.com/nats-io/nats.go/jetstream"
)
// TestReplicatedNonceRejectsCrossNodeReplay is the issue's mandated error path:
// with the shared KV nonce store, a request accepted on node A is rejected as a
// replay when the SAME signed bytes are sent to node B. This closes the
// multi-node replay hole that the per-process cache left open.
func TestReplicatedNonceRejectsCrossNodeReplay(t *testing.T) {
// One NATS+JetStream backing the shared nonce bucket.
ns, err := embeddednats.StartServer(embeddednats.ServerConfig{
StoreDir: t.TempDir(), Host: "127.0.0.1", Port: kvFreePort(t),
})
if err != nil {
t.Fatalf("nats: %v", err)
}
t.Cleanup(func() { ns.Shutdown(); ns.WaitForShutdown() })
nc, err := nats.Connect(ns.ClientURL())
if err != nil {
t.Fatalf("connect: %v", err)
}
t.Cleanup(nc.Close)
js, err := jetstream.New(nc)
if err != nil {
t.Fatalf("jetstream: %v", err)
}
// One shared SQLite store (simulating the replicated control-plane state) and
// two membershipd servers (two nodes) that BOTH use the shared KV nonce store.
dir := t.TempDir()
store, err := Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, err := cs.GenerateIdentity()
if err != nil {
t.Fatalf("identity: %v", err)
}
alicePub := hex.EncodeToString(alice.SignPub)
if err := store.AddUser(alicePub, "alice", RoleAdmin); err != nil {
t.Fatalf("add alice: %v", err)
}
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
mkNode := func() *httptest.Server {
srv := NewServer(store, blobs, AuthEnforce)
if err := srv.UseReplicatedNonces(js, 1); err != nil {
t.Fatalf("UseReplicatedNonces: %v", err)
}
return httptest.NewServer(srv)
}
nodeA := mkNode()
t.Cleanup(nodeA.Close)
nodeB := mkNode()
t.Cleanup(nodeB.Close)
// Build ONE signed request (fixed ts+nonce) and send the identical bytes to
// both nodes. Authenticated path: alice listing her own rooms (200, empty).
ts := time.Now().Unix()
nonceRaw := make([]byte, 16)
if _, err := rand.Read(nonceRaw); err != nil {
t.Fatalf("nonce: %v", err)
}
nonce := base64.StdEncoding.EncodeToString(nonceRaw)
path := "/members/" + frame.EndpointID(alice.SignPub) + "/rooms"
reqA := signedReq(t, nodeA.URL, "GET", path, nil, alice, ts, nonce)
respA, err := http.DefaultClient.Do(reqA)
if err != nil {
t.Fatalf("do A: %v", err)
}
respA.Body.Close()
if respA.StatusCode != http.StatusOK {
t.Fatalf("node A first use: status %d, want 200 (auth should pass, nonce fresh)", respA.StatusCode)
}
// Replay the SAME ts+nonce to node B: the shared bucket already holds the
// nonce, so node B must reject it.
reqB := signedReq(t, nodeB.URL, "GET", path, nil, alice, ts, nonce)
respB, err := http.DefaultClient.Do(reqB)
if err != nil {
t.Fatalf("do B: %v", err)
}
respB.Body.Close()
if respB.StatusCode != http.StatusUnauthorized {
t.Fatalf("cross-node replay to node B: status %d, want 401 (replayed nonce)", respB.StatusCode)
}
// And replaying to node A again is likewise rejected (same bucket).
reqA2 := signedReq(t, nodeA.URL, "GET", path, nil, alice, ts, nonce)
respA2, err := http.DefaultClient.Do(reqA2)
if err != nil {
t.Fatalf("do A2: %v", err)
}
respA2.Body.Close()
if respA2.StatusCode != http.StatusUnauthorized {
t.Fatalf("replay to node A: status %d, want 401", respA2.StatusCode)
}
}
+57
@@ -0,0 +1,57 @@
package membership_test
import (
"encoding/json"
"io"
"net/http"
"net/http/httptest"
"path/filepath"
"testing"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/membership"
)
// TestHealthExposesPosture: /healthz publishes the node's security posture so a
// monitor (or a peer) can detect a cluster member that is not enforce+ACL+TLS
// (audit 0008 N1). The probe stays unauthenticated.
func TestHealthExposesPosture(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
srv := membership.NewServer(store, blobs, membership.AuthEnforce)
srv.Posture = membership.Posture{Enforce: true, ACL: true, TLS: true, Cluster: true, Store: "kv"}
ts := httptest.NewServer(srv)
t.Cleanup(ts.Close)
resp, err := http.Get(ts.URL + "/healthz")
if err != nil {
t.Fatalf("get healthz: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Fatalf("healthz status %d, want 200", resp.StatusCode)
}
body, _ := io.ReadAll(resp.Body)
var got struct {
Status string `json:"status"`
Posture membership.Posture `json:"posture"`
}
if err := json.Unmarshal(body, &got); err != nil {
t.Fatalf("decode healthz %q: %v", string(body), err)
}
if got.Status != "ok" {
t.Fatalf("status = %q, want ok", got.Status)
}
if !got.Posture.Enforce || !got.Posture.ACL || !got.Posture.TLS || !got.Posture.Cluster {
t.Fatalf("posture not surfaced correctly: %+v", got.Posture)
}
if got.Posture.Store != "kv" {
t.Fatalf("posture.store = %q, want kv", got.Posture.Store)
}
}
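Reviewer note: a monitor consuming this probe could flag weak cluster members along these lines. This is a minimal sketch, assuming the JSON field names shown in the `Posture` struct of this commit; the `weakNode` helper and its policy (treat an undecodable body as weak) are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// posture mirrors the JSON shape published on /healthz in this commit.
type posture struct {
	Enforce bool   `json:"enforce"`
	ACL     bool   `json:"acl"`
	TLS     bool   `json:"tls"`
	Cluster bool   `json:"cluster"`
	Store   string `json:"store"`
}

type health struct {
	Status  string  `json:"status"`
	Posture posture `json:"posture"`
}

// weakNode reports whether a /healthz body describes a node that is not
// running enforce+ACL+TLS. A body that cannot be decoded counts as weak
// (hypothetical policy for this sketch).
func weakNode(body []byte) bool {
	var h health
	if err := json.Unmarshal(body, &h); err != nil {
		return true
	}
	p := h.Posture
	return h.Status != "ok" || !p.Enforce || !p.ACL || !p.TLS
}

func main() {
	secure := []byte(`{"status":"ok","posture":{"enforce":true,"acl":true,"tls":true,"cluster":true,"store":"kv"}}`)
	dev := []byte(`{"status":"ok","posture":{"enforce":false,"acl":false,"tls":false,"cluster":false,"store":"sqlite"}}`)
	fmt.Println(weakNode(secure), weakNode(dev)) // false true
}
```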
+88
@@ -0,0 +1,88 @@
package membership_test
import (
"path/filepath"
"testing"
"time"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/enmanuel/unibus/pkg/room"
)
// TestClientCreateRoomRefreshPublishFlow is the issue 0006e DoD: under enforce+ACL
// a peer creates a room AFTER connecting, and pub/sub works without manual
// intervention because the client follows the membership-change contract
// (CreateRoom -> RefreshSession -> Subscribe/Publish), exactly as cmd/chat and
// cmd/worker now do. This is the end-to-end flow through the client API, proving
// the ACL is usable under enforce rather than something an operator must disable.
func TestClientCreateRoomRefreshPublishFlow(t *testing.T) {
dir := t.TempDir()
store, err := membership.Open(filepath.Join(dir, "unibus.db"))
if err != nil {
t.Fatalf("store: %v", err)
}
t.Cleanup(func() { store.Close() })
alice, bob := mustID(t), mustID(t)
mustAddUser(t, store, alice, "alice")
mustAddUser(t, store, bob, "bob")
srv := startACLNats(t, store) // data plane: enforce + per-subject ACL
blobs, _ := blobstore.New(filepath.Join(dir, "blobs"))
ctrl := newCtrl(t, store, blobs)
aliceC, err := client.NewWithOptions(srv.ClientURL(), ctrl, alice, client.Options{UseNkey: true})
if err != nil {
t.Fatalf("connect alice: %v", err)
}
defer aliceC.Close()
bobC, err := client.NewWithOptions(srv.ClientURL(), ctrl, bob, client.Options{UseNkey: true})
if err != nil {
t.Fatalf("connect bob: %v", err)
}
defer bobC.Close()
// alice creates a room AFTER connecting: the subject was not in her ACL at
// connect time, so she must refresh to publish on it (the worker contract).
roomID, err := aliceC.CreateRoom("room.flow.x", room.ModeNATS)
if err != nil {
t.Fatalf("alice create room: %v", err)
}
if err := aliceC.RefreshSession(); err != nil {
t.Fatalf("alice refresh: %v", err)
}
// alice invites bob; bob joins then refreshes to gain the subject (the chat
// subscriber contract), and only then subscribes.
if err := aliceC.Invite(roomID, bobC.Endpoint()); err != nil {
t.Fatalf("alice invite bob: %v", err)
}
if err := bobC.Join(roomID); err != nil {
t.Fatalf("bob join: %v", err)
}
if err := bobC.RefreshSession(); err != nil {
t.Fatalf("bob refresh: %v", err)
}
got := make(chan string, 4)
sub, err := bobC.Subscribe(roomID, func(_ frame.Frame, plaintext []byte) { got <- string(plaintext) })
if err != nil {
t.Fatalf("bob subscribe after refresh: %v", err)
}
defer sub.Unsubscribe()
time.Sleep(200 * time.Millisecond) // let the subscription settle
if err := aliceC.Publish(roomID, []byte("hello-under-acl")); err != nil {
t.Fatalf("alice publish after refresh: %v", err)
}
select {
case msg := <-got:
if msg != "hello-under-acl" {
t.Fatalf("bob got %q", msg)
}
case <-time.After(3 * time.Second):
t.Fatalf("bob did not receive the message: the create->refresh->subscribe flow is broken under enforce+ACL")
}
}
+242 -15
@@ -3,7 +3,6 @@ package membership
import (
"bytes"
"context"
-"database/sql"
"encoding/json"
"errors"
"fmt"
@@ -20,6 +19,7 @@ import (
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/nats-io/nats.go/jetstream"
)
// Body-size ceilings for the control plane. They bound how much an unauthenticated
@@ -35,6 +35,14 @@ const (
// MaxHeaderBytes caps request header size; wired into the http.Server by the
// command. Exported so the bound lives next to its body-size siblings.
MaxHeaderBytes = 1 << 20 // 1 MiB
// maxInflightBytes is the GLOBAL cap on request-body bytes buffered across all
// concurrent requests (audit N2). The per-request ceilings above bound one
// request; this bounds the sum, so a concurrent (even multi-IP) flood of
// max-size uploads cannot drive the resident set without limit. 128 MiB allows
// ~8 concurrent 16 MiB blob uploads or ~128 concurrent control requests before
// further POSTs are shed with 503 — generous for an interactive bus, bounded
// for an attacker.
maxInflightBytes = 128 << 20 // 128 MiB
)
// Per-IP rate-limit defaults for the control plane. Tuned for an interactive
@@ -56,12 +64,13 @@ const (
// rate limiting, and read endpoints (GET) are unauthenticated. Hardening
// (mTLS, capabilities, rate limits) is a later phase.
type Server struct {
-store *Store
store Store
-blobs *blobstore.Store
blobs blobstore.Store
mux *http.ServeMux
authMode AuthMode
-nonces *nonceCache
nonces nonceStore
limiter *ipRateLimiter
inflight *inflightLimiter
// RequireEncryptedRooms, when true, refuses to create cleartext (ModeNATS)
// rooms. It is the minimum-defensive control for the data plane (audit H4):
@@ -72,6 +81,25 @@ type Server struct {
// (non-loopback) bind. See dev/0004d-dataplane-acl.md for the full rationale
// and the residual metadata exposure this does NOT close.
RequireEncryptedRooms bool
// Posture is the node's security posture, surfaced on /healthz so an operator
// or a peer can detect a node NOT running the homogeneous enforce+ACL+TLS
// posture a secure cluster requires (audit 0008 N1). It is set by the command;
// the zero value (all false) reflects an unsecured dev node.
Posture Posture
}
// Posture describes the security posture a membershipd node runs with. It is
// non-secret operational metadata (booleans + the store backend name), published
// on /healthz so a monitor can flag a cluster member that is not enforce+ACL+TLS
// — the weak node that would let an unauthenticated peer harvest the cluster's
// forwarded traffic (audit 0008 N1).
type Posture struct {
Enforce bool `json:"enforce"`
ACL bool `json:"acl"`
TLS bool `json:"tls"`
Cluster bool `json:"cluster"`
Store string `json:"store"` // "sqlite" | "kv"
}
// NewServer wires the membership store and blob store into an http.Handler. The
@@ -79,19 +107,36 @@ type Server struct {
// tests that have not migrated to signed requests yet). It installs a per-IP
// rate limiter with the package defaults; loopback dev behavior is unchanged
// because the burst comfortably exceeds any single client's request rate.
-func NewServer(store *Store, blobs *blobstore.Store, authMode AuthMode) *Server {
func NewServer(store Store, blobs blobstore.Store, authMode AuthMode) *Server {
s := &Server{
store: store,
blobs: blobs,
mux: http.NewServeMux(),
authMode: authMode,
-nonces: newNonceCache(nonceTTL, maxNonceCacheEntries),
nonces: newMemNonceCache(nonceTTL, maxNonceCacheEntries),
limiter: newIPRateLimiter(defaultRatePerSec, defaultRateBurst, rateBucketTTL),
inflight: newInflightLimiter(maxInflightBytes),
}
s.routes()
return s
}
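Reviewer note: the body of the in-flight limiter wired in by `NewServer` is not visible in this view. A minimal sketch of a global byte budget with the same `tryAcquire`/`release` method names might look like the following; the `byteBudget` type and its mutex-based implementation are assumptions, not the commit's code. It reproduces the arithmetic from the constant's comment: a 128 MiB budget admits eight 16 MiB reservations and sheds the ninth.

```go
package main

import (
	"fmt"
	"sync"
)

// byteBudget is a hypothetical global cap on reserved request-body bytes.
type byteBudget struct {
	mu   sync.Mutex
	used int64
	max  int64
}

// tryAcquire reserves n bytes and reports whether the reservation fit.
func (b *byteBudget) tryAcquire(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if n < 0 || b.used+n > b.max {
		return false
	}
	b.used += n
	return true
}

// release returns n bytes to the budget, clamping at zero.
func (b *byteBudget) release(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.used -= n
	if b.used < 0 {
		b.used = 0
	}
}

func main() {
	const blob = 16 << 20 // 16 MiB route ceiling
	b := &byteBudget{max: 128 << 20}
	admitted := 0
	for i := 0; i < 9; i++ {
		if b.tryAcquire(blob) {
			admitted++
		}
	}
	fmt.Println(admitted) // 8
	b.release(blob)
	fmt.Println(b.tryAcquire(blob)) // true
}
```

Reserving the route ceiling rather than the actual body length is pessimistic, but it lets the decision happen before any byte of the body is read.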
// UseReplicatedNonces switches the server's anti-replay store from the
// per-process in-memory cache to a JetStream KV bucket shared across the cluster
// (issue 0003e). It MUST be called on every node of a multi-node deployment:
// otherwise a request captured on one node can be replayed to another whose
// local cache never saw the nonce. replicas is the bucket's replication factor
// (R1..R3). The TTL matches the in-memory cache (nonceTTL = 2*clockSkew), so a
// nonce stays remembered for as long as a replay of its request could still
// pass the timestamp check.
func (s *Server) UseReplicatedNonces(js jetstream.JetStream, replicas int) error {
ns, err := newKVNonceStore(js, nonceTTL, replicas, 0)
if err != nil {
return err
}
s.nonces = ns
return nil
}
// ServeHTTP satisfies http.Handler. It runs the control-plane auth middleware
// (signature verification + anti-replay + allowlist) ahead of the router
// according to authMode, then dispatches to the matched handler.
@@ -123,6 +168,22 @@ func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
}
r.Body = http.MaxBytesReader(w, r.Body, limit)
// Aggregate memory bound (audit N2): the per-request ceiling above and the
// per-IP rate limit do not cap the TOTAL bytes buffered across concurrent
// requests. A POST reserves its worst-case buffered size (its route ceiling)
// from a global limiter before the body is read, and is shed with 503 when the
// cap is reached, so the resident set stays bounded under a concurrent (even
// multi-IP) upload flood instead of growing linearly with the number of
// connections. Reservation is released when the request finishes. Only POSTs
// buffer a body; GETs carry none, so they do not consume the budget.
if r.Method == http.MethodPost {
if !s.inflight.tryAcquire(limit) {
writeErr(w, http.StatusServiceUnavailable, "server busy: too many concurrent uploads in flight")
return
}
defer s.inflight.release(limit)
}
if s.authMode == AuthOff || isAuthExempt(r) {
s.mux.ServeHTTP(w, r)
return
@@ -152,9 +213,12 @@ func (s *Server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
writeErr(w, http.StatusUnauthorized, "unauthorized: "+err.Error())
return
}
-// Carry the authenticated signer's endpoint into the handler so room handlers
-// can authorize by membership (audit H3). Only set on a verified identity.
-s.mux.ServeHTTP(w, r.WithContext(withSigner(r.Context(), res.endpoint)))
// Carry the authenticated signer's endpoint AND signing key into the handler.
// Room handlers authorize by membership via the endpoint (audit H3); the
// user-management handlers authorize by role via the signing key (the endpoint
// id is a one-way hash of the key, so it cannot be reversed to look the signer
// up in the user allowlist). Both are set only on a verified identity.
s.mux.ServeHTTP(w, r.WithContext(withSigner(r.Context(), res.endpoint, res.pubHex)))
}
// isBodyTooLarge reports whether err is the sentinel returned by MaxBytesReader
@@ -168,11 +232,19 @@ func isBodyTooLarge(err error) bool {
// values cannot collide with keys set by other packages.
type ctxKey int
-const ctxSignerEndpoint ctxKey = iota
const (
ctxSignerEndpoint ctxKey = iota
ctxSignerPub
)
-// withSigner returns a context carrying the authenticated signer's endpoint id.
-func withSigner(ctx context.Context, endpoint string) context.Context {
-return context.WithValue(ctx, ctxSignerEndpoint, endpoint)
// withSigner returns a context carrying the authenticated signer's endpoint id
// and signing public key (lowercase hex). The endpoint authorizes room
// membership; the signing key authorizes user-management by role, because the
// endpoint id is a one-way hash of the key (base64url(sha256(signPub))) and so
// cannot be reversed to look the signer up in the user allowlist.
func withSigner(ctx context.Context, endpoint, pubHex string) context.Context {
ctx = context.WithValue(ctx, ctxSignerEndpoint, endpoint)
return context.WithValue(ctx, ctxSignerPub, pubHex)
}
// signerEndpoint returns the authenticated signer's endpoint id and whether one
@@ -184,6 +256,16 @@ func signerEndpoint(r *http.Request) (string, bool) {
return v, ok && v != ""
}
// signerPubHex returns the authenticated signer's signing public key (lowercase
// hex) and whether one is present. Like signerEndpoint it is absent under
// AuthOff and on a soft-mode pass-through; the user-management handlers treat
// that absence as "no admin identity" and deny (default-deny), since a
// privilege-granting operation must never run without a verified admin.
func signerPubHex(r *http.Request) (string, bool) {
v, ok := r.Context().Value(ctxSignerPub).(string)
return v, ok && v != ""
}
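Reviewer note: the reason the context carries the key and not only the endpoint id is that the id is derived as base64url(sha256(signPub)). A minimal sketch of that derivation follows; the unpadded `RawURLEncoding` variant and the `endpointID` helper name are assumptions, since `frame.EndpointID` itself is not shown in this diff.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// endpointID sketches base64url(sha256(signPub)). Whether the real
// frame.EndpointID pads its output is not visible here; unpadded is assumed.
func endpointID(signPub []byte) string {
	sum := sha256.Sum256(signPub)
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	signPub := make([]byte, 32) // placeholder 32-byte key (all zeros)
	id := endpointID(signPub)
	// The allowlist is keyed by the 64-hex key; the 43-character id is a
	// digest and cannot be turned back into that key.
	fmt.Println(len(id), len(hex.EncodeToString(signPub))) // 43 64
}
```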
// requireMember authorizes a room request by membership (audit H3): it returns
// the signer endpoint and true when the request may proceed, or writes 403 and
// returns false when an authenticated signer is not a member of roomID. When no
@@ -201,6 +283,31 @@ func (s *Server) requireMember(w http.ResponseWriter, r *http.Request, roomID st
return signer, true
}
// requireAdmin authorizes a user-management request: it returns the signer's
// signing-key hex and true ONLY when the authenticated signer is a user with
// role admin and active status; otherwise it writes 403 and returns false.
//
// Default-deny, with no dev relaxation: unlike requireMember (which allows a
// request when no authenticated signer is present, preserving AuthOff/dev
// behavior for room reads), this denies whenever the signer is absent or is not
// a verified active admin. The user-management endpoints grant and revoke bus
// access, so they must never be reachable without a verified admin identity —
// the store is consulted on every call so a just-revoked admin is denied
// immediately, and any store error fails closed.
func (s *Server) requireAdmin(w http.ResponseWriter, r *http.Request) (string, bool) {
pubHex, ok := signerPubHex(r)
if !ok {
writeErr(w, http.StatusForbidden, "forbidden: admin role required")
return "", false
}
u, err := s.store.GetUser(pubHex)
if err != nil || u.Role != RoleAdmin || u.Status != StatusActive {
writeErr(w, http.StatusForbidden, "forbidden: admin role required")
return "", false
}
return pubHex, true
}
// isAuthExempt lists requests that bypass control-plane auth even under enforce.
// Only the unauthenticated health probe qualifies: it carries no data and is
// needed by load balancers / smoke checks / systemd before any identity exists.
@@ -219,6 +326,13 @@ func (s *Server) routes() {
s.mux.HandleFunc("GET /rooms/{id}", s.handleGetRoom)
s.mux.HandleFunc("POST /blobs", s.handlePutBlob)
s.mux.HandleFunc("GET /blobs/{hash}", s.handleGetBlob)
// User-management (admin-only) — the HTTP-signed equivalent of the local
// `membershipd user` CLI, so the admin panel manages the bus allowlist by
// signing as an admin instead of needing direct store/KV access. All three
// pass through requireAdmin; they hit the same store the room handlers do.
s.mux.HandleFunc("GET /users", s.handleListUsers)
s.mux.HandleFunc("POST /users", s.handleAddUser)
s.mux.HandleFunc("POST /users/{signpub}/revoke", s.handleRevokeUser)
}
// ---- wire types -----------------------------------------------------------
@@ -296,6 +410,27 @@ type blobResp struct {
Hash string `json:"hash"`
}
// userJSON is the wire representation of a bus user on the admin endpoints. It
// carries the full record the panel needs to render the allowlist, including
// status (so revoked users are visible) and the timestamps. revoked_at is
// omitted for an active user.
type userJSON struct {
SignPub string `json:"sign_pub"`
Handle string `json:"handle"`
Role string `json:"role"`
Status string `json:"status"`
CreatedAt string `json:"created_at"`
RevokedAt string `json:"revoked_at,omitempty"`
}
// addUserReq is the POST /users body: the new user's Ed25519 signing key
// (64-hex), human handle, and role. role is optional and defaults to member.
type addUserReq struct {
SignPub string `json:"sign_pub"`
Handle string `json:"handle"`
Role string `json:"role"`
}
// ---- helpers --------------------------------------------------------------
func writeJSON(w http.ResponseWriter, code int, v any) {
@@ -348,7 +483,7 @@ func (s *Server) verifyOwnerSig(roomID, by string, sig, canonical []byte) (Membe
// ---- handlers -------------------------------------------------------------
func (s *Server) handleHealth(w http.ResponseWriter, _ *http.Request) {
-writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
writeJSON(w, http.StatusOK, map[string]any{"status": "ok", "posture": s.Posture})
}
func (s *Server) handleCreateRoom(w http.ResponseWriter, r *http.Request) {
@@ -456,7 +591,7 @@ func (s *Server) handleGetKey(w http.ResponseWriter, r *http.Request) {
}
ep, sealed, err := s.store.GetSealedKey(roomID, endpoint, epoch)
if err != nil {
-if errors.Is(err, sql.ErrNoRows) {
if errors.Is(err, ErrNotFound) {
writeErr(w, http.StatusForbidden,
"not invited to this encrypted room: no key has been sealed for your identity. Ask the room owner to invite you before joining.")
return
@@ -613,3 +748,95 @@ func (s *Server) handleGetBlob(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
_, _ = w.Write(data)
}
// ---- user-management handlers (admin-only) --------------------------------
// handleListUsers returns the full bus allowlist, including revoked users, so an
// admin sees the complete picture (a revoked identity stays auditable). Admin-only.
func (s *Server) handleListUsers(w http.ResponseWriter, r *http.Request) {
if _, ok := s.requireAdmin(w, r); !ok {
return
}
users, err := s.store.ListUsers()
if err != nil {
writeServerErr(w, r, http.StatusInternalServerError, "internal error", err)
return
}
out := make([]userJSON, 0, len(users))
for _, u := range users {
out = append(out, userJSON{
SignPub: u.SignPub,
Handle: u.Handle,
Role: u.Role,
Status: u.Status,
CreatedAt: u.CreatedAt,
RevokedAt: u.RevokedAt,
})
}
writeJSON(w, http.StatusOK, out)
}
// handleAddUser registers a new bus user from an admin-supplied Ed25519 signing
// key. It mirrors the `membershipd user add` CLI: the key must be 64-hex, the
// role must be admin or member (empty defaults to member), and re-adding an
// already-registered key is a 409 that leaves the existing row untouched — no
// silent upsert that could flip a role or clobber status. Admin-only.
func (s *Server) handleAddUser(w http.ResponseWriter, r *http.Request) {
if _, ok := s.requireAdmin(w, r); !ok {
return
}
var req addUserReq
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeErr(w, http.StatusBadRequest, "bad json: "+err.Error())
return
}
if req.SignPub == "" || req.Handle == "" {
writeErr(w, http.StatusBadRequest, "sign_pub and handle required")
return
}
if err := ValidateSignPubHex(req.SignPub); err != nil {
writeErr(w, http.StatusBadRequest, err.Error())
return
}
role := req.Role
if role == "" {
role = RoleMember
}
if role != RoleAdmin && role != RoleMember {
writeErr(w, http.StatusBadRequest,
fmt.Sprintf("invalid role %q (want %q or %q)", role, RoleAdmin, RoleMember))
return
}
if err := s.store.AddUser(req.SignPub, req.Handle, role); err != nil {
if errors.Is(err, ErrUserExists) {
// Idempotency contract (mirrors the CLI): re-adding a key is an explicit,
// non-destructive conflict. To replace a user, revoke then add again.
writeErr(w, http.StatusConflict,
"user already registered (unchanged); revoke it first to replace")
return
}
writeServerErr(w, r, http.StatusInternalServerError, "internal error", err)
return
}
writeJSON(w, http.StatusCreated, map[string]string{"status": "added"})
}
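Reviewer note: the input checks in `handleAddUser` (64-hex key, role defaulting to member) can be exercised in isolation. This is a minimal sketch; `validSignPubHex` and `normalizeRole` are hypothetical helpers, and the real `membership.ValidateSignPubHex` may differ, for example in whether it accepts uppercase hex.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// validSignPubHex checks that s is exactly 64 hex characters, the encoding
// of a 32-byte Ed25519 public key. Case handling is an assumption.
func validSignPubHex(s string) error {
	if len(s) != 64 {
		return errors.New("sign_pub must be 64 hex characters")
	}
	if _, err := hex.DecodeString(s); err != nil {
		return errors.New("sign_pub is not valid hex")
	}
	return nil
}

// normalizeRole applies the "empty defaults to member" rule and rejects
// anything other than admin or member.
func normalizeRole(role string) (string, error) {
	switch role {
	case "":
		return "member", nil
	case "admin", "member":
		return role, nil
	default:
		return "", fmt.Errorf("invalid role %q (want %q or %q)", role, "admin", "member")
	}
}

func main() {
	key := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	fmt.Println(validSignPubHex(key) == nil) // true
	role, _ := normalizeRole("")
	fmt.Println(role) // member
	_, err := normalizeRole("owner")
	fmt.Println(err != nil) // true
}
```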
// handleRevokeUser revokes a bus user by signing key. Revocation is a status
// flip (no hard delete) so the identity stays auditable and IsAuthorized denies
// it on both planes immediately. Revoking an unknown or already-revoked key is a
// 404. Admin-only.
func (s *Server) handleRevokeUser(w http.ResponseWriter, r *http.Request) {
if _, ok := s.requireAdmin(w, r); !ok {
return
}
signPub := r.PathValue("signpub")
if err := ValidateSignPubHex(signPub); err != nil {
writeErr(w, http.StatusBadRequest, err.Error())
return
}
if err := s.store.RevokeUser(signPub); err != nil {
writeServerErr(w, r, http.StatusNotFound, "no active user with that key", err)
return
}
writeJSON(w, http.StatusOK, map[string]string{"status": "revoked"})
}
+77 -18
@@ -13,6 +13,7 @@ package membership
import (
"database/sql"
"embed"
"errors"
"fmt"
"io/fs"
"sort"
@@ -26,6 +27,14 @@ import (
//go:embed migrations/*.sql
var migrationsFS embed.FS
// ErrNotFound is the store-agnostic "no such record" sentinel. Both backends
// (SQLite and JetStream KV) return it, wrapped, when a lookup misses, so callers
// distinguish "not invited / no key yet" from a genuine backend failure without
// depending on a specific driver's error (the SQLite store maps sql.ErrNoRows to
// it; the KV store maps a missing key to it). This is what lets the control
// plane stay storage-agnostic under the branch-by-abstraction of issue 0003b.
var ErrNotFound = errors.New("membership: not found")
// Member is a participant of a room with their published public keys.
type Member struct {
Endpoint string `json:"endpoint"`
@@ -45,14 +54,58 @@ type RoomInfo struct {
OwnerEndpoint string
}
-// Store is the SQLite-backed membership/key store.
-type Store struct {
// Store is the membership/key control-plane store: the authoritative source of
// room metadata, the member directory, per-epoch sealed room keys, and the bus
// user allowlist. It is an interface (branch-by-abstraction, issue 0003b) with
// two implementations: sqliteStore (the default, single-node, local SQLite) and
// jetstreamStore (rooms/members/keys/users on replicated JetStream KV, selected
// when the `decentralized` flag is on). Every lookup miss returns ErrNotFound
// (wrapped); every implementation MUST fail closed (IsAuthorized returns false
// on any backend error), so a KV quorum loss denies rather than admits.
type Store interface {
// Rooms / members / keys.
CreateRoom(info RoomInfo, ownerSignPub, ownerKexPub, ownerSealedKey []byte) error
GetRoom(roomID string) (RoomInfo, error)
AddMember(roomID string, m Member, epoch int, sealedKey []byte) error
GetMember(roomID, endpoint string) (Member, error)
ListMembers(roomID string) ([]Member, error)
ListRoomsForEndpoint(endpoint string) ([]RoomMembership, error)
GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, error)
PutSealedKeys(roomID string, epoch int, keys map[string][]byte) error
BumpEpoch(roomID string, newEpoch int) error
RemoveMember(roomID, endpoint string) error
// Users (the bus allowlist).
AddUser(signPub, handle, role string) error
GetUser(signPub string) (User, error)
ListUsers() ([]User, error)
RevokeUser(signPub string) error
IsAuthorized(signPub string) bool
HasAdmin() bool
// Lifecycle.
Close() error
}
// sqliteStore is the SQLite-backed implementation of Store (the default,
// single-node backend). It stays the production default while the
// `decentralized` flag is off.
type sqliteStore struct {
db *sql.DB
}
-// Open opens (creating if needed) the SQLite database at path and applies all
-// embedded migrations idempotently.
-func Open(path string) (*Store, error) {
// Open opens (creating if needed) the SQLite database at path, applies all
// embedded migrations idempotently, and returns it as a Store. It remains the
// default control-plane backend; the JetStream KV store is opened separately
// (OpenJetStream) when decentralization is enabled.
func Open(path string) (Store, error) {
return openSQLite(path)
}
// openSQLite is the concrete constructor, returning *sqliteStore so internal
// callers (e.g. the SQLite->KV migration) can use SQLite-specific helpers that
// are not part of the storage-agnostic Store interface.
func openSQLite(path string) (*sqliteStore, error) {
// _pragma busy_timeout avoids spurious "database is locked" under concurrent
// HTTP handlers; foreign_keys kept off — we manage referential integrity in code.
dsn := fmt.Sprintf("file:%s?_pragma=busy_timeout(5000)&_pragma=journal_mode(WAL)", path)
@@ -64,7 +117,7 @@ func Open(path string) (*Store, error) {
db.Close()
return nil, fmt.Errorf("membership: ping db: %w", err)
}
-s := &Store{db: db}
s := &sqliteStore{db: db}
if err := s.applyMigrations(); err != nil {
db.Close()
return nil, err
@@ -73,11 +126,11 @@ func Open(path string) (*Store, error) {
}
// Close closes the underlying database.
-func (s *Store) Close() error { return s.db.Close() }
func (s *sqliteStore) Close() error { return s.db.Close() }
// applyMigrations runs every embedded migration in lexical order, tolerating
// the "already applied" errors that SQLite's non-idempotent DDL produces.
-func (s *Store) applyMigrations() error {
func (s *sqliteStore) applyMigrations() error {
files, err := fs.Glob(migrationsFS, "migrations/*.sql")
if err != nil {
return fmt.Errorf("membership: glob migrations: %w", err)
@@ -103,7 +156,7 @@ func nowRFC3339() string { return time.Now().UTC().Format(time.RFC3339Nano) }
// CreateRoom inserts a room at epoch 1, registers the owner as a member with
// role "owner", and stores the owner's sealed key for epoch 1. Idempotent
// inserts are not used: a duplicate room_id returns an error.
-func (s *Store) CreateRoom(info RoomInfo, ownerSignPub, ownerKexPub, ownerSealedKey []byte) error {
func (s *sqliteStore) CreateRoom(info RoomInfo, ownerSignPub, ownerKexPub, ownerSealedKey []byte) error {
tx, err := s.db.Begin()
if err != nil {
return fmt.Errorf("membership: begin: %w", err)
@@ -142,7 +195,7 @@ func (s *Store) CreateRoom(info RoomInfo, ownerSignPub, ownerKexPub, ownerSealed
}
// GetRoom returns room metadata (including current epoch).
-func (s *Store) GetRoom(roomID string) (RoomInfo, error) {
func (s *sqliteStore) GetRoom(roomID string) (RoomInfo, error) {
var info RoomInfo
var enc, per, sgn int
err := s.db.QueryRow(
@@ -158,7 +211,7 @@ func (s *Store) GetRoom(roomID string) (RoomInfo, error) {
// AddMember inserts a member at the given role and stores their sealed key for
// the supplied epoch.
-func (s *Store) AddMember(roomID string, m Member, epoch int, sealedKey []byte) error {
func (s *sqliteStore) AddMember(roomID string, m Member, epoch int, sealedKey []byte) error {
tx, err := s.db.Begin() tx, err := s.db.Begin()
if err != nil { if err != nil {
return fmt.Errorf("membership: begin: %w", err) return fmt.Errorf("membership: begin: %w", err)
@@ -185,7 +238,7 @@ func (s *Store) AddMember(roomID string, m Member, epoch int, sealedKey []byte)
} }
// GetMember returns a single member of a room. // GetMember returns a single member of a room.
func (s *Store) GetMember(roomID, endpoint string) (Member, error) { func (s *sqliteStore) GetMember(roomID, endpoint string) (Member, error) {
var m Member var m Member
err := s.db.QueryRow( err := s.db.QueryRow(
`SELECT endpoint, role, sign_pub, kex_pub FROM members WHERE room_id = ? AND endpoint = ?`, `SELECT endpoint, role, sign_pub, kex_pub FROM members WHERE room_id = ? AND endpoint = ?`,
@@ -198,7 +251,7 @@ func (s *Store) GetMember(roomID, endpoint string) (Member, error) {
} }
// ListMembers returns all members of a room ordered by endpoint. // ListMembers returns all members of a room ordered by endpoint.
func (s *Store) ListMembers(roomID string) ([]Member, error) { func (s *sqliteStore) ListMembers(roomID string) ([]Member, error) {
rows, err := s.db.Query( rows, err := s.db.Query(
`SELECT endpoint, role, sign_pub, kex_pub FROM members WHERE room_id = ? ORDER BY endpoint`, `SELECT endpoint, role, sign_pub, kex_pub FROM members WHERE room_id = ? ORDER BY endpoint`,
roomID, roomID,
@@ -230,7 +283,7 @@ type RoomMembership struct {
// ListRoomsForEndpoint returns every room the given endpoint is a member of, // ListRoomsForEndpoint returns every room the given endpoint is a member of,
// with the room's current metadata and the endpoint's role, ordered by room id. // with the room's current metadata and the endpoint's role, ordered by room id.
// An endpoint that is in no rooms yields an empty slice (not an error). // An endpoint that is in no rooms yields an empty slice (not an error).
func (s *Store) ListRoomsForEndpoint(endpoint string) ([]RoomMembership, error) { func (s *sqliteStore) ListRoomsForEndpoint(endpoint string) ([]RoomMembership, error) {
rows, err := s.db.Query( rows, err := s.db.Query(
`SELECT r.room_id, r.subject, r.key_epoch, r.encrypt, r.persist, r.sign_msgs, r.owner_endpoint, m.role `SELECT r.room_id, r.subject, r.key_epoch, r.encrypt, r.persist, r.sign_msgs, r.owner_endpoint, m.role
FROM members m JOIN rooms r ON r.room_id = m.room_id FROM members m JOIN rooms r ON r.room_id = m.room_id
@@ -257,7 +310,7 @@ func (s *Store) ListRoomsForEndpoint(endpoint string) ([]RoomMembership, error)
// GetSealedKey returns the sealed room key for an endpoint at a given epoch. // GetSealedKey returns the sealed room key for an endpoint at a given epoch.
// If epoch <= 0, the latest epoch for that endpoint is returned. // If epoch <= 0, the latest epoch for that endpoint is returned.
func (s *Store) GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, error) { func (s *sqliteStore) GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, error) {
var ep int var ep int
var sealed []byte var sealed []byte
var err error var err error
@@ -275,6 +328,12 @@ func (s *Store) GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, e
).Scan(&ep, &sealed) ).Scan(&ep, &sealed)
} }
if err != nil { if err != nil {
// Map "no such row" to the store-agnostic sentinel so the control plane
// can tell "not invited / no key yet" (-> 403 with a helpful message) from
// a genuine backend failure, the same way the KV store will.
if errors.Is(err, sql.ErrNoRows) {
return 0, nil, fmt.Errorf("membership: get sealed key %q/%q@%d: %w", roomID, endpoint, epoch, ErrNotFound)
}
return 0, nil, fmt.Errorf("membership: get sealed key %q/%q@%d: %w", roomID, endpoint, epoch, err) return 0, nil, fmt.Errorf("membership: get sealed key %q/%q@%d: %w", roomID, endpoint, epoch, err)
} }
return ep, sealed, nil return ep, sealed, nil
@@ -282,7 +341,7 @@ func (s *Store) GetSealedKey(roomID, endpoint string, epoch int) (int, []byte, e
// PutSealedKeys stores a batch of sealed keys for the given epoch (endpoint -> // PutSealedKeys stores a batch of sealed keys for the given epoch (endpoint ->
// sealed bytes), upserting on conflict so a rekey can overwrite stale entries. // sealed bytes), upserting on conflict so a rekey can overwrite stale entries.
func (s *Store) PutSealedKeys(roomID string, epoch int, keys map[string][]byte) error { func (s *sqliteStore) PutSealedKeys(roomID string, epoch int, keys map[string][]byte) error {
tx, err := s.db.Begin() tx, err := s.db.Begin()
if err != nil { if err != nil {
return fmt.Errorf("membership: begin: %w", err) return fmt.Errorf("membership: begin: %w", err)
@@ -301,7 +360,7 @@ func (s *Store) PutSealedKeys(roomID string, epoch int, keys map[string][]byte)
} }
// BumpEpoch sets the room's current key_epoch to newEpoch. // BumpEpoch sets the room's current key_epoch to newEpoch.
func (s *Store) BumpEpoch(roomID string, newEpoch int) error { func (s *sqliteStore) BumpEpoch(roomID string, newEpoch int) error {
if _, err := s.db.Exec(`UPDATE rooms SET key_epoch = ? WHERE room_id = ?`, newEpoch, roomID); err != nil { if _, err := s.db.Exec(`UPDATE rooms SET key_epoch = ? WHERE room_id = ?`, newEpoch, roomID); err != nil {
return fmt.Errorf("membership: bump epoch %q->%d: %w", roomID, newEpoch, err) return fmt.Errorf("membership: bump epoch %q->%d: %w", roomID, newEpoch, err)
} }
@@ -310,7 +369,7 @@ func (s *Store) BumpEpoch(roomID string, newEpoch int) error {
// RemoveMember deletes a member from a room. Their sealed keys for past epochs // RemoveMember deletes a member from a room. Their sealed keys for past epochs
// are left intact (they encrypt only data that member could already read). // are left intact (they encrypt only data that member could already read).
func (s *Store) RemoveMember(roomID, endpoint string) error { func (s *sqliteStore) RemoveMember(roomID, endpoint string) error {
if _, err := s.db.Exec(`DELETE FROM members WHERE room_id = ? AND endpoint = ?`, roomID, endpoint); err != nil { if _, err := s.db.Exec(`DELETE FROM members WHERE room_id = ? AND endpoint = ?`, roomID, endpoint); err != nil {
return fmt.Errorf("membership: remove member %q/%q: %w", roomID, endpoint, err) return fmt.Errorf("membership: remove member %q/%q: %w", roomID, endpoint, err)
} }
+2 -2
@@ -6,10 +6,10 @@ import (
"testing" "testing"
) )
func openTestStore(t *testing.T) *Store { func openTestStore(t *testing.T) *sqliteStore {
t.Helper() t.Helper()
path := filepath.Join(t.TempDir(), "test.db") path := filepath.Join(t.TempDir(), "test.db")
s, err := Open(path) s, err := openSQLite(path)
if err != nil { if err != nil {
t.Fatalf("Open: %v", err) t.Fatalf("Open: %v", err)
} }
+24 -6
@@ -2,6 +2,7 @@ package membership
import ( import (
"database/sql" "database/sql"
"encoding/hex"
"errors" "errors"
"fmt" "fmt"
"strings" "strings"
@@ -35,6 +36,23 @@ type User struct {
RevokedAt string // empty unless revoked RevokedAt string // empty unless revoked
} }
// ValidateSignPubHex ensures signPub is exactly a 32-byte Ed25519 public key in
// hex (64 hex chars). It is the single source of truth for that check, shared by
// the local admin CLI (which validates before seeding the first admin) and the
// HTTP user-management handlers (which validate an admin-supplied key before it
// reaches the store). Catching a malformed key here turns a silent "authorized
// nobody" into an explicit error at the boundary.
func ValidateSignPubHex(signPub string) error {
b, err := hex.DecodeString(signPub)
if err != nil {
return fmt.Errorf("sign-pub is not valid hex: %w", err)
}
if len(b) != 32 {
return fmt.Errorf("sign-pub must be a 32-byte Ed25519 public key (64 hex chars), got %d bytes", len(b))
}
return nil
}
// normalizeSignPub lowercases the hex key so lookups are case-insensitive: the // normalizeSignPub lowercases the hex key so lookups are case-insensitive: the
// primary key is stored lowercase and every query normalizes its input the same // primary key is stored lowercase and every query normalizes its input the same
// way, so a caller passing uppercase hex still matches. // way, so a caller passing uppercase hex still matches.
@@ -45,7 +63,7 @@ func normalizeSignPub(signPub string) string {
// AddUser inserts a new bus user. role defaults to RoleMember when empty. It // AddUser inserts a new bus user. role defaults to RoleMember when empty. It
// returns ErrUserExists if the sign_pub is already registered (the caller may // returns ErrUserExists if the sign_pub is already registered (the caller may
// choose to revoke+re-add or ignore). handle and signPub must be non-empty. // choose to revoke+re-add or ignore). handle and signPub must be non-empty.
func (s *Store) AddUser(signPub, handle, role string) error { func (s *sqliteStore) AddUser(signPub, handle, role string) error {
signPub = normalizeSignPub(signPub) signPub = normalizeSignPub(signPub)
if signPub == "" || handle == "" { if signPub == "" || handle == "" {
return fmt.Errorf("membership: AddUser: sign_pub and handle required") return fmt.Errorf("membership: AddUser: sign_pub and handle required")
@@ -74,7 +92,7 @@ func (s *Store) AddUser(signPub, handle, role string) error {
// GetUser returns the user with the given signing public key. It returns // GetUser returns the user with the given signing public key. It returns
// sql.ErrNoRows (wrapped) when there is no such user. // sql.ErrNoRows (wrapped) when there is no such user.
func (s *Store) GetUser(signPub string) (User, error) { func (s *sqliteStore) GetUser(signPub string) (User, error) {
signPub = normalizeSignPub(signPub) signPub = normalizeSignPub(signPub)
var u User var u User
var revoked sql.NullString var revoked sql.NullString
@@ -90,7 +108,7 @@ func (s *Store) GetUser(signPub string) (User, error) {
} }
// ListUsers returns every user ordered by handle then sign_pub (stable output). // ListUsers returns every user ordered by handle then sign_pub (stable output).
func (s *Store) ListUsers() ([]User, error) { func (s *sqliteStore) ListUsers() ([]User, error) {
rows, err := s.db.Query( rows, err := s.db.Query(
`SELECT sign_pub, handle, role, status, created_at, revoked_at FROM users ORDER BY handle, sign_pub`, `SELECT sign_pub, handle, role, status, created_at, revoked_at FROM users ORDER BY handle, sign_pub`,
) )
@@ -116,7 +134,7 @@ func (s *Store) ListUsers() ([]User, error) {
// status flip (not a delete) so the identity stays auditable and IsAuthorized // status flip (not a delete) so the identity stays auditable and IsAuthorized
// immediately denies it on both planes. Revoking an unknown or already-revoked // immediately denies it on both planes. Revoking an unknown or already-revoked
// user returns an error / is a no-op respectively. // user returns an error / is a no-op respectively.
func (s *Store) RevokeUser(signPub string) error { func (s *sqliteStore) RevokeUser(signPub string) error {
signPub = normalizeSignPub(signPub) signPub = normalizeSignPub(signPub)
res, err := s.db.Exec( res, err := s.db.Exec(
`UPDATE users SET status = ?, revoked_at = ? WHERE sign_pub = ? AND status = ?`, `UPDATE users SET status = ?, revoked_at = ? WHERE sign_pub = ? AND status = ?`,
@@ -140,7 +158,7 @@ func (s *Store) RevokeUser(signPub string) error {
// plane (HTTP request middleware) and the data plane (NATS nkey authenticator), // plane (HTTP request middleware) and the data plane (NATS nkey authenticator),
// so revoking a user denies access on both without restarting anything. An // so revoking a user denies access on both without restarting anything. An
// unknown key, a revoked key, or any query error all yield false (fail closed). // unknown key, a revoked key, or any query error all yield false (fail closed).
func (s *Store) IsAuthorized(signPub string) bool { func (s *sqliteStore) IsAuthorized(signPub string) bool {
signPub = normalizeSignPub(signPub) signPub = normalizeSignPub(signPub)
if signPub == "" { if signPub == "" {
return false return false
@@ -155,7 +173,7 @@ func (s *Store) IsAuthorized(signPub string) bool {
// HasAdmin reports whether at least one active admin exists. The control plane // HasAdmin reports whether at least one active admin exists. The control plane
// uses it to gate user-management endpoints: until the host operator seeds the // uses it to gate user-management endpoints: until the host operator seeds the
// first admin via the local CLI, those endpoints stay closed (chicken-egg). // first admin via the local CLI, those endpoints stay closed (chicken-egg).
func (s *Store) HasAdmin() bool { func (s *sqliteStore) HasAdmin() bool {
var one int var one int
err := s.db.QueryRow( err := s.db.QueryRow(
`SELECT 1 FROM users WHERE role = ? AND status = ? LIMIT 1`, RoleAdmin, StatusActive, `SELECT 1 FROM users WHERE role = ? AND status = ? LIMIT 1`, RoleAdmin, StatusActive,
+164
@@ -0,0 +1,164 @@
package membership
import (
"encoding/hex"
"encoding/json"
"net/http"
"testing"
"time"
cs "fn-registry/functions/cybersecurity"
)
// signedJSON is signedReq for a JSON body: it marshals v and signs the request
// as id with a distinct nonce. It returns the response status and body, reusing
// the auth_test harness so these tests exercise the real signed wire contract.
func signedJSON(t *testing.T, h *authHarness, method, path string, v any, id cs.Identity, n int) (int, string) {
t.Helper()
var body []byte
if v != nil {
b, err := json.Marshal(v)
if err != nil {
t.Fatalf("marshal body: %v", err)
}
body = b
}
return do(t, signedReq(t, h.ts.URL, method, path, body, id, time.Now().Unix(), nonceN(n)))
}
// TestUsersHTTP_NonAdminForbidden is the security spine: a REGISTERED but
// non-admin signer (bob, role member) is denied on every user-management
// endpoint. His signature clears auth (he is in the allowlist), so each request
// reaches the handler, where requireAdmin returns 403 — default-deny by role.
func TestUsersHTTP_NonAdminForbidden(t *testing.T) {
h := newAuthHarness(t, AuthEnforce)
bob, _ := cs.GenerateIdentity()
register(t, h, bob, "bob") // role member (see register in authz_test.go)
bobPub := hex.EncodeToString(bob.SignPub)
victim, _ := cs.GenerateIdentity()
victimPub := hex.EncodeToString(victim.SignPub)
checks := []struct {
name string
method string
path string
body any
}{
{"list users", "GET", "/users", nil},
{"add user", "POST", "/users", addUserReq{SignPub: victimPub, Handle: "mallory", Role: RoleMember}},
{"revoke user", "POST", "/users/" + bobPub + "/revoke", nil},
}
for i, c := range checks {
code, body := signedJSON(t, h, c.method, c.path, c.body, bob, i+1)
if code != http.StatusForbidden {
t.Fatalf("non-admin %s should be 403, got %d (%s)", c.name, code, body)
}
}
}
// TestUsersHTTP_AdminRoundtrip exercises the golden path end to end: alice (the
// seeded admin) adds carol, sees her in the list as active, revokes her, then
// sees her status flip to revoked (no hard delete — she stays in the list).
func TestUsersHTTP_AdminRoundtrip(t *testing.T) {
h := newAuthHarness(t, AuthEnforce)
carol, _ := cs.GenerateIdentity()
carolPub := hex.EncodeToString(carol.SignPub)
// Add carol as a member.
if code, body := signedJSON(t, h, "POST", "/users",
addUserReq{SignPub: carolPub, Handle: "carol", Role: RoleMember}, h.alice, 1); code != http.StatusCreated {
t.Fatalf("admin add carol should be 201, got %d (%s)", code, body)
}
// List: carol present and active; alice (the seed admin) also present.
users := listUsers(t, h, 2)
carolRow, ok := findUser(users, carolPub)
if !ok {
t.Fatalf("carol missing from list after add: %+v", users)
}
if carolRow.Status != StatusActive || carolRow.Role != RoleMember || carolRow.Handle != "carol" {
t.Fatalf("carol row wrong after add: %+v", carolRow)
}
if _, ok := findUser(users, h.alicePub); !ok {
t.Fatalf("seeded admin alice missing from list: %+v", users)
}
// Revoke carol.
if code, body := signedJSON(t, h, "POST", "/users/"+carolPub+"/revoke", nil, h.alice, 3); code != http.StatusOK {
t.Fatalf("admin revoke carol should be 200, got %d (%s)", code, body)
}
// List again: carol still present, now revoked (status flip, not delete).
users = listUsers(t, h, 4)
carolRow, ok = findUser(users, carolPub)
if !ok {
t.Fatalf("carol vanished from list after revoke (should be a status flip): %+v", users)
}
if carolRow.Status != StatusRevoked {
t.Fatalf("carol should be revoked, got status %q", carolRow.Status)
}
}
// TestUsersHTTP_Validation covers the input-validation contract: a malformed hex
// key is 400, an unknown role is 400, and re-adding an already-registered key is
// 409 (the existing row is left untouched — no silent upsert).
func TestUsersHTTP_Validation(t *testing.T) {
h := newAuthHarness(t, AuthEnforce)
good, _ := cs.GenerateIdentity()
goodPub := hex.EncodeToString(good.SignPub)
// Invalid hex (too short) -> 400.
if code, body := signedJSON(t, h, "POST", "/users",
addUserReq{SignPub: "abcd", Handle: "shorty", Role: RoleMember}, h.alice, 1); code != http.StatusBadRequest {
t.Fatalf("malformed sign_pub should be 400, got %d (%s)", code, body)
}
// Invalid role -> 400.
if code, body := signedJSON(t, h, "POST", "/users",
addUserReq{SignPub: goodPub, Handle: "weirdrole", Role: "superuser"}, h.alice, 2); code != http.StatusBadRequest {
t.Fatalf("invalid role should be 400, got %d (%s)", code, body)
}
// Re-adding the seeded admin's own key -> 409 (idempotency, no overwrite).
if code, body := signedJSON(t, h, "POST", "/users",
addUserReq{SignPub: h.alicePub, Handle: "alice-again", Role: RoleMember}, h.alice, 3); code != http.StatusConflict {
t.Fatalf("re-adding an existing key should be 409, got %d (%s)", code, body)
}
// And the existing row is untouched: alice is still an active admin.
u, err := h.store.GetUser(h.alicePub)
if err != nil {
t.Fatalf("get alice after conflicting re-add: %v", err)
}
if u.Role != RoleAdmin || u.Status != StatusActive || u.Handle != "alice" {
t.Fatalf("conflicting re-add mutated the existing row: %+v", u)
}
}
// listUsers signs a GET /users as alice and decodes the response.
func listUsers(t *testing.T, h *authHarness, n int) []userJSON {
t.Helper()
code, body := signedJSON(t, h, "GET", "/users", nil, h.alice, n)
if code != http.StatusOK {
t.Fatalf("admin list users should be 200, got %d (%s)", code, body)
}
var users []userJSON
if err := json.Unmarshal([]byte(body), &users); err != nil {
t.Fatalf("decode users: %v (%s)", err, body)
}
return users
}
// findUser returns the row with the given signing key (case-insensitive).
func findUser(users []userJSON, signPub string) (userJSON, bool) {
want := normalizeSignPub(signPub)
for _, u := range users {
if normalizeSignPub(u.SignPub) == want {
return u, true
}
}
return userJSON{}, false
}
-119
@@ -1,119 +0,0 @@
# unibus playground
An all-in-one, web-based sandbox for the **unibus** message bus. One command
brings up the entire stack embedded — no NATS to install, no services to wire —
and a browser UI lets you exercise the bus visually: create peers, create and
join rooms (cleartext or end-to-end encrypted), invite, publish, watch messages
arrive live, and kick members (forward secrecy).
This is a **playground** (see `.claude/rules/playgrounds.md`): it lives inside
the `unibus` app, reuses the parent Go module (no separate `go.mod`), is not
indexed, and keeps all runtime state under `playground/local_files/` (ephemeral,
safe to delete).
## Run
From the `unibus` app directory:
```bash
cd /home/enmanuel/fn_registry/projects/message_bus/apps/unibus
go run ./playground
```
Then open **http://localhost:7700** in your browser.
Stop with `Ctrl-C` — the server tears down the web UI, every bus client, the
control plane, and the embedded NATS cleanly (no orphaned processes).
## Architecture
The browser never speaks NATS. The Go server is the actual bus peer:
```
browser ──fetch/SSE──▶ playground server (:7700)
│ holds one unibus client per named peer
├──HTTP──▶ membership control plane (127.0.0.1:8480)
└──NATS──▶ embedded NATS + JetStream (:4260)
```
- **:7700** — web UI (the only browser-facing port).
- **127.0.0.1:8480** — membership control plane (rooms, members, sealed keys,
rekey, blobs). Internal only.
- **:4260** — embedded NATS + JetStream (the data plane). Internal only.
Each named peer gets its own long-term identity, persisted to
`playground/local_files/<name>.id`, so a peer keeps the same endpoint across
restarts. When a peer creates or joins a room, the server subscribes on its
behalf and streams every received frame to that peer's open browser tabs over
Server-Sent Events.
The playground only orchestrates the public unibus client API
(`CreateRoom`, `Join`, `Subscribe`, `Publish`, `Invite`, `Kick`); it never
reimplements bus or crypto logic.
## Try it: 2 peers + encryption + kick
1. Open **two browser tabs** on http://localhost:7700.
2. Tab A: type `alice`, click **Connect**.
3. Tab B: type `bob`, click **Connect**.
4. Tab A (alice): type a subject like `room.general`, tick **🔒 encrypted
(E2E)**, click **Create room**. Copy the resulting `room_id`.
5. Tab A (alice): in the Action panel, pick `bob` as the target peer (use the
↻ button to refresh the peer list if needed) and click **Invite to this
room**.
6. Tab B (bob): paste the `room_id` into the join field and click **Join**.
7. Type messages in **both** tabs and hit Send — each message appears live in
both tabs, tagged with subject, sender, time, and 🔒 (encrypted) or `clear`.
8. Tab A (alice): click **Kick from this room** with `bob` selected. The room
key rotates to a new epoch. New messages alice sends are no longer visible to
bob — **forward secrecy**: bob no longer holds the current key.
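The effect of step 8 can be modelled with a toy in-memory store. This is a sketch under stated assumptions, not the real unibus rekey path: the type `toyStore`, its methods, and the order of operations inside `kick` are hypothetical, and the "sealed key" is a placeholder string rather than real cryptography. It only illustrates the bookkeeping: after a kick, the new epoch's key is sealed for the remaining members, so the kicked peer's newest sealed key stays at the old epoch.

```go
package main

import "fmt"

// toyStore is a hypothetical stand-in for the membership store.
type toyStore struct {
	epoch   int
	members map[string]bool
	sealed  map[string]map[int]string // endpoint -> epoch -> placeholder sealed key
}

func newToyStore(owner string) *toyStore {
	s := &toyStore{epoch: 1, members: map[string]bool{}, sealed: map[string]map[int]string{}}
	s.add(owner)
	return s
}

// put records a placeholder sealed key for endpoint at the given epoch.
func (s *toyStore) put(endpoint string, epoch int) {
	if s.sealed[endpoint] == nil {
		s.sealed[endpoint] = map[int]string{}
	}
	s.sealed[endpoint][epoch] = fmt.Sprintf("sealed(%s,epoch=%d)", endpoint, epoch)
}

// add registers a member and gives it a key for the current epoch.
func (s *toyStore) add(endpoint string) {
	s.members[endpoint] = true
	s.put(endpoint, s.epoch)
}

// kick removes the member, bumps the epoch, and seals the new epoch's key
// only for the members that remain. Past-epoch keys are left in place.
func (s *toyStore) kick(endpoint string) {
	delete(s.members, endpoint)
	s.epoch++
	for m := range s.members {
		s.put(m, s.epoch)
	}
}

// latestEpoch returns the newest epoch for which endpoint holds a key (0 if none).
func (s *toyStore) latestEpoch(endpoint string) int {
	latest := 0
	for e := range s.sealed[endpoint] {
		if e > latest {
			latest = e
		}
	}
	return latest
}

func main() {
	s := newToyStore("alice")
	s.add("bob")
	s.kick("bob")
	fmt.Println(s.epoch, s.latestEpoch("alice"), s.latestEpoch("bob")) // prints "2 2 1"
}
```

In this model bob can still open anything encrypted under epoch 1, which he could already read, but holds nothing for epoch 2.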
Cleartext rooms (leave the checkbox unticked) behave like plain NATS fan-out:
fast, ephemeral, unsigned. Encrypted rooms are the Matrix-like mode: E2E
encrypted, persisted, and per-message signed.
## Benchmark: throughput simulator
The bottom panel of the UI is a performance simulator. Press **▶ Ejecutar
benchmark** ("Run benchmark") and one publisher floods a fresh room with
thousands of messages that N subscribers receive (fan-out); a live canvas chart
animates the sent vs received totals while it runs.
The two policy axes are exposed as **independent flags**, so the benchmark
measures the cost of each layer in isolation:
| JetStream | Encryption | Room policy | What it costs |
|---|---|---|---|
| off | off | `{Encrypt:false, Persist:false}` | plain core NATS fan-out |
| **on** | off | `{Encrypt:false, Persist:true}` | durable JetStream (publish ack per message) |
| off | **on** | `{Encrypt:true, Persist:false}` | AEAD + Ed25519 signature per message, core transport |
| **on** | **on** | `{Encrypt:true, Persist:true}` | full E2E + durable history |
A **payload size** slider (16 B to 8 KiB) sets the message size. Encrypted or
persistent runs are capped at 30 000 messages (each message pays per-message
crypto and/or a JetStream ack, so they run much slower than plain NATS).
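The limits described above can be sketched as a small normalization step. This is an illustration only: the type `benchParams` and the function `normalize` are hypothetical, the subscriber and payload ranges are taken from the UI sliders (1 to 16 subscribers, 16 to 8192 bytes), and whether the real server applies them this way is an assumption.

```go
package main

import "fmt"

// cappedMsgs is the documented ceiling for encrypted or persistent runs.
const cappedMsgs = 30000

// benchParams is a hypothetical container for one benchmark request.
type benchParams struct {
	Msgs, Subs, Payload int
	Encrypt, Persist    bool
}

// clamp limits v to the inclusive range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// normalize applies the slider ranges and the message cap. Plain runs
// (no encryption, no persistence) keep their requested message count.
func normalize(p benchParams) benchParams {
	p.Subs = clamp(p.Subs, 1, 16)
	p.Payload = clamp(p.Payload, 16, 8192)
	if (p.Encrypt || p.Persist) && p.Msgs > cappedMsgs {
		p.Msgs = cappedMsgs
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", normalize(benchParams{Msgs: 200000, Subs: 3, Payload: 128, Encrypt: true}))
	fmt.Printf("%+v\n", normalize(benchParams{Msgs: 200000, Subs: 40, Payload: 4}))
}
```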
The benchmark uses its own ephemeral peers (fresh identities, never persisted),
so it never touches the named peers of the manual sandbox.
It is driven by an SSE endpoint that streams progress samples:
```bash
curl -N "http://localhost:7700/api/bench?n_msgs=20000&n_subs=3&payload=128&encrypt=0&persist=0"
# emits: data: {"type":"start",...} data: {"type":"sample",...} data: {"type":"done",...}
```
Query params: `n_msgs`, `n_subs` (1 to 16), `payload` (bytes), `encrypt` (0/1),
`persist` (0/1).
## State / cleanup
All writable state lives under `playground/local_files/`:
- `<name>.id` — per-peer identity (private keys; treat like an SSH key).
- `play.db` — membership store (rooms, members, sealed keys).
- `blobs/` — media blob store.
- `js/` — embedded JetStream store.
Delete the whole `playground/local_files/` directory to reset to a clean slate.
It is gitignored and never distributed.
-594
@@ -1,594 +0,0 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>unibus playground</title>
<style>
:root {
--bg: #0d1117;
--panel: #161b22;
--panel2: #1c2230;
--border: #2b333f;
--fg: #e6edf3;
--muted: #8b98a5;
--accent: #2f81f7;
--green: #3fb950;
--gold: #d29922;
--red: #f85149;
--mono: ui-monospace, "SF Mono", "Cascadia Code", Menlo, Consolas, monospace;
}
* { box-sizing: border-box; }
body {
margin: 0;
background: var(--bg);
color: var(--fg);
font-family: var(--mono);
font-size: 14px;
line-height: 1.5;
}
header {
padding: 14px 20px;
border-bottom: 1px solid var(--border);
display: flex;
align-items: baseline;
gap: 12px;
}
header h1 { margin: 0; font-size: 18px; letter-spacing: 0.5px; }
header .sub { color: var(--muted); font-size: 12px; }
.wrap {
display: grid;
grid-template-columns: 360px 1fr;
gap: 16px;
padding: 16px 20px;
max-width: 1200px;
}
.col { display: flex; flex-direction: column; gap: 14px; }
.card {
background: var(--panel);
border: 1px solid var(--border);
border-radius: 8px;
padding: 14px;
}
.card h2 {
margin: 0 0 10px;
font-size: 13px;
text-transform: uppercase;
letter-spacing: 1px;
color: var(--muted);
}
label { display: block; font-size: 12px; color: var(--muted); margin: 8px 0 3px; }
input[type=text], select {
width: 100%;
background: var(--panel2);
border: 1px solid var(--border);
color: var(--fg);
padding: 7px 9px;
border-radius: 6px;
font-family: var(--mono);
font-size: 13px;
}
input:focus, select:focus { outline: none; border-color: var(--accent); }
.row { display: flex; gap: 8px; align-items: center; }
.row > * { flex: 1; }
.checkrow { display: flex; align-items: center; gap: 6px; margin: 10px 0; }
.checkrow input { flex: 0 0 auto; width: auto; }
.checkrow label { margin: 0; flex: 0 0 auto; }
button {
background: var(--accent);
border: none;
color: #fff;
padding: 7px 12px;
border-radius: 6px;
cursor: pointer;
font-family: var(--mono);
font-size: 13px;
margin-top: 8px;
}
button:hover { filter: brightness(1.12); }
button.ghost { background: var(--panel2); border: 1px solid var(--border); color: var(--fg); }
button.danger { background: #3a1d1d; border: 1px solid var(--red); color: var(--red); }
button:disabled { opacity: 0.4; cursor: not-allowed; }
.pill {
display: inline-block;
background: var(--panel2);
border: 1px solid var(--border);
border-radius: 12px;
padding: 2px 9px;
font-size: 11px;
color: var(--muted);
}
.pill.on { color: var(--green); border-color: var(--green); }
.ident { word-break: break-all; font-size: 11px; color: var(--gold); margin-top: 6px; }
.copy {
cursor: pointer; color: var(--accent); font-size: 11px;
margin-left: 6px; text-decoration: underline;
}
#log {
background: #08090c;
border: 1px solid var(--border);
border-radius: 8px;
padding: 10px 12px;
height: 520px;
overflow-y: auto;
font-size: 12.5px;
white-space: pre-wrap;
}
.msg { padding: 2px 0; border-bottom: 1px solid #11151b; }
.msg .subj { color: var(--accent); }
.msg .from { color: var(--gold); }
.msg .meta { color: var(--muted); font-size: 11px; }
.msg .enc { color: var(--green); }
.msg .clear { color: var(--muted); }
.sys { color: var(--muted); font-style: italic; }
.err { color: var(--red); }
.help {
background: var(--panel2);
border-left: 3px solid var(--accent);
padding: 10px 12px;
border-radius: 4px;
font-size: 12px;
color: var(--muted);
line-height: 1.6;
}
.help b { color: var(--fg); }
.help code { color: var(--gold); }
.status { font-size: 11px; color: var(--muted); margin-top: 6px; min-height: 14px; }
.status.ok { color: var(--green); }
.status.bad { color: var(--red); }
</style>
</head>
<body>
<header>
<h1>unibus playground</h1>
<span class="sub">embedded NATS + JetStream &middot; E2E rooms &middot; forward secrecy &middot; SSE</span>
</header>
<div class="wrap">
<!-- LEFT COLUMN: controls -->
<div class="col">
<div class="card">
<h2>1 &middot; Identity</h2>
<label>Peer name</label>
<div class="row">
<input id="peerName" type="text" placeholder="alice" autocomplete="off" />
<button id="connectBtn" style="flex:0 0 auto">Connect</button>
</div>
<div id="peerIdent" class="ident"></div>
<div id="connStatus" class="status"></div>
</div>
<div class="card">
<h2>2 &middot; Rooms</h2>
<label>Subject (e.g. room.general)</label>
<input id="roomSubject" type="text" placeholder="room.general" autocomplete="off" />
<div class="checkrow">
<input id="roomEncrypt" type="checkbox" />
<label for="roomEncrypt">&#128274; encrypted (E2E)</label>
</div>
<div class="checkrow">
<input id="roomPersist" type="checkbox" />
<label for="roomPersist">&#128450; persistent (history)</label>
</div>
<div class="help" style="margin:-4px 0 8px; font-size:12px; color:var(--muted)">
persistent = peers who join later see the history; non-persistent = new messages only (plain NATS).
</div>
<button id="createRoomBtn" disabled>Create room</button>
<div style="border-top:1px solid var(--border); margin:12px 0"></div>
<label>Join by room_id</label>
<input id="joinRoomId" type="text" placeholder="01J..." autocomplete="off" />
<button id="joinBtn" class="ghost" disabled>Join</button>
<div id="roomStatus" class="status"></div>
</div>
<div class="card">
<h2>3 &middot; Action</h2>
<label>Active room</label>
<select id="activeRoom"></select>
<label>Message</label>
<div class="row">
<input id="msgText" type="text" placeholder="hello bus" autocomplete="off" />
<button id="sendBtn" style="flex:0 0 auto" disabled>Send</button>
</div>
<div style="border-top:1px solid var(--border); margin:12px 0"></div>
<label>Target peer</label>
<div class="row">
<select id="targetPeer"></select>
<button id="refreshPeersBtn" class="ghost" style="flex:0 0 auto" title="reload peer list">&#8635;</button>
</div>
<button id="inviteBtn" disabled>Invite to this room</button>
<button id="kickBtn" class="danger" disabled>Kick from this room</button>
<div id="actionStatus" class="status"></div>
</div>
</div>
<!-- RIGHT COLUMN: live messages + help -->
<div class="col">
<div class="card" style="padding-bottom:8px">
<h2>Live messages <span id="streamPill" class="pill">disconnected</span></h2>
<div id="log"></div>
</div>
<div class="help">
<b>&#9432; How to try it</b><br />
Open <b>2 tabs</b>. Connect as <code>alice</code> in one and <code>bob</code> in the other.
In alice: create a <code>&#128274; encrypted</code> room, copy the <code>room_id</code>,
then pick <code>bob</code> as target and <b>Invite to this room</b>.
In bob: paste that <code>room_id</code> and <b>Join</b>.
Type in both &rarr; messages appear live on each side.
In alice: <b>Kick</b> bob &rarr; bob stops seeing new messages (forward secrecy: the room
key rotates and bob no longer holds it).
</div>
</div>
</div>
<!-- BENCHMARK: full-width performance simulator -->
<div style="padding: 0 20px 32px; max-width: 1200px;">
<div class="card">
<h2>Performance benchmark &middot; 1 publisher &rarr; N subscribers</h2>
<div style="display:flex; gap:26px; flex-wrap:wrap; align-items:flex-end; margin-bottom:6px;">
<div style="min-width:230px;">
<label>Messages to publish &middot; <span id="bMsgsVal" style="color:var(--fg)">20 000</span></label>
<input id="bMsgs" type="range" min="1000" max="200000" step="1000" value="20000" style="width:100%; accent-color:var(--accent);" />
</div>
<div style="min-width:160px;">
<label>Subscribers &middot; <span id="bSubsVal" style="color:var(--fg)">3</span></label>
<input id="bSubs" type="range" min="1" max="16" step="1" value="3" style="width:100%; accent-color:var(--accent);" />
</div>
<div style="min-width:200px;">
<label>Payload size &middot; <span id="bPayVal" style="color:var(--fg)">128 B</span></label>
<input id="bPay" type="range" min="16" max="8192" step="16" value="128" style="width:100%; accent-color:var(--accent);" />
</div>
<div class="checkrow" style="margin:0;">
<input id="bPersist" type="checkbox" />
<label for="bPersist">&#128450; JetStream (persistent)</label>
</div>
<div class="checkrow" style="margin:0;">
<input id="bEncrypt" type="checkbox" />
<label for="bEncrypt">&#128274; E2E encryption</label>
</div>
<button id="bRun" style="margin:0;">&#9654; Run benchmark</button>
</div>
<div class="help" style="margin:6px 0 12px;">
<b>JetStream</b> and <b>Encryption</b> are independent axes: core NATS (both off) &middot; durable JetStream &middot; E2E (AEAD + per-message Ed25519 signature) &middot; E2E + JetStream. Modes with crypto or persistence are capped at 30&nbsp;000 messages (each message pays for encryption/signing/ack).
</div>
<div style="display:flex; gap:30px; flex-wrap:wrap; margin:4px 2px 8px;">
<div><div style="font-size:11px; color:var(--muted); text-transform:uppercase; letter-spacing:.05em;">Sent</div><div id="bSent" style="font-size:22px; color:var(--accent);">0</div></div>
<div><div style="font-size:11px; color:var(--muted); text-transform:uppercase; letter-spacing:.05em;">Received (&Sigma; subs)</div><div id="bRecv" style="font-size:22px; color:var(--green);">0</div></div>
<div><div style="font-size:11px; color:var(--muted); text-transform:uppercase; letter-spacing:.05em;">Recv throughput</div><div id="bTps" style="font-size:22px; color:var(--gold);">0</div></div>
<div><div style="font-size:11px; color:var(--muted); text-transform:uppercase; letter-spacing:.05em;">Time</div><div id="bTime" style="font-size:22px;">0.00 s</div></div>
</div>
<canvas id="bChart" style="width:100%; height:300px; display:block; background:#08090c; border:1px solid var(--border); border-radius:8px;"></canvas>
<div style="display:flex; gap:18px; font-size:12px; color:var(--muted); margin-top:6px;">
<span><span style="display:inline-block;width:10px;height:10px;border-radius:50%;background:var(--accent);margin-right:6px;"></span>sent (publisher)</span>
<span><span style="display:inline-block;width:10px;height:10px;border-radius:50%;background:var(--green);margin-right:6px;"></span>received (sum across subscribers)</span>
</div>
<div id="bStatus" class="status" style="margin-top:8px;"></div>
</div>
</div>
<script>
"use strict";
const state = {
peer: null, // connected peer name
rooms: {}, // room_id -> {subject, encrypt}
es: null, // EventSource
};
const $ = (id) => document.getElementById(id);
async function api(path, body) {
const opts = { method: "POST", headers: { "Content-Type": "application/json" } };
if (body !== undefined) opts.body = JSON.stringify(body);
const res = await fetch(path, opts);
const data = await res.json().catch(() => ({}));
if (!res.ok) throw new Error(data.error || ("HTTP " + res.status));
return data;
}
async function apiGet(path) {
const res = await fetch(path);
const data = await res.json().catch(() => ({}));
if (!res.ok) throw new Error(data.error || ("HTTP " + res.status));
return data;
}
function setStatus(id, msg, kind) {
const el = $(id);
el.textContent = msg || "";
el.className = "status" + (kind ? " " + kind : "");
}
function short(s, n = 10) {
if (!s) return "";
return s.length <= n * 2 ? s : s.slice(0, n) + "…" + s.slice(-4);
}
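/*
A small worked example of the truncation rule used by short(): strings of at
most 2*n characters pass through unchanged, longer ones keep the first n
characters and the last 4. The helper is copied under a different name so the
snippet runs on its own (for example under Node); the id below is a made-up
value, not a real room or endpoint id.

```javascript
// Standalone copy of the page's short() helper.
function shortDemo(s, n = 10) {
  if (!s) return "";
  return s.length <= n * 2 ? s : s.slice(0, n) + "…" + s.slice(-4);
}

// Hypothetical 32-character identifier.
const id = "0123456789abcdef0123456789abcdef";

console.log(shortDemo(id));             // first 10 chars, ellipsis, last 4
console.log(shortDemo(id, 6));          // first 6 chars, ellipsis, last 4
console.log(shortDemo("room.general")); // 12 chars <= 20, returned unchanged
```
*/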
function hhmmss(ms) {
const d = new Date(ms);
const p = (x) => String(x).padStart(2, "0");
return p(d.getHours()) + ":" + p(d.getMinutes()) + ":" + p(d.getSeconds());
}
function logSys(text, cls) {
const log = $("log");
const div = document.createElement("div");
div.className = "msg " + (cls || "sys");
div.textContent = text;
log.appendChild(div);
log.scrollTop = log.scrollHeight;
}
function logMsg(ev) {
const log = $("log");
const div = document.createElement("div");
div.className = "msg";
const enc = ev.encrypted
? '<span class="enc">&#128274;</span>'
: '<span class="clear">clear</span>';
div.innerHTML =
'<span class="subj">[' + escapeHtml(ev.subject) + ']</span> ' +
'<span class="from">' + escapeHtml(short(ev.sender)) + '</span> &#8614; ' +
escapeHtml(ev.text) +
' <span class="meta">&middot; ' + hhmmss(ev.ts) + ' &middot; ' + enc + '</span>';
log.appendChild(div);
log.scrollTop = log.scrollHeight;
}
function escapeHtml(s) {
return String(s).replace(/[&<>"']/g, (c) => ({
"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
}[c]));
}
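/*
A minimal sketch of why logMsg passes every server-supplied field through
escapeHtml before assigning innerHTML. The helper is repeated under a
different name so the snippet runs on its own (for example under Node); the
hostile payload is a made-up illustration, not data observed on the bus.

```javascript
// Same replacement table as the page's escapeHtml helper.
function escapeHtmlDemo(s) {
  return String(s).replace(/[&<>"']/g, (c) => ({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;",
  }[c]));
}

// Hypothetical message text that would inject markup if inserted raw.
const hostile = '<img src=x onerror="alert(1)">';
const safe = escapeHtmlDemo(hostile);

// The escaped form contains no literal angle brackets or double quotes,
// so the browser renders it as text instead of parsing it as a tag.
console.log(safe);
```
*/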
function refreshRoomSelect() {
const sel = $("activeRoom");
const cur = sel.value;
sel.innerHTML = "";
for (const [id, info] of Object.entries(state.rooms)) {
const opt = document.createElement("option");
opt.value = id;
opt.textContent = info.subject + " (" + short(id, 6) + ")" + (info.encrypt ? " 🔒" : "");
sel.appendChild(opt);
}
if (state.rooms[cur]) sel.value = cur;
const has = Object.keys(state.rooms).length > 0;
$("sendBtn").disabled = !has;
$("inviteBtn").disabled = !has;
$("kickBtn").disabled = !has;
}
async function refreshPeers() {
try {
const peers = await apiGet("/api/peers");
const sel = $("targetPeer");
const cur = sel.value;
sel.innerHTML = "";
for (const p of peers) {
if (p.name === state.peer) continue; // don't target yourself
const opt = document.createElement("option");
opt.value = p.name;
opt.textContent = p.name + " (" + short(p.endpoint_id, 6) + ")";
sel.appendChild(opt);
}
if ([...sel.options].some((o) => o.value === cur)) sel.value = cur;
} catch (e) {
setStatus("actionStatus", "peers: " + e.message, "bad");
}
}
function openStream(name) {
if (state.es) state.es.close();
const es = new EventSource("/api/stream?peer=" + encodeURIComponent(name));
es.onopen = () => {
$("streamPill").textContent = "live: " + name;
$("streamPill").className = "pill on";
};
es.onmessage = (e) => {
try { logMsg(JSON.parse(e.data)); } catch (_) {}
};
es.onerror = () => {
$("streamPill").textContent = "reconnecting…";
$("streamPill").className = "pill";
};
state.es = es;
}
// ---- handlers ----
$("connectBtn").onclick = async () => {
const name = $("peerName").value.trim();
if (!name) { setStatus("connStatus", "enter a name", "bad"); return; }
try {
const res = await api("/api/peer", { name });
state.peer = res.name;
state.rooms = {};
refreshRoomSelect();
$("peerIdent").innerHTML =
'endpoint: ' + escapeHtml(res.endpoint_id) +
' <span class="copy" id="copyId">copy</span>';
$("copyId").onclick = () => navigator.clipboard.writeText(res.endpoint_id);
setStatus("connStatus", "connected as " + res.name, "ok");
$("createRoomBtn").disabled = false;
$("joinBtn").disabled = false;
$("log").innerHTML = "";
logSys("connected as " + res.name + " — listening for messages");
openStream(res.name);
refreshPeers();
} catch (e) {
setStatus("connStatus", e.message, "bad");
}
};
$("createRoomBtn").onclick = async () => {
const subject = $("roomSubject").value.trim();
const encrypt = $("roomEncrypt").checked;
const persist = $("roomPersist").checked;
if (!subject) { setStatus("roomStatus", "subject required", "bad"); return; }
try {
const res = await api("/api/room", { peer: state.peer, subject, encrypt, persist });
state.rooms[res.room_id] = { subject: res.subject, encrypt: res.encrypt };
refreshRoomSelect();
$("activeRoom").value = res.room_id;
setStatus("roomStatus", "created " + res.room_id + " (click to copy)", "ok");
$("roomStatus").style.cursor = "pointer";
$("roomStatus").onclick = () => navigator.clipboard.writeText(res.room_id);
logSys("created room " + res.subject + " [" + short(res.room_id) + "]" + (encrypt ? " 🔒" : "") + (res.persist ? " 🗄" : ""));
} catch (e) {
setStatus("roomStatus", e.message, "bad");
}
};
$("joinBtn").onclick = async () => {
const roomId = $("joinRoomId").value.trim();
if (!roomId) { setStatus("roomStatus", "room_id required", "bad"); return; }
try {
const res = await api("/api/join", { peer: state.peer, room_id: roomId });
state.rooms[roomId] = { subject: res.subject, encrypt: res.encrypt };
refreshRoomSelect();
$("activeRoom").value = roomId;
setStatus("roomStatus", "joined " + res.subject + (res.encrypt ? " 🔒" : ""), "ok");
logSys("joined room " + res.subject + " [" + short(roomId) + "]");
} catch (e) {
setStatus("roomStatus", e.message, "bad");
}
};
$("sendBtn").onclick = async () => {
const roomId = $("activeRoom").value;
const text = $("msgText").value;
if (!roomId) { setStatus("actionStatus", "select a room", "bad"); return; }
try {
await api("/api/publish", { peer: state.peer, room_id: roomId, text });
$("msgText").value = "";
setStatus("actionStatus", "sent", "ok");
} catch (e) {
setStatus("actionStatus", e.message, "bad");
}
};
$("msgText").addEventListener("keydown", (e) => { if (e.key === "Enter") $("sendBtn").click(); });
$("inviteBtn").onclick = async () => {
const roomId = $("activeRoom").value;
const target = $("targetPeer").value;
if (!roomId) { setStatus("actionStatus", "select a room", "bad"); return; }
if (!target) { setStatus("actionStatus", "no target peer (connect another peer first)", "bad"); return; }
try {
await api("/api/invite", { peer: state.peer, room_id: roomId, target });
setStatus("actionStatus", "invited " + target, "ok");
logSys("invited " + target + " to " + short(roomId));
} catch (e) {
setStatus("actionStatus", e.message, "bad");
}
};
$("kickBtn").onclick = async () => {
const roomId = $("activeRoom").value;
const target = $("targetPeer").value;
if (!roomId) { setStatus("actionStatus", "select a room", "bad"); return; }
if (!target) { setStatus("actionStatus", "no target peer", "bad"); return; }
try {
await api("/api/kick", { peer: state.peer, room_id: roomId, target });
setStatus("actionStatus", "kicked " + target + " (key rotated)", "ok");
logSys("kicked " + target + " from " + short(roomId) + " — key rotated (forward secrecy)");
} catch (e) {
setStatus("actionStatus", e.message, "bad");
}
};
$("refreshPeersBtn").onclick = refreshPeers;
$("peerName").addEventListener("keydown", (e) => { if (e.key === "Enter") $("connectBtn").click(); });
// ---- benchmark ----
const fmtN = (n) => Number(n).toLocaleString("es-ES");
const bMsgs = $("bMsgs"), bSubs = $("bSubs"), bPay = $("bPay");
bMsgs.oninput = () => $("bMsgsVal").textContent = fmtN(+bMsgs.value);
bSubs.oninput = () => $("bSubsVal").textContent = bSubs.value;
bPay.oninput = () => $("bPayVal").textContent = fmtN(+bPay.value) + " B";
let bSamples = [], bRunning = false, bES = null;
const bCanvas = $("bChart"), bCtx = bCanvas.getContext("2d");
function cssVar(n) { return getComputedStyle(document.documentElement).getPropertyValue(n).trim(); }
function bResize() {
const dpr = window.devicePixelRatio || 1, r = bCanvas.getBoundingClientRect();
bCanvas.width = r.width * dpr; bCanvas.height = r.height * dpr;
bCtx.setTransform(dpr, 0, 0, dpr, 0, 0); bDraw();
}
window.addEventListener("resize", bResize);
function bDraw() {
const r = bCanvas.getBoundingClientRect(), W = r.width, H = r.height;
const padL = 70, padR = 14, padT = 12, padB = 26;
bCtx.clearRect(0, 0, W, H);
const tMax = bSamples.length ? Math.max(bSamples[bSamples.length - 1].t, 0.001) : 1;
const yMax = bSamples.length ? Math.max(...bSamples.map(s => Math.max(s.sent, s.recv)), 1) : 1;
bCtx.strokeStyle = "#2b333f"; bCtx.fillStyle = "#8b98a5"; bCtx.font = "11px ui-monospace";
for (let i = 0; i <= 5; i++) {
const yy = (H - padB) - (i / 5) * (H - padT - padB);
bCtx.beginPath(); bCtx.moveTo(padL, yy); bCtx.lineTo(W - padR, yy); bCtx.stroke();
bCtx.textAlign = "right"; bCtx.fillText(fmtN(Math.round((i / 5) * yMax)), padL - 8, yy + 3);
}
bCtx.textAlign = "center";
bCtx.fillText("0 s", padL, H - padB + 15);
bCtx.fillText(tMax.toFixed(2) + " s", W - padR, H - padB + 15);
if (bSamples.length < 2) return;
const x = (t) => padL + (t / tMax) * (W - padL - padR);
const y = (v) => (H - padB) - (v / yMax) * (H - padT - padB);
const line = (key, color) => {
bCtx.beginPath(); bCtx.lineWidth = 2.2; bCtx.strokeStyle = color;
bSamples.forEach((s, i) => { const px = x(s.t), py = y(s[key]); i ? bCtx.lineTo(px, py) : bCtx.moveTo(px, py); });
bCtx.stroke();
};
line("sent", cssVar("--accent"));
line("recv", cssVar("--green"));
}
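/*
The data-to-pixel mapping inside bDraw can be sketched in isolation as below.
This is an illustrative standalone version, not the page's code: makeScales
is a hypothetical helper name, and the canvas size and axis maxima are
made-up inputs. The padding defaults mirror the constants used in bDraw.

```javascript
// Build mappers from (time, count) to CSS pixels for a W x H plot.
function makeScales(W, H, tMax, yMax, pad = { l: 70, r: 14, t: 12, b: 26 }) {
  const x = (t) => pad.l + (t / tMax) * (W - pad.l - pad.r);
  // Canvas y grows downward, so larger values map to smaller y.
  const y = (v) => (H - pad.b) - (v / yMax) * (H - pad.t - pad.b);
  return { x, y };
}

// Hypothetical 800 x 300 canvas, 2 s run, at most 1000 messages.
const { x, y } = makeScales(800, 300, 2.0, 1000);
console.log(x(0), x(2.0));  // left and right edges of the plot area
console.log(y(0), y(1000)); // bottom and top edges of the plot area
```
*/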
function bSetRunning(v) { bRunning = v; $("bRun").disabled = v; }
$("bRun").onclick = () => {
if (bRunning) return;
bSamples = []; bSetRunning(true);
$("bSent").textContent = "0"; $("bRecv").textContent = "0"; $("bTps").textContent = "0"; $("bTime").textContent = "0.00 s";
setStatus("bStatus", "connecting…");
const qs = new URLSearchParams({
n_msgs: bMsgs.value, n_subs: bSubs.value, payload: bPay.value,
encrypt: $("bEncrypt").checked ? "1" : "0", persist: $("bPersist").checked ? "1" : "0",
});
const es = new EventSource("/api/bench?" + qs.toString());
bES = es;
const finish = () => { try { es.close(); } catch (_) {} bSetRunning(false); };
es.addEventListener("end", finish);
es.onmessage = (e) => {
let m; try { m = JSON.parse(e.data); } catch (_) { return; }
if (m.type === "start") {
setStatus("bStatus",
"running… " + fmtN(m.n_msgs) + " msgs → " + m.n_subs + " subs · payload " + fmtN(m.payload) + "B"
+ (m.encrypt ? " · \u{1F512} E2E" : "") + (m.persist ? " · \u{1F5C4} JetStream" : "")
+ (m.capped ? " · (capped at 30k)" : ""), "");
} else if (m.type === "sample") {
bSamples.push({ t: m.t, sent: m.sent, recv: m.recv });
$("bSent").textContent = fmtN(m.sent); $("bRecv").textContent = fmtN(m.recv); $("bTime").textContent = m.t.toFixed(2) + " s";
if (bSamples.length >= 2) {
const a = bSamples[bSamples.length - 2], b = bSamples[bSamples.length - 1], dt = b.t - a.t;
if (dt > 0) $("bTps").textContent = fmtN(Math.round((b.recv - a.recv) / dt));
}
bDraw();
} else if (m.type === "done") {
bSamples.push({ t: m.t, sent: m.sent, recv: m.recv });
$("bSent").textContent = fmtN(m.sent); $("bRecv").textContent = fmtN(m.recv);
$("bTps").textContent = fmtN(m.recv_tps); $("bTime").textContent = m.t.toFixed(2) + " s";
setStatus("bStatus",
"✓ " + m.t.toFixed(2) + "s · pub " + fmtN(m.pub_tps) + "/s · recv " + fmtN(m.recv_tps) + "/s · fan-out ×"
+ m.n_subs + " · per sub [" + (m.per_sub || []).map(fmtN).join(", ") + "]", "ok");
bDraw(); finish();
} else if (m.type === "error") {
setStatus("bStatus", "error: " + m.msg, "bad"); finish();
}
};
es.onerror = () => { if (bRunning) { setStatus("bStatus", "SSE connection lost", "bad"); finish(); } };
};
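/*
The live throughput figure in the "sample" branch is the receive-count delta
between the last two samples divided by their time delta. A standalone sketch
of that calculation follows; recvRate is a hypothetical helper name and the
samples are made-up values (3 subscribers, so recv grows three times as fast
as sent), not measurements.

```javascript
// Instantaneous receive rate between two consecutive samples, in msgs/s.
// Returns null when the samples share a timestamp, so a caller can keep the
// previously displayed value instead of dividing by zero.
function recvRate(prev, cur) {
  const dt = cur.t - prev.t;
  if (!(dt > 0)) return null;
  return Math.round((cur.recv - prev.recv) / dt);
}

// Hypothetical progress samples in the shape the page stores in bSamples.
const samples = [
  { t: 0.0, sent: 0, recv: 0 },
  { t: 0.5, sent: 1000, recv: 3000 },
  { t: 1.0, sent: 2500, recv: 7500 },
];

// One rate per consecutive pair of samples.
const rates = samples.slice(1).map((s, i) => recvRate(samples[i], s));
console.log(rates);
```

Because the rate is computed per sample pair, it is a short-window figure;
the "done" event replaces it with the server-reported recv_tps for the run.
*/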
bResize();
</script>
</body>
</html>
@@ -1,933 +0,0 @@
// Command playground is an all-in-one, web-based sandbox for the unibus message
// bus. A single `go run ./playground` launches the entire stack embedded:
//
// - an embedded NATS server with JetStream (the data plane),
// - the membership control plane (rooms, members, sealed keys, rekey) over an
// internal HTTP server,
// - the media blob store, and
// - a browser-facing web UI on :7700.
//
// The browser never speaks NATS. The Go server is the actual bus peer: it holds
// one unibus client per named peer, subscribes to rooms on the peer's behalf,
// and streams received messages to the browser over Server-Sent Events. The
// browser drives everything with plain fetch() + EventSource() — no build step,
// no JS framework, no external libraries.
//
// This is a playground (see .claude/rules/playgrounds.md): it lives inside the
// unibus app, reuses the parent module (no new go.mod), is not indexed, and
// stores ephemeral state under playground/local_files/.
package main
import (
"bytes"
"context"
"encoding/json"
"errors"
"fmt"
"io"
"log"
"net/http"
"os"
"os/signal"
"path/filepath"
"strconv"
"sync"
"sync/atomic"
"syscall"
"time"
_ "embed"
cs "fn-registry/functions/cybersecurity"
"github.com/enmanuel/unibus/pkg/blobstore"
"github.com/enmanuel/unibus/pkg/client"
"github.com/enmanuel/unibus/pkg/embeddednats"
"github.com/enmanuel/unibus/pkg/frame"
"github.com/enmanuel/unibus/pkg/membership"
"github.com/enmanuel/unibus/pkg/room"
)
// Fixed ports (verified free before assignment — do not change without reason).
const (
webAddr = "127.0.0.1:7700" // browser-facing web UI
ctrlAddr = "127.0.0.1:8480" // internal membership control plane
ctrlURL = "http://" + ctrlAddr
natsPort = 4260 // internal embedded NATS
natsURL = "nats://127.0.0.1:4260"
localFiles = "playground/local_files"
)
//go:embed index.html
var indexHTML []byte
// ---------------------------------------------------------------------------
// Event: a message received by a peer on one of its subscribed rooms. Fanned
// out to every SSE listener attached to that peer.
// ---------------------------------------------------------------------------
type Event struct {
RoomID string `json:"room_id"`
Subject string `json:"subject"`
Sender string `json:"sender"`
Text string `json:"text"`
Encrypted bool `json:"encrypted"`
TS int64 `json:"ts"` // unix millis
}
// roomInfo caches the per-room metadata a peer needs to label incoming frames.
type roomInfo struct {
subject string
encrypt bool
}
// peerState holds everything about one named peer: its bus client, its public
// endpoint, its live subscriptions, the rooms it knows, and the set of SSE
// listener channels currently attached to it.
type peerState struct {
name string
client *client.Client
endpoint client.Endpoint
mu sync.Mutex
subs map[string]*client.Sub // roomID -> subscription
rooms map[string]roomInfo // roomID -> subject/encrypt
listeners map[chan Event]struct{} // attached SSE channels
}
// emit fans an event out to all attached listeners without blocking on a slow
// or disconnected consumer.
func (p *peerState) emit(ev Event) {
p.mu.Lock()
defer p.mu.Unlock()
for ch := range p.listeners {
select {
case ch <- ev:
default: // listener buffer full: drop rather than block the NATS callback
}
}
}
func (p *peerState) addListener(ch chan Event) {
p.mu.Lock()
p.listeners[ch] = struct{}{}
p.mu.Unlock()
}
func (p *peerState) removeListener(ch chan Event) {
p.mu.Lock()
delete(p.listeners, ch)
p.mu.Unlock()
}
func (p *peerState) setRoom(roomID string, info roomInfo) {
p.mu.Lock()
p.rooms[roomID] = info
p.mu.Unlock()
}
// roomList returns a snapshot of the rooms this peer knows (created or joined),
// so the SPA can render the peer's room list without re-deriving it client-side.
func (p *peerState) roomList() []map[string]any {
p.mu.Lock()
defer p.mu.Unlock()
out := make([]map[string]any, 0, len(p.rooms))
for id, info := range p.rooms {
out = append(out, map[string]any{
"room_id": id,
"subject": info.subject,
"encrypt": info.encrypt,
})
}
return out
}
// ---------------------------------------------------------------------------
// Hub: the registry of peers, protected by a single mutex.
// ---------------------------------------------------------------------------
type Hub struct {
mu sync.Mutex
peers map[string]*peerState
}
func newHub() *Hub { return &Hub{peers: map[string]*peerState{}} }
// getOrCreate returns the peer for name, creating its identity + bus client on
// first use. Identities persist to playground/local_files/<name>.id so a peer
// keeps the same endpoint across restarts.
func (h *Hub) getOrCreate(name string) (*peerState, error) {
h.mu.Lock()
defer h.mu.Unlock()
if p, ok := h.peers[name]; ok {
return p, nil
}
idPath := filepath.Join(localFiles, name+".id")
id, err := client.LoadOrCreateIdentity(idPath)
if err != nil {
return nil, fmt.Errorf("identity for %q: %w", name, err)
}
c, err := client.New(natsURL, ctrlURL, id)
if err != nil {
return nil, fmt.Errorf("client for %q: %w", name, err)
}
p := &peerState{
name: name,
client: c,
endpoint: c.Endpoint(),
subs: map[string]*client.Sub{},
rooms: map[string]roomInfo{},
listeners: map[chan Event]struct{}{},
}
h.peers[name] = p
return p, nil
}
// lookup returns an already-created peer or false.
func (h *Hub) lookup(name string) (*peerState, bool) {
h.mu.Lock()
defer h.mu.Unlock()
p, ok := h.peers[name]
return p, ok
}
// list returns a snapshot of all peers (name + endpoint id).
func (h *Hub) list() []map[string]string {
h.mu.Lock()
defer h.mu.Unlock()
out := make([]map[string]string, 0, len(h.peers))
for name, p := range h.peers {
out = append(out, map[string]string{"name": name, "endpoint_id": p.endpoint.ID})
}
return out
}
func (h *Hub) closeAll() {
h.mu.Lock()
defer h.mu.Unlock()
for _, p := range h.peers {
p.mu.Lock()
for _, sub := range p.subs {
_ = sub.Unsubscribe()
}
p.mu.Unlock()
_ = p.client.Close()
}
}
// subscribeRoom subscribes the peer to a room (idempotent) and wires the frame
// handler to fan incoming messages out as Events. info labels each event with
// the room's subject and encryption flag.
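func (p *peerState) subscribeRoom(roomID string, info roomInfo) error {
p.mu.Lock()
if _, already := p.subs[roomID]; already {
p.mu.Unlock()
return nil
}
p.mu.Unlock()
// Subscribe outside the lock: the frame handler calls emit, which takes p.mu.
sub, err := p.client.Subscribe(roomID, func(f frame.Frame, plaintext []byte) {
p.emit(Event{
RoomID: roomID,
Subject: info.subject,
Sender: f.Sender,
Text: string(plaintext),
Encrypted: info.encrypt,
TS: time.Now().UnixMilli(),
})
})
if err != nil {
return fmt.Errorf("subscribe room %s: %w", roomID, err)
}
p.mu.Lock()
if _, raced := p.subs[roomID]; raced {
// A concurrent call subscribed while the lock was released; drop this
// duplicate so the room does not deliver every message twice.
p.mu.Unlock()
_ = sub.Unsubscribe()
return nil
}
p.subs[roomID] = sub
p.rooms[roomID] = info
p.mu.Unlock()
return nil
}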
// ---------------------------------------------------------------------------
// Control-plane helper: fetch a room's subject + policy from membershipd. The
// client package keeps fetchRoom private, so the playground talks to the
// control plane directly (read endpoints are unauthenticated by design).
// ---------------------------------------------------------------------------
type ctrlRoomResp struct {
Subject string `json:"subject"`
Epoch int `json:"epoch"`
Policy struct {
Encrypt bool `json:"encrypt"`
Persist bool `json:"persist"`
SignMsgs bool `json:"sign_msgs"`
} `json:"policy"`
}
func fetchRoomInfo(roomID string) (roomInfo, error) {
resp, err := http.Get(ctrlURL + "/rooms/" + roomID)
if err != nil {
return roomInfo{}, fmt.Errorf("fetch room %s: %w", roomID, err)
}
defer resp.Body.Close()
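if resp.StatusCode == http.StatusNotFound {
return roomInfo{}, fmt.Errorf("room %s not found", roomID)
}
if resp.StatusCode >= 300 {
return roomInfo{}, fmt.Errorf("fetch room %s: unexpected status %d", roomID, resp.StatusCode)
}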
var r ctrlRoomResp
if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
return roomInfo{}, fmt.Errorf("decode room %s: %w", roomID, err)
}
return roomInfo{subject: r.Subject, encrypt: r.Policy.Encrypt}, nil
}
// ---------------------------------------------------------------------------
// HTTP handlers (web UI on :7700).
// ---------------------------------------------------------------------------
func writeJSON(w http.ResponseWriter, code int, v any) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(code)
_ = json.NewEncoder(w).Encode(v)
}
func writeErr(w http.ResponseWriter, code int, msg string) {
writeJSON(w, code, map[string]string{"error": msg})
}
func decodeBody(r *http.Request, out any) error {
defer r.Body.Close()
return json.NewDecoder(r.Body).Decode(out)
}
func (h *Hub) handleIndex(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/" {
http.NotFound(w, r)
return
}
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write(indexHTML)
}
func (h *Hub) handlePeer(w http.ResponseWriter, r *http.Request) {
var req struct {
Name string `json:"name"`
}
if err := decodeBody(r, &req); err != nil || req.Name == "" {
writeErr(w, http.StatusBadRequest, "name required")
return
}
p, err := h.getOrCreate(req.Name)
if err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]string{"name": p.name, "endpoint_id": p.endpoint.ID})
}
func (h *Hub) handlePeers(w http.ResponseWriter, r *http.Request) {
writeJSON(w, http.StatusOK, h.list())
}
func (h *Hub) handleRoom(w http.ResponseWriter, r *http.Request) {
var req struct {
Peer string `json:"peer"`
Subject string `json:"subject"`
Encrypt bool `json:"encrypt"`
Persist bool `json:"persist"`
}
if err := decodeBody(r, &req); err != nil || req.Peer == "" || req.Subject == "" {
writeErr(w, http.StatusBadRequest, "peer and subject required")
return
}
p, ok := h.lookup(req.Peer)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+req.Peer)
return
}
// The two checkboxes map to an explicit per-room policy. encrypt drives both
// encryption and per-message signing; persist (default false) independently
// toggles durable JetStream history. persist=false keeps plain ephemeral NATS.
policy := room.Policy{Encrypt: req.Encrypt, Persist: req.Persist, SignMsgs: req.Encrypt}
roomID, err := p.client.CreateRoom(req.Subject, policy)
if err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
info := roomInfo{subject: req.Subject, encrypt: req.Encrypt}
if err := p.subscribeRoom(roomID, info); err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]any{
"room_id": roomID, "subject": req.Subject, "encrypt": req.Encrypt, "persist": req.Persist,
})
}
func (h *Hub) handleJoin(w http.ResponseWriter, r *http.Request) {
var req struct {
Peer string `json:"peer"`
RoomID string `json:"room_id"`
}
if err := decodeBody(r, &req); err != nil || req.Peer == "" || req.RoomID == "" {
writeErr(w, http.StatusBadRequest, "peer and room_id required")
return
}
p, ok := h.lookup(req.Peer)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+req.Peer)
return
}
if err := p.client.Join(req.RoomID); err != nil {
writeErr(w, http.StatusBadRequest, "join failed: "+err.Error())
return
}
info, err := fetchRoomInfo(req.RoomID)
if err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
if err := p.subscribeRoom(req.RoomID, info); err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]any{"subject": info.subject, "encrypt": info.encrypt})
}
func (h *Hub) handleInvite(w http.ResponseWriter, r *http.Request) {
var req struct {
Peer string `json:"peer"`
RoomID string `json:"room_id"`
Target string `json:"target"`
}
if err := decodeBody(r, &req); err != nil || req.Peer == "" || req.RoomID == "" || req.Target == "" {
writeErr(w, http.StatusBadRequest, "peer, room_id and target required")
return
}
p, ok := h.lookup(req.Peer)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+req.Peer)
return
}
target, ok := h.lookup(req.Target)
if !ok {
writeErr(w, http.StatusBadRequest, "target peer "+req.Target+" does not exist; connect it first")
return
}
if err := p.client.Invite(req.RoomID, target.endpoint); err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]string{"status": "invited", "target": req.Target})
}
func (h *Hub) handlePublish(w http.ResponseWriter, r *http.Request) {
var req struct {
Peer string `json:"peer"`
RoomID string `json:"room_id"`
Text string `json:"text"`
}
if err := decodeBody(r, &req); err != nil || req.Peer == "" || req.RoomID == "" {
writeErr(w, http.StatusBadRequest, "peer and room_id required")
return
}
p, ok := h.lookup(req.Peer)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+req.Peer)
return
}
if err := p.client.Publish(req.RoomID, []byte(req.Text)); err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]string{"status": "published"})
}
func (h *Hub) handleKick(w http.ResponseWriter, r *http.Request) {
var req struct {
Peer string `json:"peer"`
RoomID string `json:"room_id"`
Target string `json:"target"`
}
if err := decodeBody(r, &req); err != nil || req.Peer == "" || req.RoomID == "" || req.Target == "" {
writeErr(w, http.StatusBadRequest, "peer, room_id and target required")
return
}
p, ok := h.lookup(req.Peer)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+req.Peer)
return
}
target, ok := h.lookup(req.Target)
if !ok {
writeErr(w, http.StatusBadRequest, "target peer "+req.Target+" does not exist")
return
}
if err := p.client.Kick(req.RoomID, target.endpoint.ID); err != nil {
writeErr(w, http.StatusInternalServerError, err.Error())
return
}
writeJSON(w, http.StatusOK, map[string]string{"status": "kicked", "target": req.Target})
}
// handleRooms returns the rooms a peer knows (created or joined). The SPA polls
// or calls this after create/join to refresh its room list.
//
// GET /api/rooms?peer=ana
func (h *Hub) handleRooms(w http.ResponseWriter, r *http.Request) {
name := r.URL.Query().Get("peer")
if name == "" {
writeErr(w, http.StatusBadRequest, "peer query param required")
return
}
p, ok := h.lookup(name)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+name)
return
}
writeJSON(w, http.StatusOK, p.roomList())
}
// handleMembers lists the members of a room (endpoint id + role) so the SPA can
// render a members panel and drive invite/kick. It proxies the control plane's
// unauthenticated read endpoint; the public keys it returns are not secret.
//
// GET /api/members?room_id=<id>
func (h *Hub) handleMembers(w http.ResponseWriter, r *http.Request) {
roomID := r.URL.Query().Get("room_id")
if roomID == "" {
writeErr(w, http.StatusBadRequest, "room_id query param required")
return
}
resp, err := http.Get(ctrlURL + "/rooms/" + roomID + "/members")
if err != nil {
writeErr(w, http.StatusInternalServerError, "fetch members: "+err.Error())
return
}
defer resp.Body.Close()
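body, err := io.ReadAll(resp.Body)
if err != nil {
// Do not forward a truncated body with the upstream status code.
writeErr(w, http.StatusBadGateway, "read members: "+err.Error())
return
}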
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(resp.StatusCode)
_, _ = w.Write(body)
}
// withCORS allows the SPA running under the Vite dev server (a different origin)
// to call the gateway. It answers preflight OPTIONS and tags every response with
// permissive CORS headers. v1 trusts the local network, mirroring the control
// plane's auth model.
func withCORS(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
if r.Method == http.MethodOptions {
w.WriteHeader(http.StatusNoContent)
return
}
next.ServeHTTP(w, r)
})
}
// handleStream is the SSE endpoint. The browser opens one EventSource per peer;
// each received Event is emitted as a `data: <json>\n\n` block. The listener is
// cleaned up when the HTTP request context is cancelled (tab closed / reload).
func (h *Hub) handleStream(w http.ResponseWriter, r *http.Request) {
name := r.URL.Query().Get("peer")
if name == "" {
writeErr(w, http.StatusBadRequest, "peer query param required")
return
}
p, ok := h.lookup(name)
if !ok {
writeErr(w, http.StatusBadRequest, "unknown peer "+name)
return
}
flusher, ok := w.(http.Flusher)
if !ok {
writeErr(w, http.StatusInternalServerError, "streaming unsupported")
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
ch := make(chan Event, 64)
p.addListener(ch)
defer p.removeListener(ch)
// Initial comment so the browser marks the stream open immediately.
fmt.Fprintf(w, ": connected to %s\n\n", name)
flusher.Flush()
ctx := r.Context()
ping := time.NewTicker(20 * time.Second)
defer ping.Stop()
for {
select {
case <-ctx.Done():
return
case <-ping.C:
fmt.Fprintf(w, ": ping\n\n")
flusher.Flush()
case ev := <-ch:
b, err := json.Marshal(ev)
if err != nil {
continue
}
fmt.Fprintf(w, "data: %s\n\n", b)
flusher.Flush()
}
}
}
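A minimal sketch of how a non-browser client might read the stream that handleStream writes. It assumes one `data:` line per event, which is what the handler emits, and it skips the `: connected to ...` and `: ping` comment lines. The names readSSEData and parseSSE are hypothetical helpers for illustration and are not part of the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readSSEData scans a text/event-stream body and returns the payload of every
// single-line "data:" field, in order. Comment lines (those starting with ':')
// and blank event separators are skipped, and other fields such as "event:" are
// ignored. Multi-line data events are not reassembled, and bufio.Scanner's
// default 64 KiB line limit applies.
func readSSEData(r io.Reader) ([]string, error) {
	var out []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if line == "" || strings.HasPrefix(line, ":") {
			continue
		}
		if rest, ok := strings.CutPrefix(line, "data:"); ok {
			// The field value may be preceded by one optional space.
			out = append(out, strings.TrimPrefix(rest, " "))
		}
	}
	return out, sc.Err()
}

// parseSSE is a convenience wrapper for streams already held in memory.
func parseSSE(stream string) []string {
	data, _ := readSSEData(strings.NewReader(stream))
	return data
}

func main() {
	stream := ": connected to alice\n\n" +
		"data: {\"type\":\"msg\"}\n\n" +
		": ping\n\n" +
		"data: {\"type\":\"msg2\"}\n\n"
	for _, d := range parseSSE(stream) {
		fmt.Println(d)
	}
}
```

In a real client the reader would be the response body of a GET to /api/stream?peer=<name>, and each returned string would then be decoded as JSON.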
// ---------------------------------------------------------------------------
// Benchmark: one publisher floods a room with thousands of messages that N
// subscribers receive. The two policy axes are exposed as independent flags:
// encrypt (AEAD payload + Ed25519 per-message signature) and persist (durable
// JetStream history vs ephemeral core NATS). Payload size is configurable. The
// benchmark uses its own ephemeral peers (not the hub's named peers) so it never
// interferes with the manual sandbox, and streams progress samples over SSE so
// the browser can animate a live throughput chart.
// ---------------------------------------------------------------------------
// benchSample is one Server-Sent Event of a running benchmark.
type benchSample struct {
Type string `json:"type"` // "start" | "sample" | "done" | "error"
T float64 `json:"t"`
Sent int64 `json:"sent"`
Recv int64 `json:"recv"`
NMsgs int `json:"n_msgs,omitempty"`
NSubs int `json:"n_subs,omitempty"`
Payload int `json:"payload,omitempty"`
Encrypt bool `json:"encrypt,omitempty"`
Persist bool `json:"persist,omitempty"`
Capped bool `json:"capped,omitempty"`
PubTps int64 `json:"pub_tps,omitempty"`
RecvTps int64 `json:"recv_tps,omitempty"`
PerSub []int64 `json:"per_sub,omitempty"`
Msg string `json:"msg,omitempty"`
}
// runBench wires up one publisher + nSubs subscribers, publishes nMsgs payloads,
// and calls emit periodically with the running totals. emit is only ever called
// from the calling goroutine (the SSE handler), so it needs no locking.
func runBench(ctx context.Context, emit func(benchSample), nMsgs, nSubs, payloadBytes int, encrypt, persist bool) {
policy := room.Policy{Encrypt: encrypt, Persist: persist, SignMsgs: encrypt}
subject := fmt.Sprintf("bench.%d", time.Now().UnixNano())
newPeer := func() (*client.Client, error) {
id, err := cs.GenerateIdentity()
if err != nil {
return nil, err
}
return client.New(natsURL, ctrlURL, id)
}
pub, err := newPeer()
if err != nil {
emit(benchSample{Type: "error", Msg: "publisher: " + err.Error()})
return
}
defer pub.Close()
roomID, err := pub.CreateRoom(subject, policy)
if err != nil {
emit(benchSample{Type: "error", Msg: "create room: " + err.Error()})
return
}
counters := make([]int64, nSubs)
subClients := make([]*client.Client, 0, nSubs)
defer func() {
for _, c := range subClients {
_ = c.Close()
}
}()
// One room, N subscribers. For encrypted rooms each subscriber must be invited
// (sealed key) and join before subscribing; for cleartext rooms Subscribe on
// the shared roomID is enough.
for i := 0; i < nSubs; i++ {
c, err := newPeer()
if err != nil {
emit(benchSample{Type: "error", Msg: fmt.Sprintf("subscriber %d: %v", i, err)})
return
}
subClients = append(subClients, c)
if encrypt {
if err := pub.Invite(roomID, c.Endpoint()); err != nil {
emit(benchSample{Type: "error", Msg: fmt.Sprintf("invite %d: %v", i, err)})
return
}
if err := c.Join(roomID); err != nil {
emit(benchSample{Type: "error", Msg: fmt.Sprintf("join %d: %v", i, err)})
return
}
}
idx := i
if _, err := c.Subscribe(roomID, func(_ frame.Frame, _ []byte) {
atomic.AddInt64(&counters[idx], 1)
}); err != nil {
emit(benchSample{Type: "error", Msg: fmt.Sprintf("subscribe %d: %v", i, err)})
return
}
}
sumRecv := func() int64 {
var s int64
for i := range counters {
s += atomic.LoadInt64(&counters[i])
}
return s
}
payload := bytes.Repeat([]byte{'x'}, payloadBytes)
var sent int64
emit(benchSample{Type: "start", NMsgs: nMsgs, NSubs: nSubs, Payload: payloadBytes, Encrypt: encrypt, Persist: persist})
t0 := time.Now()
done := make(chan struct{})
var pubErr atomic.Value
go func() {
defer close(done)
for k := 0; k < nMsgs; k++ {
if err := pub.Publish(roomID, payload); err != nil {
pubErr.Store(err)
return
}
atomic.AddInt64(&sent, 1)
if k%256 == 0 {
select {
case <-ctx.Done():
return
default:
}
}
}
}()
ticker := time.NewTicker(60 * time.Millisecond)
defer ticker.Stop()
deadline := time.After(120 * time.Second)
target := int64(nMsgs) * int64(nSubs)
sampleLoop:
for {
select {
case <-ctx.Done():
return
case <-deadline:
break sampleLoop
case <-done:
break sampleLoop
case <-ticker.C:
emit(benchSample{Type: "sample", T: time.Since(t0).Seconds(), Sent: atomic.LoadInt64(&sent), Recv: sumRecv()})
}
}
if v := pubErr.Load(); v != nil {
emit(benchSample{Type: "error", Msg: "publish: " + v.(error).Error()})
return
}
// Final drain: keep sampling until every subscriber has caught up, or give up
// after 240 polls of 25 ms each (about 6 s).
for i := 0; i < 240; i++ {
if sumRecv() >= target {
break
}
select {
case <-ctx.Done():
return
case <-time.After(25 * time.Millisecond):
}
emit(benchSample{Type: "sample", T: time.Since(t0).Seconds(), Sent: atomic.LoadInt64(&sent), Recv: sumRecv()})
}
dur := time.Since(t0).Seconds()
finalSent := atomic.LoadInt64(&sent)
finalRecv := sumRecv()
per := make([]int64, nSubs)
for i := range counters {
per[i] = atomic.LoadInt64(&counters[i])
}
var pubTps, recvTps int64
if dur > 0 {
pubTps = int64(float64(finalSent) / dur)
recvTps = int64(float64(finalRecv) / dur)
}
emit(benchSample{Type: "done", T: dur, Sent: finalSent, Recv: finalRecv, PerSub: per, PubTps: pubTps, RecvTps: recvTps, NSubs: nSubs})
}
// handleBench is the SSE endpoint that drives a benchmark from query params:
//
// GET /api/bench?n_msgs=20000&n_subs=3&payload=128&encrypt=0&persist=0
//
// Encrypted/persistent runs are capped to a lower message count (the per-message
// crypto + JetStream ack make them far slower); the cap is reported in the start
// sample so the UI can show it.
func (h *Hub) handleBench(w http.ResponseWriter, r *http.Request) {
q := r.URL.Query()
atoiDef := func(k string, def int) int {
if v, err := strconv.Atoi(q.Get(k)); err == nil {
return v
}
return def
}
truthy := func(k string) bool { v := q.Get(k); return v == "1" || v == "true" }
nMsgs := atoiDef("n_msgs", 20000)
nSubs := atoiDef("n_subs", 3)
payload := atoiDef("payload", 128)
encrypt := truthy("encrypt")
persist := truthy("persist")
if nSubs < 1 {
nSubs = 1
} else if nSubs > 16 {
nSubs = 16
}
if payload < 1 {
payload = 1
} else if payload > 8192 {
payload = 8192
}
if nMsgs < 100 {
nMsgs = 100
}
maxMsgs := 200000
if encrypt || persist {
maxMsgs = 30000 // crypto + JetStream ack are much slower; keep the run bounded
}
capped := false
if nMsgs > maxMsgs {
nMsgs, capped = maxMsgs, true
}
flusher, ok := w.(http.Flusher)
if !ok {
writeErr(w, http.StatusInternalServerError, "streaming unsupported")
return
}
w.Header().Set("Content-Type", "text/event-stream")
w.Header().Set("Cache-Control", "no-cache")
w.Header().Set("Connection", "keep-alive")
fmt.Fprintf(w, ": bench start\n\n")
flusher.Flush()
emit := func(s benchSample) {
if s.Type == "start" {
s.Capped = capped
}
b, err := json.Marshal(s)
if err != nil {
return
}
fmt.Fprintf(w, "data: %s\n\n", b)
flusher.Flush()
}
runBench(r.Context(), emit, nMsgs, nSubs, payload, encrypt, persist)
fmt.Fprintf(w, "event: end\ndata: {}\n\n")
flusher.Flush()
}
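The parameter limits that handleBench applies inline can also be read as one pure function, which makes them easy to test in isolation. The sketch below mirrors the bounds in the handler (1 to 16 subscribers, 1 to 8192 payload bytes, at least 100 messages, at most 200000, or 30000 when encrypt or persist is set). The benchParams type and the clampInt and clampBench names are hypothetical and do not exist in the source.

```go
package main

import "fmt"

// benchParams is a hypothetical value type holding the sanitized inputs.
type benchParams struct {
	NMsgs, NSubs, Payload int
	Capped                bool
}

// clampInt restricts v to the inclusive range [lo, hi].
func clampInt(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// clampBench applies the same bounds as the handler: subscribers and payload
// are clamped silently, while lowering the message count to the maximum sets
// Capped so the caller can report it.
func clampBench(nMsgs, nSubs, payload int, encrypt, persist bool) benchParams {
	maxMsgs := 200000
	if encrypt || persist {
		maxMsgs = 30000 // crypto and JetStream acks are slower, so bound the run
	}
	p := benchParams{
		NMsgs:   nMsgs,
		NSubs:   clampInt(nSubs, 1, 16),
		Payload: clampInt(payload, 1, 8192),
	}
	if p.NMsgs < 100 {
		p.NMsgs = 100
	}
	if p.NMsgs > maxMsgs {
		p.NMsgs, p.Capped = maxMsgs, true
	}
	return p
}

func main() {
	fmt.Printf("%+v\n", clampBench(500000, 99, 0, true, false))
	fmt.Printf("%+v\n", clampBench(20000, 3, 128, false, false))
}
```

Raising a too-small message count to 100 does not set Capped, matching the handler, where only the upper bound is reported to the UI.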
// ---------------------------------------------------------------------------
// main: bring up NATS, control plane, and the web server; tear them all down
// cleanly on signal.
// ---------------------------------------------------------------------------
func main() {
log.SetFlags(log.LstdFlags | log.Lmsgprefix)
log.SetPrefix("[playground] ")
if err := os.MkdirAll(localFiles, 0o755); err != nil {
log.Fatalf("mkdir %s: %v", localFiles, err)
}
// 1. Data plane: embedded NATS + JetStream on the fixed internal port.
ns, err := embeddednats.Start(filepath.Join(localFiles, "js"), natsPort)
if err != nil {
log.Fatalf("start embedded nats: %v", err)
}
log.Printf("embedded NATS (JetStream) ready: %s", embeddednats.ClientURL(ns))
// 2. Control plane: membership store + blob store + internal HTTP server.
store, err := membership.Open(filepath.Join(localFiles, "play.db"))
if err != nil {
ns.Shutdown()
log.Fatalf("open membership store: %v", err)
}
blobs, err := blobstore.New(filepath.Join(localFiles, "blobs"))
if err != nil {
store.Close()
ns.Shutdown()
log.Fatalf("open blob store: %v", err)
}
// AuthOff: the playground is a local dev gateway that has not migrated to
// signed control-plane requests or a secured upstream bus yet. What it would
// need is written up in dev/0001e-remaining-clients.md (issue 0001, phase 0001e).
ctrlSrv := &http.Server{Addr: ctrlAddr, Handler: membership.NewServer(store, blobs, membership.AuthOff)}
go func() {
if err := ctrlSrv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
log.Fatalf("control plane: %v", err)
}
}()
if err := waitHealthy(ctrlURL+"/healthz", 5*time.Second); err != nil {
log.Fatalf("control plane not healthy: %v", err)
}
log.Printf("control plane ready: %s", ctrlURL)
// 3. Web UI on :7700.
hub := newHub()
mux := http.NewServeMux()
mux.HandleFunc("/", hub.handleIndex)
mux.HandleFunc("POST /api/peer", hub.handlePeer)
mux.HandleFunc("GET /api/peers", hub.handlePeers)
mux.HandleFunc("POST /api/room", hub.handleRoom)
mux.HandleFunc("POST /api/join", hub.handleJoin)
mux.HandleFunc("POST /api/invite", hub.handleInvite)
mux.HandleFunc("POST /api/publish", hub.handlePublish)
mux.HandleFunc("POST /api/kick", hub.handleKick)
mux.HandleFunc("GET /api/rooms", hub.handleRooms)
mux.HandleFunc("GET /api/members", hub.handleMembers)
mux.HandleFunc("GET /api/stream", hub.handleStream)
mux.HandleFunc("GET /api/bench", hub.handleBench)
webSrv := &http.Server{Addr: webAddr, Handler: withCORS(mux)}
go func() {
if err := webSrv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
log.Fatalf("web server: %v", err)
}
}()
log.Printf("web UI ready: http://%s", webAddr)
log.Printf("open http://localhost:7700 in two browser tabs to try the bus")
// 4. Graceful shutdown.
stop := make(chan os.Signal, 1)
signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
<-stop
log.Printf("shutting down...")
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
_ = webSrv.Shutdown(ctx)
hub.closeAll()
_ = ctrlSrv.Shutdown(ctx)
store.Close()
ns.Shutdown()
ns.WaitForShutdown()
log.Printf("bye")
}
// waitHealthy polls url until it responds with a status below 400 or the deadline elapses.
func waitHealthy(url string, timeout time.Duration) error {
deadline := time.Now().Add(timeout)
c := &http.Client{Timeout: 500 * time.Millisecond}
for time.Now().Before(deadline) {
resp, err := c.Get(url)
if err == nil {
resp.Body.Close()
if resp.StatusCode < 400 {
return nil
}
}
time.Sleep(100 * time.Millisecond)
}
return fmt.Errorf("timeout waiting for %s", url)
}
+1 -1
@@ -3,7 +3,7 @@
 <head>
 <meta charset="UTF-8" />
 <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>unibus · chat</title>
+<title>unibus</title>
 </head>
 <body>
 <div id="root"></div>
+1 -2
@@ -3,7 +3,6 @@
 "private": true,
 "version": "0.1.0",
 "type": "module",
-"description": "Chat SPA for the unibus bus (E2E-encrypted rooms, live messages over SSE).",
 "scripts": {
 "dev": "vite",
 "build": "tsc -b && vite build",
@@ -23,7 +22,7 @@
 "postcss": "^8.4.49",
 "postcss-preset-mantine": "^1.17.0",
 "postcss-simple-vars": "^7.0.1",
-"typescript": "^5.6.3",
+"typescript": "~5.6.3",
 "vite": "^6.0.3"
 }
 }
+38 -38
@@ -10,7 +10,7 @@ importers:
 dependencies:
 '@mantine/core':
 specifier: ^9.3.0
-version: 9.3.0(@mantine/hooks@9.3.0(react@19.2.7))(@types/react@19.2.16)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
+version: 9.3.0(@mantine/hooks@9.3.0(react@19.2.7))(@types/react@19.2.17)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
 '@mantine/hooks':
 specifier: ^9.3.0
 version: 9.3.0(react@19.2.7)
@@ -26,10 +26,10 @@ importers:
 devDependencies:
 '@types/react':
 specifier: ^19.2.0
-version: 19.2.16
+version: 19.2.17
 '@types/react-dom':
 specifier: ^19.2.0
-version: 19.2.3(@types/react@19.2.16)
+version: 19.2.3(@types/react@19.2.17)
 '@vitejs/plugin-react':
 specifier: ^4.3.4
 version: 4.7.0(vite@6.4.3(sugarss@5.0.1(postcss@8.5.15)))
@@ -43,8 +43,8 @@ importers:
 specifier: ^7.0.1
 version: 7.0.1(postcss@8.5.15)
 typescript:
-specifier: ^5.6.3
-version: 5.9.3
+specifier: ~5.6.3
+version: 5.6.3
 vite:
 specifier: ^6.0.3
 version: 6.4.3(sugarss@5.0.1(postcss@8.5.15))
@@ -508,8 +508,8 @@ packages:
 peerDependencies:
 '@types/react': ^19.2.0
-'@types/react@19.2.16':
-resolution: {integrity: sha512-esJiCAnl0kfpNdE69f3So4WJUXy95dLZydX0KwK46riIHDzHM7O9Vtf9xCHW0PXIqvgqNrswl522kA/5yx+F4w==}
+'@types/react@19.2.17':
+resolution: {integrity: sha512-MXfmqaVPEVgkBT/aY0aGCkRWWtByiYQXo3xdQ8r5RzuFrPiRn8Gar2tQdXSUQ2GKV3bkXckek89V8wQBY2Q/Aw==}
 '@vitejs/plugin-react@4.7.0':
 resolution: {integrity: sha512-gUu9hwfWvvEDBBmgtAowQCojwZmJ5mcLn3aufeCsitijs3+f2NsrPtlAWIR6OPiqljl96GVCUbLe0HyqIpVaoA==}
@@ -517,8 +517,8 @@ packages:
 peerDependencies:
 vite: ^4.2.0 || ^5.0.0 || ^6.0.0 || ^7.0.0
-baseline-browser-mapping@2.10.33:
-resolution: {integrity: sha512-bA6+tcSLpz2tIEdDXZPpPTIuxBcC4+w6SieaYyfigIa4h8GlFxbA17v22Vx3JUtuZQj9SgOsnbK+aTBzyDyEuw==}
+baseline-browser-mapping@2.10.34:
+resolution: {integrity: sha512-IMDedajPifLnHNY0X9n8hKxRTQ6/eTHwr5bDo04WnuqxyKw6LYtQywCuuqPZwhl3aBXMvQpJov42GLCwRRdQzw==}
 engines: {node: '>=6.0.0'}
 hasBin: true
@@ -531,8 +531,8 @@ packages:
 resolution: {integrity: sha512-QOSvevhslijgYwRx6Rv7zKdMF8lbRmx+uQGx2+vDc+KI/eBnsy9kit5aj23AgGu3pa4t9AgwbnXWqS+iOY+2aA==}
 engines: {node: '>= 6'}
-caniuse-lite@1.0.30001793:
-resolution: {integrity: sha512-iwSsYWaCOoh26cV8NwNRViHlrfUvYsHDfRVcbtmw0Kg6PJIZZXwMkj1442FYLBGkeUf1juAsU3DTfxW579mrPA==}
+caniuse-lite@1.0.30001797:
+resolution: {integrity: sha512-l8xKG+gwAIExZGl9FrF7KUwuOmk6wbEPC9Xoy/RtnWv1XG0Q4LFlagaLpUv3Kiza3W/wm27zy0yWJEieYKAP6w==}
 clsx@2.1.1:
 resolution: {integrity: sha512-eYm0QWBtUrBWZWG0d386OGAw16Z995PiOVo2B7bjWSbHedGl5e0ZWaq65kOGgUSNesEIDkB9ISbTg/JK9dhCZA==}
@@ -756,8 +756,8 @@ packages:
 resolution: {integrity: sha512-1URUxUqfHFM1c+zfSPsa3gnkO7Aq21qyH75SIduNYz4SzY964rn1X2vCMQaHSHhktiw+0kPa2iyb6PUpXqB6Vg==}
 engines: {node: '>=20'}
-typescript@5.9.3:
-resolution: {integrity: sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==}
+typescript@5.6.3:
+resolution: {integrity: sha512-hjcS1mhfuyi4WW8IWtjP7brDrG2cuDZukyrYrSauoXGNgx0S7zceP07adYkJycEr56BOUTNPzbInooiN3fn1qw==}
 engines: {node: '>=14.17'}
 hasBin: true
@@ -1069,7 +1069,7 @@ snapshots:
 '@jridgewell/resolve-uri': 3.1.2
 '@jridgewell/sourcemap-codec': 1.5.5
-'@mantine/core@9.3.0(@mantine/hooks@9.3.0(react@19.2.7))(@types/react@19.2.16)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)':
+'@mantine/core@9.3.0(@mantine/hooks@9.3.0(react@19.2.7))(@types/react@19.2.17)(react-dom@19.2.7(react@19.2.7))(react@19.2.7)':
 dependencies:
 '@floating-ui/react': 0.27.19(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
 '@mantine/hooks': 9.3.0(react@19.2.7)
@@ -1077,7 +1077,7 @@ snapshots:
 react: 19.2.7
 react-dom: 19.2.7(react@19.2.7)
 react-number-format: 5.4.5(react-dom@19.2.7(react@19.2.7))(react@19.2.7)
-react-remove-scroll: 2.7.2(@types/react@19.2.16)(react@19.2.7)
+react-remove-scroll: 2.7.2(@types/react@19.2.17)(react@19.2.7)
 type-fest: 5.7.0
 transitivePeerDependencies:
 - '@types/react'
@@ -1193,11 +1193,11 @@ snapshots:
 '@types/estree@1.0.9': {}
-'@types/react-dom@19.2.3(@types/react@19.2.16)':
+'@types/react-dom@19.2.3(@types/react@19.2.17)':
 dependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
-'@types/react@19.2.16':
+'@types/react@19.2.17':
 dependencies:
 csstype: 3.2.3
@@ -1213,19 +1213,19 @@ snapshots:
 transitivePeerDependencies:
 - supports-color
-baseline-browser-mapping@2.10.33: {}
+baseline-browser-mapping@2.10.34: {}
 browserslist@4.28.2:
 dependencies:
-baseline-browser-mapping: 2.10.33
-caniuse-lite: 1.0.30001793
+baseline-browser-mapping: 2.10.34
+caniuse-lite: 1.0.30001797
 electron-to-chromium: 1.5.368
 node-releases: 2.0.47
 update-browserslist-db: 1.2.3(browserslist@4.28.2)
 camelcase-css@2.0.1: {}
-caniuse-lite@1.0.30001793: {}
+caniuse-lite@1.0.30001797: {}
 clsx@2.1.1: {}
@@ -1356,32 +1356,32 @@ snapshots:
 react-refresh@0.17.0: {}
-react-remove-scroll-bar@2.3.8(@types/react@19.2.16)(react@19.2.7):
+react-remove-scroll-bar@2.3.8(@types/react@19.2.17)(react@19.2.7):
 dependencies:
 react: 19.2.7
-react-style-singleton: 2.2.3(@types/react@19.2.16)(react@19.2.7)
+react-style-singleton: 2.2.3(@types/react@19.2.17)(react@19.2.7)
 tslib: 2.8.1
 optionalDependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
-react-remove-scroll@2.7.2(@types/react@19.2.16)(react@19.2.7):
+react-remove-scroll@2.7.2(@types/react@19.2.17)(react@19.2.7):
 dependencies:
 react: 19.2.7
-react-remove-scroll-bar: 2.3.8(@types/react@19.2.16)(react@19.2.7)
-react-style-singleton: 2.2.3(@types/react@19.2.16)(react@19.2.7)
+react-remove-scroll-bar: 2.3.8(@types/react@19.2.17)(react@19.2.7)
+react-style-singleton: 2.2.3(@types/react@19.2.17)(react@19.2.7)
 tslib: 2.8.1
-use-callback-ref: 1.3.3(@types/react@19.2.16)(react@19.2.7)
-use-sidecar: 1.1.3(@types/react@19.2.16)(react@19.2.7)
+use-callback-ref: 1.3.3(@types/react@19.2.17)(react@19.2.7)
+use-sidecar: 1.1.3(@types/react@19.2.17)(react@19.2.7)
 optionalDependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
-react-style-singleton@2.2.3(@types/react@19.2.16)(react@19.2.7):
+react-style-singleton@2.2.3(@types/react@19.2.17)(react@19.2.7):
 dependencies:
 get-nonce: 1.0.1
 react: 19.2.7
 tslib: 2.8.1
 optionalDependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
 react@19.2.7: {}
@@ -1441,7 +1441,7 @@ snapshots:
 dependencies:
 tagged-tag: 1.0.0
-typescript@5.9.3: {}
+typescript@5.6.3: {}
 update-browserslist-db@1.2.3(browserslist@4.28.2):
 dependencies:
@@ -1449,20 +1449,20 @@ snapshots:
 escalade: 3.2.0
 picocolors: 1.1.1
-use-callback-ref@1.3.3(@types/react@19.2.16)(react@19.2.7):
+use-callback-ref@1.3.3(@types/react@19.2.17)(react@19.2.7):
 dependencies:
 react: 19.2.7
 tslib: 2.8.1
 optionalDependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
-use-sidecar@1.1.3(@types/react@19.2.16)(react@19.2.7):
+use-sidecar@1.1.3(@types/react@19.2.17)(react@19.2.7):
 dependencies:
 detect-node-es: 1.1.0
 react: 19.2.7
 tslib: 2.8.1
 optionalDependencies:
-'@types/react': 19.2.16
+'@types/react': 19.2.17
 util-deprecate@1.0.2: {}
+6 -24
@@ -1,29 +1,11 @@
 import { useState } from "react";
-import { GatewayClient } from "./api";
-import type { Peer } from "./types";
-import { ConnectScreen } from "./components/ConnectScreen";
-import { ChatLayout } from "./components/ChatLayout";
+import { Login } from "./Login";
+import { ChatShell } from "./ChatShell";
+import type { User } from "./types";
-// Connection holds the live gateway client plus the identity it connected as.
-interface Connection {
-client: GatewayClient;
-peer: Peer;
-}
-// App is the root: it shows the connect screen until the user picks a gateway
-// URL and a peer name, then swaps to the full chat layout. Disconnecting drops
-// back to the connect screen.
 export function App() {
-const [conn, setConn] = useState<Connection | null>(null);
-if (!conn) {
-return <ConnectScreen onConnect={(client, peer) => setConn({ client, peer })} />;
-}
-return (
-<ChatLayout
-client={conn.client}
-peer={conn.peer}
-onDisconnect={() => setConn(null)}
-/>
-);
+const [user, setUser] = useState<User | null>(null);
+if (!user) return <Login onLogin={setUser} />;
+return <ChatShell user={user} onLogout={() => setUser(null)} />;
 }
+161
@@ -0,0 +1,161 @@
import { useEffect, useRef, useState } from "react";
import {
ActionIcon,
Avatar,
Box,
Center,
Divider,
Group,
ScrollArea,
Stack,
Text,
TextInput,
Tooltip,
} from "@mantine/core";
import {
IconSend,
IconLock,
IconHash,
IconDotsVertical,
IconPaperclip,
} from "@tabler/icons-react";
import type { Message, Room, User } from "./types";
function initials(s: string) {
return s.replace(/[^a-z0-9]/gi, "").slice(0, 2).toUpperCase() || "?";
}
function timeShort(ts: number) {
const d = new Date(ts);
return `${String(d.getHours()).padStart(2, "0")}:${String(
d.getMinutes(),
).padStart(2, "0")}`;
}
function MessageRow({ msg }: { msg: Message }) {
return (
<Group align="flex-start" gap="sm" wrap="nowrap">
<Avatar radius="xl" size={36} color={msg.mine ? "brand" : "gray"}>
{initials(msg.sender)}
</Avatar>
<Box style={{ minWidth: 0 }}>
<Group gap={8} align="baseline">
<Text size="sm" fw={600} c={msg.mine ? "brand.4" : undefined}>
{msg.sender}
</Text>
<Text size="xs" c="dimmed">
{timeShort(msg.ts)}
</Text>
</Group>
<Text size="sm" style={{ wordBreak: "break-word" }}>
{msg.body}
</Text>
</Box>
</Group>
);
}
export function ChatPanel({
room,
user,
}: {
room: Room | undefined;
user: User;
}) {
const [draft, setDraft] = useState("");
const [extra, setExtra] = useState<Record<string, Message[]>>({});
const viewport = useRef<HTMLDivElement>(null);
const msgs = room ? [...room.messages, ...(extra[room.id] ?? [])] : [];
useEffect(() => {
viewport.current?.scrollTo({ top: viewport.current.scrollHeight });
}, [room?.id, msgs.length]);
if (!room) {
return (
<Center h="100%">
<Text c="dimmed">Select a conversation</Text>
</Center>
);
}
const send = () => {
const body = draft.trim();
if (!body) return;
const msg: Message = {
id: `local-${Date.now()}`,
sender: user.handle,
body,
ts: Date.now(),
mine: true,
};
setExtra((e) => ({ ...e, [room.id]: [...(e[room.id] ?? []), msg] }));
setDraft("");
};
return (
<Stack h="100vh" gap={0}>
<Group justify="space-between" px="md" py="xs" wrap="nowrap">
<Group gap="sm" wrap="nowrap" style={{ minWidth: 0 }}>
<Avatar radius="md" size={38} color="brand">
{initials(room.name)}
</Avatar>
<Box style={{ minWidth: 0 }}>
<Group gap={6} wrap="nowrap">
<Text fw={650} truncate>
{room.name}
</Text>
{room.encrypted ? (
<Tooltip label="End-to-end encrypted">
<IconLock size={14} style={{ opacity: 0.6 }} />
</Tooltip>
) : (
<IconHash size={14} style={{ opacity: 0.6 }} />
)}
</Group>
<Text size="xs" c="dimmed">
{room.encrypted ? "encrypted · E2E" : "open · cleartext"}
</Text>
</Box>
</Group>
<ActionIcon variant="subtle" color="gray">
<IconDotsVertical size={18} />
</ActionIcon>
</Group>
<Divider color="dark.4" />
<ScrollArea style={{ flex: 1 }} viewportRef={viewport}>
<Stack gap="lg" p="md">
{msgs.map((m) => (
<MessageRow key={m.id} msg={m} />
))}
</Stack>
</ScrollArea>
<Divider color="dark.4" />
<Group p="sm" gap="xs" wrap="nowrap">
<ActionIcon variant="subtle" color="gray" size="lg">
<IconPaperclip size={18} />
</ActionIcon>
<TextInput
style={{ flex: 1 }}
radius="xl"
placeholder={`Message ${room.name}`}
value={draft}
onChange={(e) => setDraft(e.currentTarget.value)}
onKeyDown={(e) => e.key === "Enter" && send()}
/>
<ActionIcon
size="lg"
radius="xl"
variant="filled"
color="brand"
onClick={send}
disabled={!draft.trim()}
>
<IconSend size={18} />
</ActionIcon>
</Group>
</Stack>
);
}
+43
@@ -0,0 +1,43 @@
import { useState } from "react";
import { Flex, Box } from "@mantine/core";
import { Sidebar } from "./Sidebar";
import { ChatPanel } from "./ChatPanel";
import { MOCK_ROOMS } from "./mock";
import type { User } from "./types";
export function ChatShell({
user,
onLogout,
}: {
user: User;
onLogout: () => void;
}) {
const [rooms] = useState(MOCK_ROOMS);
const [activeId, setActiveId] = useState<string>(rooms[0]?.id ?? "");
const active = rooms.find((r) => r.id === activeId);
return (
<Flex h="100vh" w="100vw" style={{ overflow: "hidden" }}>
<Box
w={320}
h="100%"
bg="dark.8"
style={{
borderRight: "1px solid var(--mantine-color-dark-4)",
flexShrink: 0,
}}
>
<Sidebar
user={user}
rooms={rooms}
activeId={activeId}
onSelect={setActiveId}
onLogout={onLogout}
/>
</Box>
<Box flex={1} h="100%" bg="dark.7" style={{ minWidth: 0 }}>
<ChatPanel room={active} user={user} />
</Box>
</Flex>
);
}
+64
@@ -0,0 +1,64 @@
import { useState } from "react";
import {
Button,
Card,
Center,
PasswordInput,
Stack,
Text,
TextInput,
ThemeIcon,
Title,
} from "@mantine/core";
import { IconShieldLock, IconKey } from "@tabler/icons-react";
import type { User } from "./types";
export function Login({ onLogin }: { onLogin: (u: User) => void }) {
const [handle, setHandle] = useState("");
const [password, setPassword] = useState("");
const ready = handle.trim().length > 0 && password.length > 0;
const connect = () => {
const h = handle.trim();
if (ready) onLogin({ id: h, handle: h });
};
return (
<Center h="100vh" bg="dark.9">
<Card w={380} p="xl" radius="lg" withBorder bg="dark.7">
<Stack align="center" gap="lg">
<ThemeIcon size={60} radius="xl" variant="light" color="brand">
<IconShieldLock size={32} />
</ThemeIcon>
<Stack gap={2} align="center">
<Title order={2}>unibus</Title>
<Text c="dimmed" size="sm">
End-to-end encrypted messaging
</Text>
</Stack>
<TextInput
w="100%"
label="Identity"
placeholder="your-handle"
value={handle}
onChange={(e) => setHandle(e.currentTarget.value)}
onKeyDown={(e) => e.key === "Enter" && connect()}
data-autofocus
/>
<PasswordInput
w="100%"
label="Password"
description="Unlocks your encrypted identity on this device"
placeholder="••••••••"
leftSection={<IconKey size={16} />}
value={password}
onChange={(e) => setPassword(e.currentTarget.value)}
onKeyDown={(e) => e.key === "Enter" && connect()}
/>
<Button w="100%" size="md" onClick={connect} disabled={!ready}>
Connect
</Button>
</Stack>
</Card>
</Center>
);
}
+173
@@ -0,0 +1,173 @@
import { useState } from "react";
import {
Avatar,
Badge,
Box,
Divider,
Group,
Menu,
ScrollArea,
Stack,
Text,
TextInput,
UnstyledButton,
} from "@mantine/core";
import {
IconSearch,
IconLogout,
IconDots,
IconLock,
IconHash,
} from "@tabler/icons-react";
import type { Room, User } from "./types";
function initials(s: string) {
return s.replace(/[^a-z0-9]/gi, "").slice(0, 2).toUpperCase() || "?";
}
function timeShort(ts: number) {
const d = new Date(ts);
return `${String(d.getHours()).padStart(2, "0")}:${String(
d.getMinutes(),
).padStart(2, "0")}`;
}
function RoomItem({
room,
active,
onClick,
}: {
room: Room;
active: boolean;
onClick: () => void;
}) {
return (
<UnstyledButton
onClick={onClick}
p="xs"
style={{
borderRadius: "var(--mantine-radius-md)",
backgroundColor: active
? "var(--mantine-color-dark-6)"
: "transparent",
}}
>
<Group gap="sm" wrap="nowrap">
<Avatar radius="md" size={42} color={active ? "brand" : "gray"}>
{initials(room.name)}
</Avatar>
<Box style={{ flex: 1, minWidth: 0 }}>
<Group justify="space-between" gap={4} wrap="nowrap">
<Group gap={4} wrap="nowrap" style={{ minWidth: 0 }}>
{room.encrypted ? (
<IconLock size={13} style={{ flexShrink: 0, opacity: 0.6 }} />
) : (
<IconHash size={13} style={{ flexShrink: 0, opacity: 0.6 }} />
)}
<Text size="sm" fw={600} truncate>
{room.name}
</Text>
</Group>
<Text size="xs" c="dimmed" style={{ flexShrink: 0 }}>
{timeShort(room.lastTs)}
</Text>
</Group>
<Group justify="space-between" gap={4} wrap="nowrap">
<Text size="xs" c="dimmed" truncate>
{room.lastMessage}
</Text>
{room.unread > 0 && (
<Badge size="sm" circle variant="filled" color="brand">
{room.unread}
</Badge>
)}
</Group>
</Box>
</Group>
</UnstyledButton>
);
}
export function Sidebar({
user,
rooms,
activeId,
onSelect,
onLogout,
}: {
user: User;
rooms: Room[];
activeId: string;
onSelect: (id: string) => void;
onLogout: () => void;
}) {
const [q, setQ] = useState("");
const query = q.trim().toLowerCase();
const filtered = query
? rooms.filter(
(r) =>
r.name.toLowerCase().includes(query) ||
r.messages.some((m) => m.body.toLowerCase().includes(query)),
)
: rooms;
return (
<Stack h="100%" gap={0}>
<Group justify="space-between" px="sm" py="xs" wrap="nowrap">
<Group gap="xs" wrap="nowrap" style={{ minWidth: 0 }}>
<Avatar radius="xl" size={34} color="brand">
{initials(user.handle)}
</Avatar>
<Text fw={600} size="sm" truncate>
{user.handle}
</Text>
</Group>
<Menu position="bottom-end" withinPortal>
<Menu.Target>
<UnstyledButton c="dimmed">
<IconDots size={18} />
</UnstyledButton>
</Menu.Target>
<Menu.Dropdown>
<Menu.Item
leftSection={<IconLogout size={15} />}
onClick={onLogout}
>
Disconnect
</Menu.Item>
</Menu.Dropdown>
</Menu>
</Group>
<Box px="sm" pb="sm">
<TextInput
value={q}
onChange={(e) => setQ(e.currentTarget.value)}
placeholder="Search rooms, users, messages…"
leftSection={<IconSearch size={16} />}
radius="md"
size="sm"
/>
</Box>
<Divider color="dark.4" />
<ScrollArea style={{ flex: 1 }} type="scroll">
<Stack gap={2} p={6}>
{filtered.map((room) => (
<RoomItem
key={room.id}
room={room}
active={room.id === activeId}
onClick={() => onSelect(room.id)}
/>
))}
{filtered.length === 0 && (
<Text c="dimmed" size="sm" ta="center" mt="md">
No results
</Text>
)}
</Stack>
</ScrollArea>
</Stack>
);
}
-99
@@ -1,99 +0,0 @@
// GatewayClient is the SPA's typed wrapper over the unibus gateway HTTP API.
// Every method is a thin fetch against the gateway, which hosts one real Go bus
// peer per name and performs all NATS + end-to-end crypto on the browser's
// behalf. The base URL is chosen at runtime on the connect screen.
import type { BusEvent, Member, Peer, Room } from "./types";
export class GatewayClient {
constructor(public readonly baseURL: string) {
// Normalize: drop a trailing slash so `${base}/api/...` never doubles up.
this.baseURL = baseURL.replace(/\/+$/, "");
}
private async req<T>(method: string, path: string, body?: unknown): Promise<T> {
const res = await fetch(this.baseURL + path, {
method,
headers: body !== undefined ? { "Content-Type": "application/json" } : undefined,
body: body !== undefined ? JSON.stringify(body) : undefined,
});
const text = await res.text();
if (!res.ok) {
let msg = text;
try {
const j = JSON.parse(text);
if (j && typeof j.error === "string") msg = j.error;
} catch {
// not JSON: keep the raw text
}
throw new Error(msg || `HTTP ${res.status}`);
}
return (text ? JSON.parse(text) : {}) as T;
}
// connect creates (or recovers) the named peer on the gateway and returns its
// public identity. The identity persists across gateway restarts.
connect(name: string): Promise<Peer> {
return this.req<Peer>("POST", "/api/peer", { name });
}
// peers lists every peer currently hosted by the gateway (for the invite picker
// and to label senders by name).
peers(): Promise<Peer[]> {
return this.req<Peer[]>("GET", "/api/peers");
}
// rooms lists the rooms the named peer knows (created or joined).
rooms(peer: string): Promise<Room[]> {
return this.req<Room[]>("GET", `/api/rooms?peer=${encodeURIComponent(peer)}`);
}
// members lists the participants of a room.
members(roomID: string): Promise<Member[]> {
return this.req<Member[]>("GET", `/api/members?room_id=${encodeURIComponent(roomID)}`);
}
// createRoom opens a room on the given subject. encrypt drives both E2E
// encryption and per-message signing; the peer is auto-subscribed.
createRoom(peer: string, subject: string, encrypt: boolean): Promise<Room & { persist: boolean }> {
return this.req("POST", "/api/room", { peer, subject, encrypt, persist: false });
}
// join subscribes the peer to an existing room (must have been invited first
// when the room is encrypted).
join(peer: string, roomID: string): Promise<{ subject: string; encrypt: boolean }> {
return this.req("POST", "/api/join", { peer, room_id: roomID });
}
// invite adds another connected peer (by name) to a room, sealing the room key
// to it. Caller must be the room owner.
invite(peer: string, roomID: string, target: string): Promise<{ status: string }> {
return this.req("POST", "/api/invite", { peer, room_id: roomID, target });
}
// publish sends a text message to a room.
publish(peer: string, roomID: string, text: string): Promise<{ status: string }> {
return this.req("POST", "/api/publish", { peer, room_id: roomID, text });
}
// kick removes a peer (by name) from a room and rotates the key (forward
// secrecy). Caller must be the room owner.
kick(peer: string, roomID: string, target: string): Promise<{ status: string }> {
return this.req("POST", "/api/kick", { peer, room_id: roomID, target });
}
// stream opens the SSE channel for a peer. onEvent fires for each received bus
// message; onError fires if the stream drops. Returns the EventSource so the
// caller can close it.
stream(peer: string, onEvent: (ev: BusEvent) => void, onError?: () => void): EventSource {
const es = new EventSource(`${this.baseURL}/api/stream?peer=${encodeURIComponent(peer)}`);
es.onmessage = (e) => {
try {
onEvent(JSON.parse(e.data) as BusEvent);
} catch {
// ignore malformed frames (keepalive comments never reach onmessage)
}
};
if (onError) es.onerror = onError;
return es;
}
}
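Two small rules in `GatewayClient` carry most of its behavior: the base URL is normalized once, and a failed response is turned into an error message by preferring the JSON `error` field, then the raw body, then a generic status string. The sketch below restates those rules as standalone functions so they can be exercised without a network. The names `normalizeBaseURL` and `errorMessageFrom` are hypothetical and do not appear in the original file.

```typescript
// Drops trailing slashes so `${base}/api/...` never doubles up.
function normalizeBaseURL(baseURL: string): string {
  return baseURL.replace(/\/+$/, "");
}

// Picks the message for a failed response: the JSON `error` field when the body
// parses as an object with a string `error`, otherwise the raw text, otherwise
// a generic "HTTP <status>" when the body is empty.
function errorMessageFrom(text: string, status: number): string {
  let msg = text;
  try {
    const j: unknown = JSON.parse(text);
    if (j && typeof j === "object" && typeof (j as { error?: unknown }).error === "string") {
      msg = (j as { error: string }).error;
    }
  } catch {
    // not JSON: keep the raw text
  }
  return msg || `HTTP ${status}`;
}
```

For example, a body of `{"error":"room not found"}` would surface as `room not found`, while an empty body on a 502 would surface as `HTTP 502`.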
-285
@@ -1,285 +0,0 @@
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import {
AppShell,
Group,
Title,
Badge,
Button,
CopyButton,
Tooltip,
ActionIcon,
ThemeIcon,
Alert,
Transition,
} from "@mantine/core";
import {
IconBolt,
IconLogout,
IconCopy,
IconCheck,
IconAlertTriangle,
} from "@tabler/icons-react";
import { GatewayClient } from "../api";
import type { Member, Message, Peer, Room } from "../types";
import { RoomList } from "./RoomList";
import { MessagePane } from "./MessagePane";
import { MembersPane } from "./MembersPane";
interface Props {
client: GatewayClient;
peer: Peer;
onDisconnect: () => void;
}
// short abbreviates an endpoint id longer than 12 chars to its first 10 chars
// plus an ellipsis, enough to disambiguate; shorter ids are returned unchanged.
export function short(endpoint: string): string {
return endpoint.length > 12 ? endpoint.slice(0, 10) + "…" : endpoint;
}
// ChatLayout owns all chat state: the peer's rooms, the active room, the
// per-room message log fed by the SSE stream, the directory of connected peers
// (to label senders and pick invitees), and the active room's member list. Every
// bus action goes through the gateway client.
export function ChatLayout({ client, peer, onDisconnect }: Props) {
const [rooms, setRooms] = useState<Room[]>([]);
const [activeRoom, setActiveRoom] = useState<string | null>(null);
const [messages, setMessages] = useState<Record<string, Message[]>>({});
const [peers, setPeers] = useState<Peer[]>([]);
const [members, setMembers] = useState<Member[]>([]);
const [error, setError] = useState<string | null>(null);
const seq = useRef(0);
const fail = useCallback((e: unknown) => {
setError(e instanceof Error ? e.message : String(e));
}, []);
// ---- data refreshers ----------------------------------------------------
const refreshRooms = useCallback(async () => {
try {
setRooms(await client.rooms(peer.name));
} catch (e) {
fail(e);
}
}, [client, peer.name, fail]);
const refreshPeers = useCallback(async () => {
try {
setPeers(await client.peers());
} catch (e) {
fail(e);
}
}, [client, fail]);
const refreshMembers = useCallback(
async (roomID: string) => {
try {
setMembers(await client.members(roomID));
} catch (e) {
fail(e);
}
},
[client, fail],
);
// ---- live stream (SSE) --------------------------------------------------
useEffect(() => {
const es = client.stream(
peer.name,
(ev) => {
seq.current += 1;
const msg: Message = { ...ev, id: `${ev.ts}-${seq.current}` };
setMessages((prev) => {
const list = prev[ev.room_id] ?? [];
return { ...prev, [ev.room_id]: [...list, msg] };
});
},
() => setError("Se perdió la conexión con el gateway (stream SSE)"),
);
return () => es.close();
}, [client, peer.name]);
// Initial load.
useEffect(() => {
refreshRooms();
refreshPeers();
}, [refreshRooms, refreshPeers]);
// Refresh members whenever the active room changes.
useEffect(() => {
if (activeRoom) refreshMembers(activeRoom);
else setMembers([]);
}, [activeRoom, refreshMembers]);
// ---- actions ------------------------------------------------------------
const onCreateRoom = useCallback(
async (subject: string, encrypt: boolean) => {
try {
const r = await client.createRoom(peer.name, subject, encrypt);
await refreshRooms();
setActiveRoom(r.room_id);
} catch (e) {
fail(e);
}
},
[client, peer.name, refreshRooms, fail],
);
const onJoinRoom = useCallback(
async (roomID: string) => {
try {
await client.join(peer.name, roomID);
await refreshRooms();
setActiveRoom(roomID);
} catch (e) {
fail(e);
}
},
[client, peer.name, refreshRooms, fail],
);
const onInvite = useCallback(
async (target: string) => {
if (!activeRoom) return;
try {
await client.invite(peer.name, activeRoom, target);
await refreshMembers(activeRoom);
} catch (e) {
fail(e);
}
},
[client, peer.name, activeRoom, refreshMembers, fail],
);
const onKick = useCallback(
async (target: string) => {
if (!activeRoom) return;
try {
await client.kick(peer.name, activeRoom, target);
await refreshMembers(activeRoom);
} catch (e) {
fail(e);
}
},
[client, peer.name, activeRoom, refreshMembers, fail],
);
const onPublish = useCallback(
async (text: string) => {
if (!activeRoom) return;
try {
await client.publish(peer.name, activeRoom, text);
} catch (e) {
fail(e);
}
},
[client, peer.name, activeRoom, fail],
);
// endpoint -> display name, using the peer directory; falls back to a short id.
const nameFor = useMemo(() => {
const byEndpoint = new Map(peers.map((p) => [p.endpoint_id, p.name]));
return (endpoint: string) =>
endpoint === peer.endpoint_id ? peer.name : byEndpoint.get(endpoint) ?? short(endpoint);
}, [peers, peer]);
const activeRoomObj = rooms.find((r) => r.room_id === activeRoom) ?? null;
const iAmOwner = members.some((m) => m.endpoint === peer.endpoint_id && m.role === "owner");
return (
<AppShell
header={{ height: 60 }}
navbar={{ width: 300, breakpoint: "sm" }}
aside={{ width: 300, breakpoint: "md", collapsed: { desktop: !activeRoom, mobile: true } }}
padding={0}
>
<AppShell.Header>
<Group h="100%" px="md" justify="space-between" wrap="nowrap">
<Group gap="xs" wrap="nowrap">
<ThemeIcon variant="light" color="violet" radius="md">
<IconBolt size={18} />
</ThemeIcon>
<Title order={4}>unibus</Title>
</Group>
<Group gap="xs" wrap="nowrap">
<Badge variant="light" color="violet" size="lg">
{peer.name}
</Badge>
<CopyButton value={peer.endpoint_id}>
{({ copied, copy }) => (
<Tooltip label={copied ? "¡copiado!" : peer.endpoint_id} withArrow>
<ActionIcon variant="subtle" color="gray" onClick={copy}>
{copied ? <IconCheck size={16} /> : <IconCopy size={16} />}
</ActionIcon>
</Tooltip>
)}
</CopyButton>
<Button
variant="subtle"
color="gray"
leftSection={<IconLogout size={16} />}
onClick={onDisconnect}
>
Salir
</Button>
</Group>
</Group>
</AppShell.Header>
<AppShell.Navbar>
<RoomList
rooms={rooms}
activeRoom={activeRoom}
onSelect={setActiveRoom}
onCreateRoom={onCreateRoom}
onJoinRoom={onJoinRoom}
/>
</AppShell.Navbar>
<AppShell.Main h="100vh">
{error && (
<Transition mounted={!!error} transition="slide-down">
{(styles) => (
<Alert
style={{ ...styles, position: "absolute", top: 70, left: "50%", transform: "translateX(-50%)", zIndex: 200, minWidth: 360 }}
color="red"
variant="filled"
icon={<IconAlertTriangle size={18} />}
withCloseButton
onClose={() => setError(null)}
title="Error"
>
{error}
</Alert>
)}
</Transition>
)}
<MessagePane
room={activeRoomObj}
messages={activeRoom ? messages[activeRoom] ?? [] : []}
myEndpoint={peer.endpoint_id}
nameFor={nameFor}
onPublish={onPublish}
/>
</AppShell.Main>
<AppShell.Aside>
{activeRoomObj && (
<MembersPane
room={activeRoomObj}
members={members}
peers={peers}
myEndpoint={peer.endpoint_id}
iAmOwner={iAmOwner}
nameFor={nameFor}
onInvite={onInvite}
onKick={onKick}
onRefresh={() => activeRoom && refreshMembers(activeRoom)}
/>
)}
</AppShell.Aside>
</AppShell>
);
}
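The SSE handler in `ChatLayout` keeps one message list per room and appends each incoming event immutably, tagging it with an id built from the timestamp and a local counter. A minimal sketch of that update as a pure function follows. The `SketchEvent`, `SketchMsg`, `Log`, and `appendEvent` names are hypothetical, and the field set is assumed from how the components use `room_id`, `sender`, `text`, and `ts`; the real `BusEvent` and `Message` types live in `./types` and are not shown here.

```typescript
// Hypothetical minimal shapes, assumed from how the components use these fields.
interface SketchEvent {
  room_id: string;
  sender: string;
  text: string;
  ts: number;
}
interface SketchMsg extends SketchEvent {
  id: string;
}
type Log = Record<string, SketchMsg[]>;

// Appends one event to its room's list and returns a new log object.
// Neither the previous log nor the previous room list is mutated.
function appendEvent(prev: Log, ev: SketchEvent, seq: number): Log {
  const msg: SketchMsg = { ...ev, id: `${ev.ts}-${seq}` };
  const list = prev[ev.room_id] ?? [];
  return { ...prev, [ev.room_id]: [...list, msg] };
}
```

The counter suffix presumably keeps React keys unique when two events share the same timestamp; that reading is an inference from the code, not something the source states.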
-116
@@ -1,116 +0,0 @@
import { useState } from "react";
import {
Button,
Card,
Center,
Group,
Stack,
Text,
TextInput,
Title,
Alert,
ThemeIcon,
} from "@mantine/core";
import { IconBolt, IconPlugConnected, IconAlertTriangle } from "@tabler/icons-react";
import { GatewayClient } from "../api";
import type { Peer } from "../types";
const LS_GATEWAY = "unibus.gateway";
const LS_PEER = "unibus.peer";
interface Props {
onConnect: (client: GatewayClient, peer: Peer) => void;
}
// ConnectScreen asks for the gateway URL and the identity (peer name) to connect
// as. Both persist in localStorage so a reload reconnects with one click. The
// gateway hosts the real Go bus peer; the browser only drives it.
export function ConnectScreen({ onConnect }: Props) {
const [gateway, setGateway] = useState(
() => localStorage.getItem(LS_GATEWAY) ?? "http://localhost:7700",
);
const [name, setName] = useState(() => localStorage.getItem(LS_PEER) ?? "");
const [busy, setBusy] = useState(false);
const [error, setError] = useState<string | null>(null);
const connect = async () => {
const trimmed = name.trim();
if (!trimmed) {
setError("Elige un nombre de identidad");
return;
}
setBusy(true);
setError(null);
try {
const client = new GatewayClient(gateway.trim());
const peer = await client.connect(trimmed);
localStorage.setItem(LS_GATEWAY, client.baseURL);
localStorage.setItem(LS_PEER, trimmed);
onConnect(client, peer);
} catch (e) {
setError(e instanceof Error ? e.message : String(e));
} finally {
setBusy(false);
}
};
return (
<Center h="100vh" p="md">
<Card withBorder shadow="md" radius="lg" p="xl" w={420} maw="100%">
<Stack gap="lg">
<Group gap="sm">
<ThemeIcon size="xl" radius="md" variant="light" color="violet">
<IconBolt size={26} />
</ThemeIcon>
<div>
<Title order={3}>unibus</Title>
<Text size="sm" c="dimmed">
chat cifrado extremo a extremo sobre NATS
</Text>
</div>
</Group>
<TextInput
label="Gateway"
description="URL del gateway web de unibus"
placeholder="http://localhost:7700"
value={gateway}
onChange={(e) => setGateway(e.currentTarget.value)}
disabled={busy}
/>
<TextInput
label="Identidad"
description="Tu nombre de peer en el bus (persistente)"
placeholder="ana"
value={name}
onChange={(e) => setName(e.currentTarget.value)}
onKeyDown={(e) => e.key === "Enter" && connect()}
disabled={busy}
data-autofocus
/>
{error && (
<Alert
color="red"
variant="light"
icon={<IconAlertTriangle size={18} />}
title="No se pudo conectar"
>
{error}
</Alert>
)}
<Button
leftSection={<IconPlugConnected size={18} />}
onClick={connect}
loading={busy}
fullWidth
size="md"
>
Conectar
</Button>
</Stack>
</Card>
</Center>
);
}
-153
@@ -1,153 +0,0 @@
import { useState } from "react";
import {
Stack,
Group,
Text,
Badge,
Select,
Button,
ActionIcon,
Divider,
Box,
Avatar,
Tooltip,
ScrollArea,
} from "@mantine/core";
import { IconUserPlus, IconUserMinus, IconRefresh, IconUsers } from "@tabler/icons-react";
import type { Member, Peer, Room } from "../types";
interface Props {
room: Room;
members: Member[];
peers: Peer[];
myEndpoint: string;
iAmOwner: boolean;
nameFor: (endpoint: string) => string;
onInvite: (target: string) => void;
onKick: (target: string) => void;
onRefresh: () => void;
}
// MembersPane is the right column: who is in the active room, plus invite (pick a
// connected peer) and kick (owner only). Invite/kick address peers by name; the
// gateway resolves the name to its bus endpoint.
export function MembersPane({
room,
members,
peers,
myEndpoint,
iAmOwner,
nameFor,
onInvite,
onKick,
onRefresh,
}: Props) {
const [target, setTarget] = useState<string | null>(null);
const memberEndpoints = new Set(members.map((m) => m.endpoint));
// Candidates to invite: connected peers not already in the room.
const candidates = peers
.filter((p) => !memberEndpoints.has(p.endpoint_id))
.map((p) => ({ value: p.name, label: p.name }));
const invite = () => {
if (target) {
onInvite(target);
setTarget(null);
}
};
return (
<Stack gap={0} h="100%">
<Group justify="space-between" px="md" py="sm" wrap="nowrap" style={{ borderBottom: "1px solid var(--mantine-color-default-border)" }}>
<Group gap="xs">
<IconUsers size={18} />
<Text fw={600}>Miembros</Text>
<Badge size="sm" variant="light">
{members.length}
</Badge>
</Group>
<Tooltip label="Recargar" withArrow>
<ActionIcon variant="subtle" color="gray" onClick={onRefresh}>
<IconRefresh size={16} />
</ActionIcon>
</Tooltip>
</Group>
<Box p="md">
<Text size="xs" fw={700} c="dimmed" tt="uppercase" mb="xs">
Invitar {room.encrypt && "(reparte la clave)"}
</Text>
<Group gap="xs" wrap="nowrap" align="flex-end">
<Select
style={{ flex: 1 }}
size="xs"
placeholder="peer conectado"
data={candidates}
value={target}
onChange={setTarget}
searchable
nothingFoundMessage="sin peers libres"
comboboxProps={{ withinPortal: true }}
/>
<Button
size="xs"
leftSection={<IconUserPlus size={14} />}
onClick={invite}
disabled={!target}
>
Invitar
</Button>
</Group>
</Box>
<Divider />
<ScrollArea style={{ flex: 1 }}>
<Stack gap={4} p="md">
{members.map((m) => {
const isMe = m.endpoint === myEndpoint;
const name = nameFor(m.endpoint);
const canKick = iAmOwner && !isMe && m.role !== "owner";
return (
<Group key={m.endpoint} justify="space-between" wrap="nowrap" gap="xs">
<Group gap="xs" wrap="nowrap" style={{ minWidth: 0 }}>
<Avatar size="sm" radius="xl" color="violet">
{name.slice(0, 2).toUpperCase()}
</Avatar>
<Box style={{ minWidth: 0 }}>
<Text size="sm" fw={isMe ? 700 : 500} truncate>
{name} {isMe && "(tú)"}
</Text>
<Text size="9px" c="dimmed" truncate>
{m.endpoint}
</Text>
</Box>
</Group>
<Group gap={4} wrap="nowrap">
{m.role === "owner" && (
<Badge size="xs" color="yellow" variant="light">
owner
</Badge>
)}
{canKick && (
<Tooltip label="Expulsar (rota la clave)" withArrow>
<ActionIcon
variant="subtle"
color="red"
size="sm"
onClick={() => onKick(name)}
>
<IconUserMinus size={15} />
</ActionIcon>
</Tooltip>
)}
</Group>
</Group>
);
})}
</Stack>
</ScrollArea>
</Stack>
);
}
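`MembersPane` derives two things from its props: the list of peers that can still be invited, and whether a given member row gets a kick button. The sketch below restates both rules as pure functions. `SketchPeer`, `SketchMember`, `inviteCandidates`, and `canKick` are hypothetical names used only for illustration; the real `Peer` and `Member` types are defined in `../types` and are not shown in this diff.

```typescript
// Hypothetical minimal shapes, assumed from the fields the component reads.
interface SketchPeer {
  name: string;
  endpoint_id: string;
}
interface SketchMember {
  endpoint: string;
  role: string;
}

// Connected peers that are not already members, shaped as Select options.
function inviteCandidates(
  peers: SketchPeer[],
  members: SketchMember[],
): { value: string; label: string }[] {
  const memberEndpoints = new Set(members.map((m) => m.endpoint));
  return peers
    .filter((p) => !memberEndpoints.has(p.endpoint_id))
    .map((p) => ({ value: p.name, label: p.name }));
}

// A row is kickable only when the viewer is the owner and the row is
// neither the viewer nor another owner.
function canKick(iAmOwner: boolean, myEndpoint: string, m: SketchMember): boolean {
  return iAmOwner && m.endpoint !== myEndpoint && m.role !== "owner";
}
```

These checks only decide what the UI offers; per the comments in the client, the gateway still requires the caller to be the room owner for invite and kick.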
-153
@@ -1,153 +0,0 @@
import { useEffect, useRef, useState } from "react";
import {
Stack,
Group,
Text,
Badge,
Paper,
ScrollArea,
TextInput,
ActionIcon,
Center,
ThemeIcon,
Box,
CopyButton,
Tooltip,
} from "@mantine/core";
import {
IconLock,
IconHash,
IconSend,
IconMessages,
IconCopy,
IconCheck,
} from "@tabler/icons-react";
import type { Message, Room } from "../types";
interface Props {
room: Room | null;
messages: Message[];
myEndpoint: string;
nameFor: (endpoint: string) => string;
onPublish: (text: string) => void;
}
// formatTime renders a message timestamp as HH:mm:ss in 24h European style.
function formatTime(ts: number): string {
return new Date(ts).toLocaleTimeString("es-ES", {
hour: "2-digit",
minute: "2-digit",
second: "2-digit",
});
}
// MessagePane is the center column: the active room's live message log plus the
// composer. Own messages align right; others align left and show the sender.
export function MessagePane({ room, messages, myEndpoint, nameFor, onPublish }: Props) {
const [text, setText] = useState("");
const viewport = useRef<HTMLDivElement>(null);
// Auto-scroll to the newest message.
useEffect(() => {
viewport.current?.scrollTo({ top: viewport.current.scrollHeight, behavior: "smooth" });
}, [messages.length]);
if (!room) {
return (
<Center h="100%">
<Stack align="center" gap="xs">
<ThemeIcon size={64} radius="xl" variant="light" color="gray">
<IconMessages size={34} />
</ThemeIcon>
<Text c="dimmed">Elige o crea una room para empezar a chatear</Text>
</Stack>
</Center>
);
}
const send = () => {
const t = text.trim();
if (t) {
onPublish(t);
setText("");
}
};
return (
<Stack gap={0} h="100%">
<Group justify="space-between" px="md" py="sm" wrap="nowrap" style={{ borderBottom: "1px solid var(--mantine-color-default-border)" }}>
<Group gap="xs" wrap="nowrap">
{room.encrypt ? <IconLock size={18} /> : <IconHash size={18} />}
<Text fw={600}>{room.subject}</Text>
{room.encrypt && (
<Badge size="sm" color="teal" variant="light">
cifrada E2E
</Badge>
)}
</Group>
<CopyButton value={room.room_id}>
{({ copied, copy }) => (
<Tooltip label={copied ? "¡copiado!" : "copiar room id"} withArrow>
<ActionIcon variant="subtle" color="gray" onClick={copy}>
{copied ? <IconCheck size={16} /> : <IconCopy size={16} />}
</ActionIcon>
</Tooltip>
)}
</CopyButton>
</Group>
<ScrollArea style={{ flex: 1 }} viewportRef={viewport} p="md">
<Stack gap="sm">
{messages.length === 0 && (
<Text c="dimmed" ta="center" py="xl" size="sm">
No hay mensajes todavía.
</Text>
)}
{messages.map((m) => {
const mine = m.sender === myEndpoint;
return (
<Box
key={m.id}
style={{ display: "flex", justifyContent: mine ? "flex-end" : "flex-start" }}
>
<Paper
withBorder
shadow="xs"
radius="md"
p="xs"
bg={mine ? "violet.9" : undefined}
maw="75%"
>
{!mine && (
<Text size="xs" fw={700} c="violet.4">
{nameFor(m.sender)}
</Text>
)}
<Text size="sm" style={{ wordBreak: "break-word", whiteSpace: "pre-wrap" }}>
{m.text}
</Text>
<Text size="9px" c="dimmed" ta="right" mt={2}>
{formatTime(m.ts)}
</Text>
</Paper>
</Box>
);
})}
</Stack>
</ScrollArea>
<Group p="md" gap="xs" wrap="nowrap" style={{ borderTop: "1px solid var(--mantine-color-default-border)" }}>
<TextInput
style={{ flex: 1 }}
placeholder={`Mensaje a ${room.subject}`}
value={text}
onChange={(e) => setText(e.currentTarget.value)}
onKeyDown={(e) => e.key === "Enter" && send()}
/>
<ActionIcon size="lg" onClick={send} disabled={!text.trim()}>
<IconSend size={18} />
</ActionIcon>
</Group>
</Stack>
);
}

Some files were not shown because too many files have changed in this diff.