Commit Graph

150 Commits

Author SHA1 Message Date
egutierrez 669bad52af feat(membershipd): one-command bot provisioning (bot add)
Add `membershipd bot add --handle <name> --out <path> [--role] [--store]` to
provision a bus identity for an automated process in a single step: mint a fresh
Ed25519+X25519 identity (cs.GenerateIdentity, the same derivation worker/chat
use), register its signing key in the allowlist, and write the credentials to a
0600 file. The file is the canonical identity format read by client.LoadIdentity,
so a worker/clientcheck binary pointed at --out connects as the new user with no
extra conversion. Shares the sqlite/kv store plumbing with `user add`.

New exported pkg/client.WriteNewIdentity writes an identity in that format but
refuses to overwrite an existing file (never silently clobber private keys).
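
A minimal sketch of the "refuse to overwrite" behaviour described above, assuming it is built on an exclusive create. The helper names `writeNewFile` and `demo` and the file name `bot.id` are hypothetical; the real pkg/client.WriteNewIdentity may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeNewFile creates path with mode 0600 and fails if it already exists.
// O_EXCL makes the existence check and the create a single atomic step, so an
// existing private key cannot be truncated by a second run.
func writeNewFile(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		return fmt.Errorf("refusing to write %s: %w", path, err)
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(path) // do not leave a partial credentials file behind
		return err
	}
	return f.Close()
}

// demo writes the same path twice in a temp dir and reports both outcomes.
func demo() (first, second error, mode fs.FileMode) {
	dir, err := os.MkdirTemp("", "identity-demo")
	if err != nil {
		return err, nil, 0
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "bot.id")
	first = writeNewFile(path, []byte("placeholder, not a real key\n"))
	second = writeNewFile(path, []byte("would clobber\n"))
	if info, statErr := os.Stat(path); statErr == nil {
		mode = info.Mode().Perm()
	}
	return first, second, mode
}

func main() {
	first, second, mode := demo()
	fmt.Println("first write error:", first)
	fmt.Println("second write is ErrExist:", errors.Is(second, fs.ErrExist))
	fmt.Printf("mode: %o\n", mode)
}
```

Checking for the file first and then opening it would leave a race window; the exclusive create closes it.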

provisionBot ordering guarantees no half-provisioned bot: refuse an existing
--out before touching the store, register (an already-registered key is a clear
error, not a panic), then write credentials. Tests cover the golden path
(register + 0600 file + LoadIdentity round-trip), default role, the
already-registered error path (no file written), and the out-exists error path
(no orphan user).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:30:17 +02:00
egutierrez 2ba40701b2 feat(membership): add GET /api/directory for endpoint->handle resolution
Authenticated bus users (member or admin) can now map a sender's endpoint id
back to a readable handle. The endpoint is derived server-side from each user's
sign_pub with frame.EndpointID (base64url(sha256(signPub)), unpadded), matching
the bus's own construction byte-for-byte. Only active users are listed; under
enforce the existing auth middleware rejects an unauthenticated caller with 401.
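
The endpoint construction quoted above, base64url(sha256(signPub)) without padding, can be sketched with the standard library alone. The local `endpointID` function is an illustrative stand-in for frame.EndpointID, not its actual source.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// endpointID hashes the Ed25519 public key with SHA-256 and encodes the
// digest as unpadded base64url. A 32-byte digest always yields 43 characters.
func endpointID(signPub []byte) string {
	sum := sha256.Sum256(signPub)
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

func main() {
	pub, _, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	fmt.Println("endpoint:", endpointID(pub))
}
```

Because the id is a one-way hash, the server has to derive it from each stored sign_pub; it cannot go from an endpoint id back to a key.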

Tests cover the golden path (two users -> 200 with handles + endpoints), the
auth contract (unsigned -> 401), revoked-user exclusion, and endpoint parity
against the cross-language vector from cmd/busvectors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:27:38 +02:00
egutierrez 363aa97def Merge branch 'quick/proxy-ready' 2026-06-14 13:49:24 +02:00
egutierrez e3f40913bc chore(deploy): version the same-origin Caddy config for uniweb
Capture the reverse-proxy vhost that fronts the browser-native uniweb
client on magnus (chat-<hash>.organic-machine.com): the SPA at /, the
signed control plane under /api (prefix stripped so request signatures
verify), and the NATS-over-WebSocket data plane under /nats. One origin
means no CORS and keeps the cluster node IPs hidden behind the proxy.

Self-contained fragment (includes the shared security_headers snippet) so
it validates with `caddy validate` on its own; the other vhosts on magnus
carry basic-auth secrets and are intentionally left out of git. Documents
the matching membershipd flags this config requires (--cors-origins with
the same-origin host, --trusted-proxies naming the Caddy node).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:43:15 +02:00
egutierrez 0b96c114b6 feat(membership): trust reverse-proxy forwarded client IP for rate limit
The per-IP rate limiter keys on the transport RemoteAddr. Behind the
same-origin Caddy proxy that fronts the control plane, every request
arrives with the proxy's single IP, which collapses the limiter into one
bucket shared by the whole world — a flood from one client throttles all
of them.

Add an opt-in `--trusted-proxies` flag (comma-separated IPs/CIDRs). When
the immediate peer is one of the named proxies, clientIP now believes its
X-Forwarded-For (read right-to-left, skipping trusted hops) or X-Real-IP
and keys on the real client. A direct, non-trusted peer's forwarding
headers are ignored entirely, so this opens no quota-fanning hole: an
attacker connecting straight to the public :8470 cannot spoof a key. The
zero value (no flag) preserves the prior RemoteAddr-only behavior exactly.
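
The resolution rule above can be sketched as follows. This is an illustration under assumptions: the type `trustedProxies` and the helpers `parseTrustedProxies` and `newReq` are hypothetical names, and only `clientIP` is named in the commit message (its real signature is not shown there).

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// trustedProxies holds the parsed --trusted-proxies list (hypothetical type).
type trustedProxies []*net.IPNet

// parseTrustedProxies accepts comma-separated IPs or CIDRs; malformed entries are errors.
func parseTrustedProxies(s string) (trustedProxies, error) {
	var out trustedProxies
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if !strings.Contains(entry, "/") { // bare IP: widen to a single-host CIDR
			ip := net.ParseIP(entry)
			if ip == nil {
				return nil, fmt.Errorf("bad proxy entry %q", entry)
			}
			bits := 128
			if ip.To4() != nil {
				bits = 32
			}
			entry = fmt.Sprintf("%s/%d", ip, bits)
		}
		_, n, err := net.ParseCIDR(entry)
		if err != nil {
			return nil, fmt.Errorf("bad proxy entry %q: %w", entry, err)
		}
		out = append(out, n)
	}
	return out, nil
}

func (t trustedProxies) contains(ip net.IP) bool {
	for _, n := range t {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// clientIP returns the rate-limit key. Forwarding headers are believed only
// when the immediate peer is a trusted proxy.
func clientIP(r *http.Request, trusted trustedProxies) string {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}
	peer := net.ParseIP(host)
	if peer == nil || !trusted.contains(peer) {
		return host // direct peer: ignore X-Forwarded-For and X-Real-IP
	}
	hops := strings.Split(strings.Join(r.Header.Values("X-Forwarded-For"), ","), ",")
	for i := len(hops) - 1; i >= 0; i-- { // right to left, skipping trusted hops
		ip := net.ParseIP(strings.TrimSpace(hops[i]))
		if ip == nil {
			break // malformed hop: stop believing the chain
		}
		if !trusted.contains(ip) {
			return ip.String()
		}
	}
	if ip := net.ParseIP(strings.TrimSpace(r.Header.Get("X-Real-IP"))); ip != nil {
		return ip.String()
	}
	return host
}

// newReq builds a request with the given peer address and forwarding headers.
func newReq(remoteAddr, xff, realIP string) *http.Request {
	r, _ := http.NewRequest(http.MethodGet, "http://bus.invalid/rooms", nil)
	r.RemoteAddr = remoteAddr
	if xff != "" {
		r.Header.Set("X-Forwarded-For", xff)
	}
	if realIP != "" {
		r.Header.Set("X-Real-IP", realIP)
	}
	return r
}

func main() {
	tp, _ := parseTrustedProxies("10.0.0.1")
	fmt.Println(clientIP(newReq("10.0.0.1:4000", "6.6.6.6, 203.0.113.7", ""), tp))
	fmt.Println(clientIP(newReq("198.51.100.9:4000", "6.6.6.6", ""), tp))
}
```

Reading right to left matters because a client can prepend arbitrary values to X-Forwarded-For; only the entries appended by trusted hops are reliable.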

Covered by ratelimit_proxy_test.go: trusted vs untrusted peers, XFF
right-to-left precedence, client-prepended forgery, X-Real-IP fallback,
and rejection of malformed proxy entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 12:29:57 +02:00
egutierrez 294905984c fix(membership): allow X-Unibus-* auth headers in CORS preflight
A browser signs every control-plane request with X-Unibus-Pub/Ts/Nonce/Sig
(busauth.signedHeaders). The CORS Allow-Headers only listed Content-Type and
Authorization, so the browser's preflight rejected the real request and the SPA
failed with 'Failed to fetch' on the first authenticated call (listRooms). Add the
four X-Unibus-* headers to Access-Control-Allow-Headers.

This was invisible to the Node smoke (fetch in Node does no CORS preflight); only a
real browser surfaced it. Verified live: enmanuel logs into uniweb against the
cluster and lists rooms. Regression test asserts the header is present.
2026-06-14 12:12:20 +02:00
egutierrez feb917fc6a feat(cluster): deploy browser WebSocket + CORS to the 3-node cluster
Roll the --ws-port + --cors-origins flags (issue uniweb/0001) out to the unibus
cluster so the browser-native uniweb client can reach the data plane (nats.ws)
and the control plane (CORS) on every node. The WS reuses the data-plane TLS
(wss://) and the same origin allowlist.

Per-node WS port override (WS_PORT_<NAME>): magnus runs unibus_admin on
127.0.0.1:8480, so the bus WS binds 8485 there to avoid a crash-loop; homer and
datardos keep 8480. deploy-cluster.sh also gains DEPLOY_ONLY=<name> for rolling
one node at a time. Rolled out and verified 2026-06-13: all three nodes healthy,
WS reachable, CORS 204, cluster quorum (R3) intact throughout.
2026-06-13 23:23:52 +02:00
egutierrez c0216de766 feat(membershipd): --ws-port wires the embedded NATS WebSocket listener
Phase 0 left the WebsocketConfig field unwired; add --ws-port so membershipd can
actually expose the browser data-plane transport. It reuses the data-plane TLS
(wss:// when TLS is on, ws:// for a loopback dev stack) and the same
--cors-origins allowlist that gates the control plane, so one flag pair opens
both planes to the browser-native uniweb client (issue uniweb/0001).
2026-06-13 23:05:33 +02:00
egutierrez 0088fb946b feat(busvectors): add nkey + signed control-request vectors
Extend the cross-language vectors with the NATS user nkey derived from the
Ed25519 public key, and a signed control-plane request (CanonicalRequest +
Ed25519 signature). These let the TypeScript busauth port verify it authenticates
on both planes exactly like the Go client (issue uniweb/0001, Phase 1).
2026-06-13 22:49:20 +02:00
egutierrez e058b324f4 Merge branch 'quick/0001-ws-cors-prep' 2026-06-13 22:21:51 +02:00
egutierrez a5086ecd18 chore: bump unibus to 0.14.0 (browser-native client prep, Phase 0) 2026-06-13 22:21:51 +02:00
egutierrez 8a51c5cc1f feat(busvectors): deterministic cross-language test vectors
Add cmd/busvectors, a generator that emits stable JSON test vectors for the bus
protocol and its E2E crypto (endpoint id, Ed25519 sign, ChaCha20-Poly1305 AEAD
with a fixed nonce, sealed-box of a room key, and canonical Frame wire bytes +
SigningBytes). It uses the same registry crypto (functions/cybersecurity) the
bus uses, so the vectors are the contract the TypeScript port must match
byte-for-byte (issue uniweb/0001, Phase 0).

Regenerate with: go run ./cmd/busvectors > ../uniweb/web/src/bus/testdata/vectors.json
2026-06-13 22:21:32 +02:00
egutierrez ec8d34aaa1 feat(membership): opt-in CORS allowlist for the browser-native client
Add Server.AllowedOrigins and an applyCORS step at the top of ServeHTTP so a
browser SPA (uniweb) can call the control plane cross-origin: an allow-listed
Origin gets the Access-Control-Allow-* headers, and a preflight (OPTIONS) is
answered 204 before the rate limiter and auth ever run. A disallowed or missing
origin gets no headers (preflight 403), so the browser blocks the request.
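
A rough sketch of the allowlist check described above, not the actual Server code. The `server` type, `newServer`, `probe`, and the Allow-Methods value are assumptions for illustration; the Allow-Headers list includes the four X-Unibus-* headers that the later CORS preflight fix in this log added.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const allowHeaders = "Content-Type, Authorization, X-Unibus-Pub, X-Unibus-Ts, X-Unibus-Nonce, X-Unibus-Sig"

type server struct {
	allowedOrigins map[string]bool
	next           http.Handler // stands in for rate limiter + auth + routes
}

func newServer(origins ...string) *server {
	s := &server{allowedOrigins: map[string]bool{}}
	for _, o := range origins {
		s.allowedOrigins[o] = true
	}
	s.next = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return s
}

// applyCORS reports whether it fully answered the request (a preflight).
func (s *server) applyCORS(w http.ResponseWriter, r *http.Request) bool {
	if len(s.allowedOrigins) == 0 {
		return false // CORS off: emit nothing, behave as before
	}
	origin := r.Header.Get("Origin")
	allowed := origin != "" && s.allowedOrigins[origin]
	if allowed {
		h := w.Header()
		h.Set("Access-Control-Allow-Origin", origin)
		h.Add("Vary", "Origin")
		h.Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS") // assumed list
		h.Set("Access-Control-Allow-Headers", allowHeaders)
	}
	if r.Method == http.MethodOptions {
		if allowed {
			w.WriteHeader(http.StatusNoContent)
		} else {
			w.WriteHeader(http.StatusForbidden)
		}
		return true
	}
	return false
}

func (s *server) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if s.applyCORS(w, r) {
		return // preflight answered before rate limiting and auth
	}
	s.next.ServeHTTP(w, r)
}

// probe sends one request and returns the status and the echoed origin header.
func probe(s *server, method, origin string) (int, string) {
	req := httptest.NewRequest(method, "/rooms", nil)
	if origin != "" {
		req.Header.Set("Origin", origin)
	}
	rec := httptest.NewRecorder()
	s.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	s := newServer("https://chat.example")
	fmt.Println(probe(s, http.MethodOptions, "https://chat.example"))
	fmt.Println(probe(s, http.MethodOptions, "https://evil.example"))
	fmt.Println(probe(s, http.MethodGet, ""))
}
```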

Wire it through membershipd's --cors-origins flag (comma list, reusing
splitRoutes as a generic parser). Empty allowlist = CORS off, no headers
emitted, behavior identical to before: native Go/Kotlin clients send no Origin
and are unaffected. Opt-in per deployment (issue uniweb/0001, Phase 0).

Tests: preflight allow/deny, header on the real response, CORS-off default, and
no-Origin native client unaffected.
2026-06-13 22:17:44 +02:00
egutierrez 36f4ba0eaf feat(embeddednats): optional WebSocket listener for browser clients
Add WebsocketConfig to ServerConfig so the embedded nats-server can expose an
additional WebSocket port (nats.ws) alongside the TCP data plane. This lets a
browser SPA speak the NATS protocol directly, the way native TCP peers (Go,
Kotlin/android) already do — the first enabler for uniweb becoming a
browser-native client with no Go gateway (issue uniweb/0001, Phase 0).

The client authenticator applies to WebSocket connections too, so this adds a
transport, not a trust bypass. Plain ws:// is used only without TLS (loopback
dev); a certificate yields wss://. An empty AllowedOrigins enforces same-origin.
Nil WebsocketConfig keeps the server TCP-only, so existing single-node and
cluster deployments are unchanged.

Tests: WebSocket listener opens and completes the upgrade handshake (101); no
listener opens when WebsocketConfig is nil.
2026-06-13 22:11:39 +02:00
egutierrez 9661a5ce1f refactor: split web frontend + gateway out to uniweb app (bump 0.13.0)
The SPA (web/) and the web gateway (cmd/webgw) move to a dedicated app
projects/message_bus/apps/uniweb (its own Gitea sub-repo). unibus is now
strictly the bus plane: membership/keys, the client library and demo peers.
uniweb consumes unibus as a Go module via replace => ../unibus.

No capability lost; same SPA and gateway, in their own service folder.
go build/vet/test green after extraction.
2026-06-13 21:21:08 +02:00
egutierrez fadee1a7d0 Merge branch 'integrate/web-join-0.12.0' 2026-06-13 21:17:37 +02:00
egutierrez 9567ff4db9 chore: bump unibus to 0.12.0 (web wallet + per-user gateway) 2026-06-13 21:17:37 +02:00
egutierrez 9e01060931 Merge branch 'master' into integrate/web-join-0.12.0 2026-06-13 21:16:25 +02:00
egutierrez 4994ea1483 feat(web): wallet join/recover/login (BIP39 seed identity)
Add the device-local wallet onboarding to the SPA. The user's identity
is derived deterministically from a 12-word BIP39 mnemonic and lives on
the device; the browser never signs, never talks NATS, and never sends
the seed to the server.

Wallet layer (web/src/wallet/):
- derive.ts: deterministic identity from a mnemonic. seed = BIP39 seed,
  then HKDF-SHA256 domain-separated into an Ed25519 signing key
  (info "unibus-sign-v1") and an X25519 key-exchange key (info
  "unibus-kex-v1"). The same mnemonic always yields the same sign_pub,
  which is what makes recovery possible without admin intervention. The
  four halves match cs.Identity on the Go side exactly.
- bip39.ts: thin wrappers over @scure/bip39 (generate, validate,
  normalize) so the checksum logic stays in the audited library.
- crypto.ts: at-rest encryption of the private key with WebCrypto only
  (PBKDF2-SHA256 210k iters -> AES-256-GCM). The password never leaves
  the device and only protects the local key copy.
- store.ts: IndexedDB persistence of the encrypted identity (private key
  encrypted; public halves + handle in the clear for display).
- account.ts: saveAndOpen / unlockAndOpen / localIdentity compose the
  primitives with the gateway session API.

Screens:
- Welcome: choose invite link or recover-with-seed on an empty device.
- Join: generate seed, show it once behind an acknowledge gate, confirm
  3 random words, set a local password, register the PUBLIC key with the
  bus via the invite token, then open the session.
- Recover: paste the 12 words, validate, show the reconstructed sign_pub,
  set a new local password, open the session. No register (the identity
  is already in the allowlist).
- WalletLogin: unlock the device's stored identity with the password.
- AuthShell: shared card/header for all pre-chat screens.
- App.tsx: route between join / welcome / login / recover / chat based on
  the invite link, a live gateway session, and any stored identity.

api.ts/types.ts: add register() and session() against the gateway
contract; vite dev server on :5183.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:21:50 +02:00
egutierrez 7d93d550d1 feat(webgw): per-user wallet sessions + invite register
Add the gateway backend for the wallet onboarding flow so each browser
session carries its OWN bus identity instead of sharing the single
operator client.

- POST /api/session (session.go): the browser hands its full wallet
  keypair (unlocked from the local encrypted key, over TLS) and the
  gateway spins up a dedicated bus client that acts AS that user. The
  private key lives only in process memory for the life of the session
  and is dropped on logout/shutdown. identityFromHex enforces the exact
  key sizes (sign_pub 32, sign_priv 64, kex_pub 32, kex_priv 32) that
  match cs.Identity on the Go side.
- POST /api/register (register.go): unauthenticated onboarding gated by
  a one-shot invite token. Validates the two PUBLIC key halves, then
  either consumes a configured --mock-tokens invite (local testing) or
  proxies to the bus POST /register (--register-url, bus >= 0.12.0). The
  handle/role come from the invite, never from the client.
- server.go: sessions move from a token->time map to a sessionStore of
  per-user *session records; auth() now resolves the session and passes
  its gateway to each handler. The legacy operator passphrase login
  (POST /api/login) is kept, bound to the shared operator gateway.
- main.go: build a busTemplate config that wallet sessions clone with
  their own Identity; wire --register-url / --mock-tokens.
- webgw_test.go: identity-size validation, hex-key validation, mock
  token parsing, and single-use register (201 then 409) using a fixed
  browser-derived wallet vector.
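
The key-size check attributed to identityFromHex above could look roughly like this. The struct layout, the helper `decodeSized`, and `zeros` are hypothetical; only the four sizes (32, 64, 32, 32 bytes) come from the commit message.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// identity is a stand-in for cs.Identity; the field names are assumptions.
type identity struct {
	SignPub, SignPriv, KexPub, KexPriv []byte
}

// decodeSized decodes a hex field and enforces its exact byte length.
func decodeSized(name, h string, size int) ([]byte, error) {
	b, err := hex.DecodeString(h)
	if err != nil {
		return nil, fmt.Errorf("%s: not hex: %w", name, err)
	}
	if len(b) != size {
		return nil, fmt.Errorf("%s: got %d bytes, want %d", name, len(b), size)
	}
	return b, nil
}

// identityFromHex rejects any key half whose size differs from the expected one.
func identityFromHex(signPub, signPriv, kexPub, kexPriv string) (identity, error) {
	var id identity
	var err error
	if id.SignPub, err = decodeSized("sign_pub", signPub, 32); err != nil {
		return identity{}, err
	}
	if id.SignPriv, err = decodeSized("sign_priv", signPriv, 64); err != nil {
		return identity{}, err
	}
	if id.KexPub, err = decodeSized("kex_pub", kexPub, 32); err != nil {
		return identity{}, err
	}
	if id.KexPriv, err = decodeSized("kex_priv", kexPriv, 32); err != nil {
		return identity{}, err
	}
	return id, nil
}

// zeros returns n zero bytes as hex, used only as a size-correct placeholder.
func zeros(n int) string { return strings.Repeat("00", n) }

func main() {
	_, err := identityFromHex(zeros(32), zeros(64), zeros(32), zeros(32))
	fmt.Println("well-sized identity:", err)
	_, err = identityFromHex(zeros(32), zeros(32), zeros(32), zeros(32))
	fmt.Println("short sign_priv:", err)
}
```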

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 21:21:33 +02:00
egutierrez f31580deec Merge quick/nats-monitor-flag: UNIBUS_NATS_MONITOR loopback monitoring decoupled from debug log (bump 0.11.0) 2026-06-07 21:18:59 +02:00
agent 5ea8fa1c20 feat(web): wire the SPA to the live bus via the gateway (drop mock)
Replace the mock data source with a real data layer that talks to the webgw
gateway over REST + SSE. The UI components keep their look and props; only
where the data comes from changed.

- src/api.ts: the single repository layer. fetch wrappers (same-origin cookie)
  for login/logout/me and rooms list/create/join/send, plus streamRoom() which
  opens an EventSource and yields each decrypted message. Wire->UI mappers
  (roomFromWire, messageFromWire).
- src/types.ts: add the gateway wire shapes (MeInfo, RoomWire, MsgWire) next to
  the existing UI types.
- App.tsx: probe /api/me on mount to resume an existing session; otherwise show
  Login. Logout calls the gateway.
- Login.tsx: the password field now unlocks the gateway session (operator
  passphrase); shows a basic error and a loading state. Wallet-per-browser is
  phase 2.
- ChatShell.tsx: load rooms from /api/rooms with loading / empty / error states;
  same Flex layout.
- ChatPanel.tsx: stream messages over SSE for the active room (dedup by id),
  composer sends through the gateway; no optimistic insert (the peer's own echo
  returns over SSE with the real frame id).
- vite.config.ts: dev proxy /api (REST + SSE) -> the gateway on :8481.

mock.ts is left untouched (no longer imported) to avoid churn with the parallel
styling work on master.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:14:19 +02:00
agent fb8a03cf0c feat(webgw): web gateway peer (REST + SSE) for the chat SPA
Add cmd/webgw: a single Go binary that holds the operator's bus identity,
connects to the bus as a real authenticated peer (pkg/client), and exposes a
small REST + SSE API the browser consumes. The browser never signs, never
speaks NATS, and never sees a private key.

Endpoints (all under /api, gated by a session cookie except login):
  POST /api/login            unlock a session with the operator passphrase
  POST /api/logout
  GET  /api/me               operator identity the gateway acts as
  GET  /api/rooms            ListMyRooms
  POST /api/rooms            CreateRoom (default policy: encrypted+persisted+signed)
  POST /api/rooms/{id}/join  Join (fetch room key)
  POST /api/rooms/{id}/send  Publish (sealed + signed by the peer)
  GET  /api/rooms/{id}/stream  SSE of decrypted frames (history then live)

Design notes:
- One fan-out hub per room: a single bus subscription is multiplexed to N SSE
  clients, avoiding the per-(room,endpoint) durable-consumer contention that
  multiple Subscribe calls would cause.
- Posture seam mirrors unibus_admin/clientcheck: empty --ca = plaintext dev,
  non-empty = TLS+nkey on both planes; RefreshSession after a membership change
  only under the secured (ACL) posture.
- Identity loaded from `pass` or a 0600 file, held only in memory.
- Session auth: passphrase compared in constant time; opaque HttpOnly cookie
  so EventSource (which cannot set headers) can authenticate the stream.
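
For the session-auth note above, one common way to get a constant-time passphrase check and an opaque HttpOnly cookie is sketched below. The cookie name, path, and the Secure/SameSite settings are assumptions, not taken from cmd/webgw; a plaintext dev stack would likely need Secure turned off.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"net/http"
)

// passphraseOK compares fixed-size digests so the comparison time does not
// depend on where the inputs differ or on their lengths.
func passphraseOK(got, want string) bool {
	g, w := sha256.Sum256([]byte(got)), sha256.Sum256([]byte(want))
	return subtle.ConstantTimeCompare(g[:], w[:]) == 1
}

// newSessionCookie mints an opaque random token. HttpOnly keeps it away from
// page scripts, and the browser attaches it to EventSource requests, which
// cannot set custom headers.
func newSessionCookie() (*http.Cookie, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return nil, err
	}
	return &http.Cookie{
		Name:     "unibus_session", // hypothetical name
		Value:    hex.EncodeToString(raw),
		Path:     "/api",
		HttpOnly: true,
		Secure:   true, // assumes the gateway is served over TLS
		SameSite: http.SameSiteStrictMode,
	}, nil
}

func main() {
	fmt.Println("match:", passphraseOK("correct horse", "correct horse"))
	fmt.Println("mismatch:", passphraseOK("guess", "correct horse"))
	c, err := newSessionCookie()
	if err != nil {
		panic(err)
	}
	fmt.Println("cookie:", c.Name, "httponly:", c.HttpOnly)
}
```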

TRUST MODEL: room content stays end-to-end encrypted on the bus. The gateway
reads plaintext only because it acts AS the operator's client — a legitimate
member of each room holding the room key. The per-browser wallet (WebCrypto)
that moves decryption into the browser is phase 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:14:08 +02:00
Egutierrez 1c9325104c feat(embeddednats): UNIBUS_NATS_MONITOR flag decoupled from debug log
Add a dedicated UNIBUS_NATS_MONITOR=1 toggle that opens the embedded
nats-server monitoring HTTP endpoint (127.0.0.1:8222, loopback only) so a
local metrics scraper can read /varz, /connz and /jsz for server-level
metrics (msgs/s, connections, KV bucket msgs, RAFT leader per stream,
restarts).

Previously the monitoring endpoint was only reachable via UNIBUS_NATS_DEBUG=1,
which is coupled to the verbose nats-server debug log: enabling the endpoint
also wrote routes/RAFT/room subjects to journald in the clear, which regresses the
hardened posture (issue 0007). The two concerns are now decoupled.

The toggle computation is extracted to a pure function
natsLogOpts(debugEnv, monitorEnv) (noLog, debug, trace, monitor): MONITOR=1
opens the endpoint while keeping the log quiet (NoLog true / Debug false). The
inverse coupling is preserved for backward compatibility (DEBUG still implies
MONITOR). The 127.0.0.1 bind stays hardcoded — the monitoring endpoint has no
auth and must never be reachable from the network.
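
The toggle table described above can be written as a pure function. This sketch follows the stated signature, but the body is reconstructed from the prose; in particular, treating only the value "1" as on and leaving trace always off are assumptions.

```go
package main

import "fmt"

// natsLogOpts maps the two environment values to server options.
// MONITOR=1 opens the monitoring endpoint and keeps the log quiet.
// DEBUG=1 enables the debug log and, for backward compatibility, also
// implies the monitoring endpoint.
func natsLogOpts(debugEnv, monitorEnv string) (noLog, debug, trace, monitor bool) {
	debug = debugEnv == "1"
	monitor = debug || monitorEnv == "1"
	noLog = !debug
	trace = false // assumption: the commit message does not say when trace is set
	return noLog, debug, trace, monitor
}

func main() {
	for _, c := range [][2]string{{"", ""}, {"", "1"}, {"1", ""}} {
		noLog, debug, trace, monitor := natsLogOpts(c[0], c[1])
		fmt.Printf("DEBUG=%q MONITOR=%q -> noLog=%v debug=%v trace=%v monitor=%v\n",
			c[0], c[1], noLog, debug, trace, monitor)
	}
}
```

Keeping the decision in a function with no I/O is what makes the table-driven test mentioned in the commit possible without starting a server.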

Deploy wiring versioned: additive systemd drop-in
membershipd-cluster.service.d/nats-monitor.conf (Environment=UNIBUS_NATS_MONITOR=1)
plus a "NATS server metrics" section in the cluster README with the rolling
activation runbook (magnus -> homer -> datardos) gated on R3 reconvergence
(followers 2/2) between nodes.

Tests: pure decoupling table (monitor on => log NOT debug; debug => monitor;
default closed) + a real embedded server with MONITOR=1 asserting /varz answers
200 on loopback:8222, and a server without the flag with the endpoint closed.
100% additive: behavior is identical without the flag. Bump app.md 0.10.0 ->
0.11.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:57:46 +02:00
egutierrez b4f3118e85 Merge quick/users-http-admin: HTTP admin-only users API + client methods (report 0014) 2026-06-07 20:46:44 +02:00
egutierrez e9053169da Merge quick/0011-deploy-gaps: live user-add --store kv + clientcheck E2E + runbook fixes (report 0012) 2026-06-07 20:46:44 +02:00
Egutierrez b983e43090 docs(0007): spec encryption-at-rest for the control plane (JetStream/SQLite on disk) 2026-06-07 20:34:35 +02:00
egutierrez b379730225 docs(app): document users HTTP admin model, bump 0.10.0
Add a gotcha describing the unified-storage model (the server writes
users to the same store/KV as rooms), the admin-only HTTP surface, and
the CLI-seeds-admin-#0 bootstrap. Bump the version 0.9.0 -> 0.10.0 and
add the capability growth log entry for the new HTTP admin users API.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:32:05 +02:00
egutierrez 450ca01baf feat(membership,client): HTTP admin-only users API
Close the last control-plane asymmetry: rooms had a signed HTTP surface
but users were only manageable via the local CLI or direct store access.
Add admin-only HTTP endpoints, symmetric with rooms, executed against the
same privileged store the server already serves (SQLite single-node, the
replicated JetStream KV in cluster) — no new KV connection, no internal
identity, so the admin panel can manage the allowlist by signing as an
admin instead of needing --db / direct KV access.

Endpoints (all behind requireAdmin, on top of the existing
signature+nonce+TLS+enforce middleware):
  - GET  /users                    list the full allowlist (incl. revoked)
  - POST /users                    add {sign_pub, handle, role}
  - POST /users/{signpub}/revoke   revoke (status flip, no hard delete)

requireAdmin is default-deny with no dev relaxation: it allows a request
only when the authenticated signer is confirmed by the store as an active
admin; any other case (no signer, non-admin, revoked, store error) is 403,
fail-closed. The request context now also carries the signer's sign_pub
hex, because the endpoint id is a one-way hash of the key and cannot be
reversed to look the signer up in the allowlist.
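
A hedged sketch of the default-deny check described above. The `userStore` interface, `memStore`, the context key, and the role/status strings are illustrative assumptions; only the rule itself (anything other than a store-confirmed active admin is 403) comes from the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type signerKey struct{} // context key for the signer's sign_pub hex

type user struct{ Role, Status string }

type userStore interface {
	Lookup(signPubHex string) (user, error)
}

type memStore map[string]user

func (m memStore) Lookup(k string) (user, error) {
	u, ok := m[k]
	if !ok {
		return user{}, errors.New("not in allowlist")
	}
	return u, nil
}

// requireAdmin lets a request through only when the store confirms the signer
// is an active admin. Every other outcome, including a store error, is 403.
func requireAdmin(store userStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		signer, _ := r.Context().Value(signerKey{}).(string)
		if signer == "" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		u, err := store.Lookup(signer)
		if err != nil || u.Role != "admin" || u.Status != "active" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor runs one request as the given signer and returns the HTTP status.
func statusFor(store userStore, signer string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/users", nil)
	if signer != "" {
		req = req.WithContext(context.WithValue(req.Context(), signerKey{}, signer))
	}
	rec := httptest.NewRecorder()
	requireAdmin(store, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	store := memStore{
		"aa": {Role: "admin", Status: "active"},
		"bb": {Role: "member", Status: "active"},
		"cc": {Role: "admin", Status: "revoked"},
	}
	for _, s := range []string{"aa", "bb", "cc", "", "zz"} {
		fmt.Printf("signer %q -> %d\n", s, statusFor(store, s))
	}
}
```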

Validation/idempotency mirror the CLI: sign_pub must be 64-hex, role must
be admin|member (empty defaults to member), re-adding an existing key is a
409 that leaves the row untouched. The hex check is unified into
membership.ValidateSignPubHex, reused by the CLI and the handlers.
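
The two validation rules above (64 hex characters, role admin|member with empty defaulting to member) can be sketched like this. The lowercase function names are local stand-ins; whether the real membership.ValidateSignPubHex accepts uppercase hex is not stated, and this version does.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// sampleKey is a size-correct placeholder, not a real public key.
const sampleKey = "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f"

// validateSignPubHex requires exactly 64 hex characters (a 32-byte key).
func validateSignPubHex(s string) error {
	if len(s) != 64 {
		return fmt.Errorf("sign_pub: got %d chars, want 64", len(s))
	}
	if _, err := hex.DecodeString(s); err != nil {
		return fmt.Errorf("sign_pub: not hex: %w", err)
	}
	return nil
}

// normalizeRole applies the default and rejects unknown roles.
func normalizeRole(role string) (string, error) {
	switch role {
	case "":
		return "member", nil
	case "admin", "member":
		return role, nil
	}
	return "", fmt.Errorf("role: %q is not admin|member", role)
}

func main() {
	fmt.Println(validateSignPubHex(sampleKey))
	fmt.Println(validateSignPubHex("abc"))
	fmt.Println(normalizeRole(""))
	fmt.Println(normalizeRole("root"))
}
```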

pkg/client gains ListUsers/AddUser/RevokeUser (flat UserInfo type) signed
via doJSON, so the panel plugs in directly.

Tests: non-admin -> 403 on all three endpoints; admin add->list->revoke
roundtrip; validation (400 hex, 400 role, 409 re-add, row untouched); plus
a client test against an embedded membershipd under enforce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 20:31:57 +02:00
egutierrez e1a7402ff1 chore: bump unibus to 0.9.0 (live user-add + clientcheck)
New capability membershipd user add --store kv against a live cluster plus
cmd/clientcheck end-to-end verification (issue 0011 gaps, report 0012). Adds
the capability growth log entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez ce72131ddf docs(cluster): correct runbook + wire --internal-id-file into deploy
Corrections learned from the real 0011 deploy:
- Bring up: the "start magnus alone and verify healthz" order deadlocks — a
  lone node of a 3-node cluster has no meta-group quorum and never serves
  healthz until a second node joins. Document a quorum-forming start and that
  a node never self-serves.
- Replication: R1 is an unusable SPOF (all six control-plane buckets on one
  node) and the cold start only converges with the three cold-start fixes;
  go straight to R3 once the cluster forms.
- Add a "user add --store kv" section: the live user-add path that replaces
  stop-seed-restart, with its security model and idempotency/HA/no-delete
  semantics.
- Topology: real IPs, ROUTE_NETWORK=public (no WireGuard mesh exists).
- Chaos test: mark the data-plane client + failover proofs as validated (0012).

Deploy machinery now emits the persisted internal identity: the unit gains
--internal-id-file ${INTERNAL_ID_FILE} and deploy-cluster.sh writes
INTERNAL_ID_FILE into each node's cluster.env, so a fresh deploy enables the
live user-add path on every node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez 3aa5a2c9a9 feat(clientcheck): end-to-end client verification (E2E room + failover)
The 0011 chaos test validated only the control plane (healthz + leader
failover + KV readable with 2/3); it never connected an authenticated bus
client to the data plane. cmd/clientcheck is a reusable verification tool: it
connects with a real identity (nkey + TLS on both planes, multi-node seed
lists), creates an ephemeral E2E room (encrypted + signed, no durable stream),
and either publishes N messages and asserts all come back decrypted (golden)
or publishes a counter for a duration while logging the attached node (loop),
so stopping a node mid-run shows the client fail over to a survivor and keep
receiving with quorum 2/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:56 +02:00
egutierrez 02c2004ebd feat(membershipd): user add/list/revoke --store kv against a live cluster
Closes the most valuable 0011 deploy gap: adding users to the running
cluster's replicated allowlist with no stop-seed-restart. Under enforce the
per-subject ACL confines every bus user to its own rooms, so no ordinary
identity may write the control-plane KV buckets; the only identity the
authenticator grants full JetStream permissions is membershipd's internal
service identity.

- main.go: --internal-id-file persists that identity (load-or-create, 0600)
  instead of a fresh ephemeral key, so the same nkey is available out of
  process. Empty keeps the ephemeral default (single-node/dev unchanged).
- users_kv.go: connectKVStore loads the persisted identity, presents its
  nkey (recognized as internal -> full perms), opens the KV store and
  writes. Defaults assume an on-node loopback invocation; a remote target
  without --ca is refused (allowlist must not travel cleartext, audit N6).
  Prints KV_UNIBUS_users replication (followers_current) after a write.
- users_cli.go: --store kv on add/list/revoke. Re-adding a key is an explicit
  ErrUserExists (no silent overwrite / role flip); revoke is a status flip.
- pkg/client: LoadIdentity (load-only) extracted from LoadOrCreateIdentity,
  preserving its "corrupt file is an error, not silently regenerated" guard.
- kv_useradd_test.go: golden write under enforce, idempotency, unreachable
  endpoint, and remote-without-CA refusal against an embedded node.
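
The "remote target without --ca is refused" rule above can be illustrated with a small guard. The name isLoopbackURL is borrowed from the audit-lows commit further down this log, but the body here is an assumption, as are `checkTarget`, port 4222, and the example addresses.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isLoopbackURL reports whether the URL's host is localhost or a loopback IP.
// Malformed URLs are treated as not loopback, which keeps the guard fail-closed.
func isLoopbackURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Hostname() == "" {
		return false
	}
	if u.Hostname() == "localhost" {
		return true
	}
	ip := net.ParseIP(u.Hostname())
	return ip != nil && ip.IsLoopback()
}

// checkTarget refuses a non-loopback target when no CA file is configured,
// because the allowlist would otherwise travel in cleartext.
func checkTarget(natsURL, caPath string) error {
	if caPath == "" && !isLoopbackURL(natsURL) {
		return errors.New("remote target requires --ca")
	}
	return nil
}

func main() {
	fmt.Println(checkTarget("nats://127.0.0.1:4222", ""))
	fmt.Println(checkTarget("nats://203.0.113.10:4222", ""))
	fmt.Println(checkTarget("nats://203.0.113.10:4222", "/etc/unibus/ca.pem"))
}
```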

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 19:41:38 +02:00
egutierrez ff580ac031 Merge quick/cluster-coldstart-fixes: 3-node cluster cold-start fixes + real topology 2026-06-07 18:56:28 +02:00
egutierrez 9fbff79df4 chore(deploy): fill cluster nodes.env with the real 3-node topology
Set magnus's public IP (135.125.201.30) and switch ROUTE_NETWORK to "public":
the three nodes have no WireGuard mesh (homer/datardos do not even have wg
installed), so server-to-server routes go over the public IPs, still protected
by the separate cluster route CA (mutual TLS). KV_REPLICAS is raised to 3 now
that the cluster runs at R3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:56:28 +02:00
egutierrez 33746d9962 fix(cluster): make the JetStream control-plane survive a cold multi-node start
Bringing up the 3-node cluster from clean stores never converged: every node
looped on `open KV bucket "UNIBUS_rooms" (replicas=1): context deadline exceeded`.
Three independent defects in the clustered bootstrap path, none of which surface
on a single node (where JetStream is ready instantly), caused it:

1. embeddednats: route connection pooling (nats-server 2.10 default pool of 3)
   churned with "duplicate route"/"client closed" reconnects on the small cluster,
   interrupting the meta-group RAFT heartbeats and forcing perpetual leader
   re-elections. Set Cluster.PoolSize = -1 (single route per peer).

2. embeddednats: the cluster nodes are Docker hosts, so NATS advertised the docker
   bridge IPs (172.x / 10.0.x) to peers, which then tried to dial those private,
   mutually-unreachable addresses. Set Cluster.NoAdvertise = true so only the
   explicit public-IP routes are used. Also added a UNIBUS_NATS_DEBUG env toggle
   (off by default) that enables the embedded server's logger and loopback
   monitoring port for debugging the route/meta layer.

3. membership.OpenJetStream: a KV op is a NATS request/reply; on a cold cluster the
   op was published once, before the node had contact with the meta leader, so the
   request was dropped and the single long-context call just blocked until timeout.
   Retry each bucket op with short per-attempt contexts until it succeeds or an
   overall bootstrap budget (120s) is exhausted, so it lands once the meta settles.
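
The retry shape in item 3 can be sketched generically: many short attempts inside one overall budget, instead of a single long call. `retryUntil`, `flaky`, and `alwaysFail` are hypothetical names, and the durations are scaled down for the example; the commit's budget is 120s.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryUntil runs op with a fresh short context per attempt until it succeeds
// or the overall budget expires. A request dropped before the meta leader is
// reachable costs one short attempt, not the whole budget.
func retryUntil(ctx context.Context, budget, perAttempt, pause time.Duration,
	op func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, budget)
	defer cancel()
	for {
		attemptCtx, cancelAttempt := context.WithTimeout(ctx, perAttempt)
		err := op(attemptCtx)
		cancelAttempt()
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("bootstrap budget exhausted: %w", err)
		case <-time.After(pause):
		}
	}
}

// flaky simulates a bucket op that fails a few times before the cluster settles.
func flaky(failures int) (attempts int, err error) {
	err = retryUntil(context.Background(), 2*time.Second, 50*time.Millisecond, 5*time.Millisecond,
		func(context.Context) error {
			attempts++
			if attempts <= failures {
				return errors.New("no meta leader yet")
			}
			return nil
		})
	return attempts, err
}

// alwaysFail simulates a cluster that never converges within the budget.
func alwaysFail() error {
	return retryUntil(context.Background(), 60*time.Millisecond, 10*time.Millisecond, 5*time.Millisecond,
		func(context.Context) error { return errors.New("no meta leader yet") })
}

func main() {
	attempts, err := flaky(2)
	fmt.Println("attempts:", attempts, "err:", err)
	fmt.Println("never converges:", alwaysFail())
}
```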

With these the cluster forms cleanly, creates the KV buckets, scales R1->R3 in
place, and survives loss of one node (quorum 2/3). Verified on magnus+homer+datardos.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 18:56:28 +02:00
agent caf005f04b feat(web): frontend v1: login (handle+password), sidebar rooms+search, Element-style chat
React 19 + Vite + Mantine v9 SPA in dark mode (indigo accent), with mock data to
iterate on the design before wiring up the gateway. Login with identity + password
(the password will unlock the encrypted Ed25519 identity on the device).
Sidebar: user avatar, search box (rooms/users/messages) and a room list
with an E2E padlock / cleartext hash / unread badges. Element-style chat
panel (avatar+name+time+text) with an interactive composer.
2026-06-07 17:57:50 +02:00
agent 9787c218ac chore: remove experimental frontends (web, android, playground, mobile)
Cleanup of the trial frontends (React SPA, Kotlin app, playground gateway,
gomobile binding) after the exploration phase. The bus (cmd/membershipd + pkg/*)
remains intact and green. We are starting a new web frontend from scratch, built
incrementally. Everything deleted remains in the git history in case something
needs to be recovered.
2026-06-07 17:38:07 +02:00
egutierrez 926b8e96af chore(0006): bump unibus to 0.8.0, close issue 0006 (cluster hardening + wiring)
All seven phases (0006a–0006g) merged: blockers N3 (replicated nonce) and N2
($JS.API.> KV leak) closed, decentralized KV store wired (--store kv), homogeneous
cluster posture enforced (N1), RefreshSession in all clients (N4), the lows
(secret out of argv, migrate guard, R1/CA docs), and the 3-node deploy material.

Full suite + every audit-0008 attack regression green; govulncheck 0 reachable.
See report 0009.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:33:03 +02:00
egutierrez ae39e35fb4 Merge issue/0006g-deploy: cluster deploy material (magnus+homer+datardos, R3 HA) 2026-06-07 17:31:13 +02:00
egutierrez 48a3d6be33 docs(0006g): cluster deploy material for magnus+homer+datardos (R3 HA)
Parameterized, NO-VPS-touched material to bring up unibus as a 3-node cluster.
The authoring agent ran none of it on a host; every remote-changing step is
marked HUMAN and deploy-cluster.sh defaults to a dry run.

deploy/cluster/:
- nodes.env — topology (cluster name, ports, per-node rows). Public IPs known
  (homer 141.94.69.66, datardos 51.91.100.142) pre-filled; magnus public IP and
  all WireGuard IPs are <PLACEHOLDER> for the human; scripts refuse to run while
  any remain.
- generate-cluster-certs.sh — mints a SEPARATE cluster route CA + a route cert per
  node (server+clientAuth, mutual routes) and a data-plane server cert per node
  signed by the reused client CA (../tls/ca.*); SAN = public + WG + hostname.
- membershipd-cluster.service — one unit, parameterized per node via
  /opt/unibus/cluster.env: enforce + per-subject ACL + TLS + --store kv,
  --cluster-pass-file (secret out of argv), Restart=always.
- deploy-cluster.sh — cross-build linux/amd64, generate each node's cluster.env
  (routes to the other two on the WG mesh, no userinfo), rsync + install (only
  with --yes); staggered start is manual.
- README.md — runbook: prerequisites, loopback bootstrap to seed the first admin
  into the KV (works around the user-CLI/KV chicken-and-egg), staggered bring-up,
  verify posture+quorum, scale R1->R3 in place, and the chaos test (left to 0003f
  on the real VPS).
- .gitignore — out/, build/, secrets/, *.key never committed.

bash -n passes on both scripts; go build/test unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:31:13 +02:00
egutierrez 24ff45ca7e Merge issue/0006f-lows: cluster secret out of argv + migrate guard + docs (audit 0008 lows) 2026-06-07 17:24:46 +02:00
egutierrez b8201a82cd fix(0006f): cluster secret out of argv, migrate-to-kv TLS guard, R1/CA docs (audit 0008 lows)
Low-severity cluster hardening from audit 0008:

- Route secret out of argv (N1-low): --cluster-pass and a nats://user:pass@host in
  --routes are visible in ps/journald. New --cluster-pass-file and the
  UNIBUS_CLUSTER_PASS env var (precedence file > env > flag); the resolved secret
  guards the route layer and is injected into bare --routes entries
  (injectRouteCreds), so peers can be listed as nats://host:6250 with no secret in
  argv. The legacy --cluster-pass stays for dev/compat.
- migrate-to-kv confidentiality (N6): refuse a remote --nats-url without --ca (the
  allowlist would travel cleartext); loopback targets are exempt (isLoopbackURL).
- Docs (N1 route CA, N3 DoS): deploy/README gains a Clustering section — use a
  SEPARATE cluster CA for routes (not the client CA), keep the secret out of argv,
  run migrate-to-kv loopback/TLS only, and R1 is a SPOF of auth (not HA); R3
  quorum is real HA. The generated cert material lives in deploy/cluster/ (0006g).
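
As an illustration of the secret handling described above, here is a minimal Go sketch of the file > env > flag precedence and of injecting credentials only into routes that carry no userinfo. The function names come from this commit message, but the signatures, the `unibus` route user, and the whitespace trimming are assumptions, not the repository's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strings"
)

// resolveClusterPass returns the route secret with precedence file > env > flag.
// Hypothetical signature: the real function may take different arguments.
func resolveClusterPass(passFile, envVal, flagVal string) (string, error) {
	if passFile != "" {
		b, err := os.ReadFile(passFile)
		if err != nil {
			// A missing or unreadable file is an error, never a silent fallback.
			return "", fmt.Errorf("read cluster pass file: %w", err)
		}
		return strings.TrimSpace(string(b)), nil
	}
	if envVal != "" {
		return envVal, nil
	}
	return flagVal, nil
}

// injectRouteCreds adds user:pass to routes that have no userinfo and leaves
// routes with explicit credentials untouched.
func injectRouteCreds(routes []string, user, pass string) ([]string, error) {
	out := make([]string, 0, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r)
		if err != nil {
			return nil, fmt.Errorf("parse route %q: %w", r, err)
		}
		if u.User == nil && pass != "" {
			u.User = url.UserPassword(user, pass)
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	pass, err := resolveClusterPass("", os.Getenv("UNIBUS_CLUSTER_PASS"), "dev-only")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// "unibus" is an assumed route user name, used only for this sketch.
	routes, err := injectRouteCreds([]string{"nats://host:6250"}, "unibus", pass)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(routes), "route(s) resolved")
}
```

With this shape, peers can be listed as bare `nats://host:6250` entries while the secret arrives through a file or the environment.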

Tests:
- TestResolveClusterPass (file > env > flag precedence; missing file errors),
- TestInjectRouteCreds (injects only into userinfo-less routes; preserves overrides),
- TestIsLoopbackURL (loopback vs remote vs malformed).
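
The migrate-to-kv guard above can be sketched roughly as follows. `isLoopbackURL` is named in this commit, but its signature here, the treatment of `localhost`, and the helper `checkMigrateTarget` are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isLoopbackURL reports whether raw points at a loopback host.
// Malformed URLs and URLs without a host are treated as not loopback.
func isLoopbackURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	host := u.Hostname()
	if host == "" {
		return false
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// checkMigrateTarget is a hypothetical helper: it refuses a remote NATS URL
// when no CA is configured, so the allowlist never travels in cleartext.
func checkMigrateTarget(natsURL, caPath string) error {
	if caPath == "" && !isLoopbackURL(natsURL) {
		return errors.New("remote --nats-url requires --ca")
	}
	return nil
}

func main() {
	for _, target := range []string{"nats://127.0.0.1:4222", "nats://bus.example.org:4222"} {
		fmt.Println(target, "->", checkMigrateTarget(target, ""))
	}
}
```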

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:24:46 +02:00
egutierrez 3a33656cac Merge issue/0006e-refresh: RefreshSession in all clients (audit 0008 N4) 2026-06-07 17:21:14 +02:00
egutierrez 2f5b372a80 fix(0006e): call RefreshSession after membership changes in all clients (audit 0008 N4)
A secured bus freezes per-subject permissions at connect time, so a peer that
creates or joins a room after connecting cannot pub/sub on it until it reconnects
(RefreshSession). No client called it, so under enforce+ACL the demos failed
closed — pushing the operator to disable the ACL (a security regression at the
operator's discretion).

Wire the membership-change contract into every client:
- cmd/worker: RefreshSession after CreateRoom, before publishing.
- cmd/chat (simple): RefreshSession after CreateRoom+Join, before Subscribe.
- cmd/chat (encrypted demo): A refreshes after CreateRoom; B refreshes after the
  invite+join, both before pub/sub.
- local_files/bridge (gateway): RefreshSession after CreateRoom+Join, before Subscribe.
- mobile: new Session.RefreshSession wrapper + the contract documented for callers.

Contract (documented on the wrappers): after ANY membership change, call
RefreshSession BEFORE pub/sub on the new room (it drops active subs, so it must
precede Subscribe). On an unsecured/dev bus it is a harmless reconnect.
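
The ordering in this contract can be sketched with a toy model. The `busClient` interface and `fakeBus` type below are hypothetical stand-ins, not the real client API: they only model a bus that snapshots permissions at connect time and drops subscriptions on refresh.

```go
package main

import (
	"errors"
	"fmt"
)

// busClient is a hypothetical interface; the method names mirror the commit
// message, the signatures are assumptions.
type busClient interface {
	CreateRoom(name string) error
	RefreshSession() error
	Subscribe(room string) error
}

var errPermission = errors.New("permission denied: room not in session ACL")

// fakeBus models a secured bus that freezes permissions at (re)connect time.
type fakeBus struct {
	member  map[string]bool // rooms this identity belongs to
	allowed map[string]bool // permissions frozen at the last (re)connect
	subs    map[string]bool // active subscriptions
}

func newFakeBus() *fakeBus {
	return &fakeBus{member: map[string]bool{}, allowed: map[string]bool{}, subs: map[string]bool{}}
}

func (b *fakeBus) CreateRoom(name string) error {
	b.member[name] = true // membership changes, the frozen ACL does not
	return nil
}

func (b *fakeBus) RefreshSession() error {
	// Reconnect: active subscriptions are dropped, permissions re-snapshotted.
	b.subs = map[string]bool{}
	b.allowed = map[string]bool{}
	for room := range b.member {
		b.allowed[room] = true
	}
	return nil
}

func (b *fakeBus) Subscribe(room string) error {
	if !b.allowed[room] {
		return errPermission
	}
	b.subs[room] = true
	return nil
}

// createAndSubscribe follows the contract: membership change, refresh, then subscribe.
func createAndSubscribe(c busClient, room string) error {
	if err := c.CreateRoom(room); err != nil {
		return err
	}
	if err := c.RefreshSession(); err != nil {
		return err
	}
	return c.Subscribe(room)
}

func main() {
	fmt.Println("with refresh:", createAndSubscribe(newFakeBus(), "ops"))
}
```

Subscribing before the refresh would be wasted work in this model, since the refresh drops every active subscription.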

Test:
- TestClientCreateRoomRefreshPublishFlow: end-to-end under enforce+ACL, a peer
  creates a room, refreshes, invites a second peer who joins+refreshes+subscribes,
  and the publish is received — no manual intervention, the ACL stays on.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:21:14 +02:00
egutierrez 32bec75665 Merge issue/0006d-posture: homogeneous cluster posture + /healthz posture (audit 0008 N1) 2026-06-07 17:17:37 +02:00
egutierrez 9b96537aa6 fix(0006d): enforce homogeneous cluster posture + publish posture on /healthz (audit 0008 N1)
A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.

- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
  unless --bus-auth enforce, regardless of bind (a clustered node is a production
  node; there is no safe dev cluster without auth). This binary therefore cannot
  BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non-secret
  operational metadata, probe stays unauthenticated) so a monitor or peer
  can detect a cluster member not running enforce+ACL+TLS — covering a peer that
  runs a tampered/old binary outside this node's control.

Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
  node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
  the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:17:37 +02:00
egutierrez 18ee7c469b Merge issue/0006c-kv-store: wire decentralized control-plane KV store (--store kv) 2026-06-07 17:14:20 +02:00
egutierrez e9ad719424 feat(0006c): wire the decentralized control-plane KV store (--store kv)
0003 built the JetStream KV store (jetstreamStore) but the binary never selected
it: membership.Open (SQLite) was hardcoded and OpenJetStream was only reached by
migrate-to-kv. This completes the wiring so a node actually serves its control
plane from the replicated KV.

- New flag --store kv|sqlite (default sqlite). kv opens the JetStream KV control
  plane over the privileged internal connection; sqlite is the unchanged baseline
  (branch-by-abstraction: the full suite's SQLite paths are untouched).
- Bootstrap cycle resolved with storeHolder: the authenticator consults the holder
  (fail-closed until set), so it can be built before the KV store exists. The KV
  store opens after NATS is up and is published into the holder. The only client
  that can connect in that window is the internal identity, which bypasses the
  store by key. In SQLite mode the store is set before StartServer, so the window
  does not exist.
- needJS now covers --store kv as well as --cluster-name; the JetStream client is
  shared by the KV store and the replicated nonce bucket.
- feature_flags.json: decentralized wiring documented as complete, realized via
  --store kv (opt-in per deploy; default stays sqlite).

Fail-closed preserved: jetstreamStore.IsAuthorized already denies on any backend
error; the holder denies while unset.
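
The fail-closed holder can be sketched as below. `storeHolder` is named in this commit, but the `Store` interface, the `Set` method, the `authorize` helper, and the `mapStore` test double are assumptions for illustration, not the repository's types.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical minimal view of the control-plane store.
type Store interface {
	IsAuthorized(signPub string) bool
}

// storeHolder lets the authenticator be built before the KV store exists.
// Until Set is called it denies every lookup (fail-closed).
type storeHolder struct {
	mu sync.RWMutex
	s  Store
}

func (h *storeHolder) Set(s Store) {
	h.mu.Lock()
	h.s = s
	h.mu.Unlock()
}

func (h *storeHolder) IsAuthorized(signPub string) bool {
	h.mu.RLock()
	s := h.s
	h.mu.RUnlock()
	if s == nil {
		return false // store not published yet: deny
	}
	return s.IsAuthorized(signPub)
}

// authorize models the bootstrap window: the internal identity bypasses the
// store by key, and everyone else goes through the holder.
func authorize(h *storeHolder, internalKey, signPub string) bool {
	if signPub == internalKey {
		return true
	}
	return h.IsAuthorized(signPub)
}

// mapStore is a stand-in backend used only in this sketch.
type mapStore map[string]bool

func (m mapStore) IsAuthorized(k string) bool { return m[k] }

func main() {
	h := &storeHolder{}
	fmt.Println("before Set, alice:", authorize(h, "internal", "alice"))
	fmt.Println("before Set, internal:", authorize(h, "internal", "internal"))
	h.Set(mapStore{"alice": true})
	fmt.Println("after Set, alice:", authorize(h, "internal", "alice"))
}
```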

Tests:
- TestStoreHolderFailClosed: empty holder denies; serves after set.
- TestKVStoreBootstrapUnderEnforce: end-to-end decentralized boot — KV-seeded user
  authenticates over nkey under enforce; outsider denied.
- TestKVStoreDecentralizedConsistency: a room/user created on one node's KV store
  is visible to another's (ends the per-node SQLite divergence, audit 0008 N5).

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:14:20 +02:00
egutierrez d1e1a478f8 Merge issue/0006b-kv-acl: scope JetStream ACL per-room (audit 0008 N2) 2026-06-07 17:08:54 +02:00