Commit Graph

14 Commits

Author SHA1 Message Date
egutierrez 926b8e96af chore(0006): bump unibus to 0.8.0, close issue 0006 (cluster hardening + wiring)
All seven phases (0006a–0006g) merged: blockers N3 (replicated nonce) and N2
($JS.API.> KV leak) closed, decentralized KV store wired (--store kv), homogeneous
cluster posture enforced (N1), RefreshSession in all clients (N4), the lows
(secret out of argv, migrate guard, R1/CA docs), and the 3-node deploy material.

Full suite + every audit-0008 attack regression green; govulncheck 0 reachable.
See report 0009.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:33:03 +02:00
egutierrez e9ad719424 feat(0006c): wire the decentralized control-plane KV store (--store kv)
0003 built the JetStream KV store (jetstreamStore) but the binary never selected
it: membership.Open (SQLite) was hardcoded and OpenJetStream was only reached by
migrate-to-kv. This completes the wiring so a node actually serves its control
plane from the replicated KV.

- New flag --store kv|sqlite (default sqlite). kv opens the JetStream KV control
  plane over the privileged internal connection; sqlite is the unchanged baseline
  (branch-by-abstraction: the full suite's SQLite paths are untouched).
- Bootstrap cycle resolved with storeHolder: the authenticator consults the holder
  (fail-closed until set), so it can be built before the KV store exists. The KV
  store opens after NATS is up and is published into the holder. The only client
  that can connect in that window is the internal identity, which bypasses the
  store by key. In SQLite mode the store is set before StartServer, so the window
  does not exist.
- needJS now covers --store kv as well as --cluster-name; the JetStream client is
  shared by the KV store and the replicated nonce bucket.
- feature_flags.json: decentralized wiring documented as complete, realized via
  --store kv (opt-in per deploy; default stays sqlite).

Fail-closed preserved: jetstreamStore.IsAuthorized already denies on any backend
error; the holder denies while unset.

Tests:
- TestStoreHolderFailClosed: empty holder denies; serves after set.
- TestKVStoreBootstrapUnderEnforce: end-to-end decentralized boot — KV-seeded user
  authenticates over nkey under enforce; outsider denied.
- TestKVStoreDecentralizedConsistency: a room/user created on one node's KV store
  is visible to another's (ends the per-node SQLite divergence, audit 0008 N5).

CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 17:14:20 +02:00
agent 5df99fa4c4 docs(issue): 0006 completar+endurecer cluster — wiring KV + N1-N6 auditoría 0008 + material deploy magnus/homer/datardos 2026-06-07 16:48:07 +02:00
egutierrez 6976537842 chore(0005): bump unibus to 0.7.0, close issue 0005 (hardening 2)
Hardening 2 (issue 0005, fases 0005a-0005e) cierra los hallazgos nuevos de la
re-auditoría red-team (report 0006): bump de nats-server + toolchain (16 CVEs ->
0 alcanzables), drop de frames sin firma en rooms SignMsgs, limiter global de
bytes en vuelo contra el DoS por concurrencia, TLS obligatorio en bind publico, y
cableado de la ACL por subject que cierra el wildcard metadata leak. Detalle por
fase en el capability growth log del app.md y en el report 0007.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 16:17:41 +02:00
agent fb0291ad8a docs(issue): 0005 hardening 2 — CVEs, sig-nil spoof, DoS concurrencia, TLS forzado (re-auditoría 0006) 2026-06-07 15:48:49 +02:00
agent 6b3ace1d39 feat(0003b): membership.Store interface + JetStream KV implementation
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.

pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
  Close). The existing SQLite implementation is renamed sqliteStore and
  stays the default: Open(path) still returns it. openSQLite keeps the
  concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
  backends return it (the SQLite store maps sql.ErrNoRows to it). The
  control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
  server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
  rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
  room_keys, users. Replication factor is configurable (R1..R5) for the
  R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
  HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
  never admits), per the audit's requirement for the decentralized store.

dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
  jetstreamStore behind it. The membershipd boot wiring that selects the
  KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
  authenticator<->store bootstrap is part of the session/deploy redesign);
  OFF keeps the single-node SQLite control plane unchanged.

Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
  round-trip through every room/member/key method, including latest-epoch
  resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
  with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
  IsAuthorized and HasAdmin both DENY within the bounded timeout.

The full existing suite stays green: sqliteStore is unchanged behavior.
2026-06-07 15:04:52 +02:00
egutierrez 618f6b61da chore(0004): close issue, bump unibus to 0.5.0, record dataplane-acl decision
Issue 0004 (security hardening) done across 0004a-0004f. app.md version 0.5.0
with the capability growth log entry; dev/0004d-dataplane-acl.md documents the
chosen minimum-defense strategy for the NATS data plane and its residual limit
(per-subject ACL deferred to 0003). Full work report in
projects/message_bus/reports/0005-2026-06-07-unibus-security-hardening.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:40:39 +02:00
egutierrez e502b16675 feat(membership): forbid cleartext rooms on public deployments (H4 min defense)
Audit H4 (Alto). The embedded NATS has a single account with no per-subject
permissions, so any registered peer can subscribe to any subject — a cleartext
(ModeNATS) room's payload is readable by anyone who knows the subject.

A complete per-subject ACL derived from membership does not fit here: NATS
evaluates a connection's permissions once at connect time and never re-evaluates
them, but unibus clients connect-then-create/join-then-publish on one connection
(TestSecureBusEndToEnd). Static permissions would forbid the owner from
publishing to a room it just created; the dynamic reconnection model belongs to
the 0003 decentralization redesign. See dev/0004d-dataplane-acl.md.

Minimum defense implemented: Server.RequireEncryptedRooms (set by membershipd on
any non-loopback bind) refuses to create cleartext rooms, so every room on a
public deployment is end-to-end encrypted. Message CONTENT stays confidential
even with no subject isolation; residual traffic-metadata exposure is documented
and tracked for 0003.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:26:45 +02:00
agent bcd02716d5 docs(issues): encolar 0002 (media v2), 0003 (descentralización HA), 0004 (hardening seguridad)
Specs de los tres issues siguientes del bus, derivados de esta sesión:
- 0002 media v2: chunking, mimetype, GC del object store, exponer en clientes.
- 0003 descentralización/HA: cluster NATS magnus+homer (R1→R3), control plane
  SQLite→JetStream KV, quorum, failover. Tercer nodo = homer (141.94.69.66).
- 0004 hardening: cierra los hallazgos de la auditoría red-team (report 0004):
  DoS pre-auth, fail-open, autorización por pertenencia, ACL NATS, TLS control plane.
2026-06-07 14:04:33 +02:00
egutierrez 6b0916f1fa docs(0001e): note remaining gateway and unibots migration
The web gateway (playground) and unibots (in the agents repo) are not migrated
here: the gateway stays a local dev tool at AuthOff, and the bot transport
lives outside this sub-repo. dev/0001e-remaining-clients.md records exactly
what each needs (client.Connect with ca.crt, identity registered in the
allowlist) and the operator server flags for phase 0001f.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 87dbc421cd chore(flags): flip bus-auth to enforce and bus-tls on (target state)
Declares the project's target rollout: bus-auth enforce, bus-tls enabled.
Flags are declarative; the operator activates them at deploy via membershipd
--bus-auth/--tls-cert/--tls-key. CLI defaults stay off so dev and tests run
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez ddc6cabc24 feat(flags): declare bus-auth and bus-tls feature flags (off)
bus-auth carries the off -> soft -> enforce rollout state; bus-tls is a
boolean. Both start disabled so master keeps compiling and passing tests
while the auth/TLS code lands behind them across phases 0001a-0001e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:23 +02:00
agent 7de05c8591 docs(security): fijar exposición pública en el spec 0001
Decisión del operador: el bus se expone a internet protegido por auth+TLS
(WireGuard pasa a ser una vía más, no la barrera). ufw en om abre 8470/4250;
el server cert lleva SAN con la IP pública 135.125.201.30 + la IP WG 10.42.0.1
+ hostname; los clientes controlados embeben el ca.crt propio (sin Let's Encrypt).
La fase de despliegue 0001f la ejecuta el humano; el agente entrega 0001a-0001e.
2026-06-07 12:15:32 +02:00
agent 9a915839c8 docs(security): spec issue 0001 — sistema de usuarios + auth firmada del control plane + NATS nkey + TLS
Diseño de las tres capas de seguridad del bus para que WireGuard pase a ser
opcional: tabla users (allowlist Ed25519 con roles/revocación), middleware de
firma Ed25519 + anti-replay en el control plane (generaliza signRequest/
verifyOwnerSig ya existentes), y NATS endurecido con CustomClientAuthentication
(nkey sobre la identidad del peer, revocación dinámica) + TLS con CA propia.
Incluye 6 fases TBD con feature flag bus-auth (off->soft->enforce), migración de
clientes (pkg/client centraliza el cambio), plan de despliegue a om y matriz de
tests (golden/edge/error).
2026-06-07 12:12:19 +02:00