Commit Graph

9 Commits

Author SHA1 Message Date
agent 6b3ace1d39 feat(0003b): membership.Store interface + JetStream KV implementation
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.

pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
  Close). The existing SQLite implementation is renamed sqliteStore and
  stays the default: Open(path) still returns it. openSQLite keeps the
  concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
  backends return it (the SQLite store maps sql.ErrNoRows to it). The
  control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
  server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
  rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
  room_keys, users. Replication factor is configurable (R1..R5) for the
  R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
  HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
  never admits), per the audit's requirement for the decentralized store.

dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
  jetstreamStore behind it. The membershipd boot wiring that selects the
  KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
  authenticator<->store bootstrap is part of the session/deploy redesign);
  OFF keeps the single-node SQLite control plane unchanged.

Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
  round-trip through every room/member/key method, including latest-epoch
  resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
  with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
  IsAuthorized and HasAdmin both DENY within the bounded timeout.

The full existing suite stays green: sqliteStore is unchanged behavior.
2026-06-07 15:04:52 +02:00
egutierrez 618f6b61da chore(0004): close issue, bump unibus to 0.5.0, record dataplane-acl decision
Issue 0004 (security hardening) done across 0004a-0004f. app.md version 0.5.0
with the capability growth log entry; dev/0004d-dataplane-acl.md documents the
chosen minimum-defense strategy for the NATS data plane and its residual limit
(per-subject ACL deferred to 0003). Full work report in
projects/message_bus/reports/0005-2026-06-07-unibus-security-hardening.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:40:39 +02:00
egutierrez e502b16675 feat(membership): forbid cleartext rooms on public deployments (H4 min defense)
Audit H4 (Alto). The embedded NATS has a single account with no per-subject
permissions, so any registered peer can subscribe to any subject — a cleartext
(ModeNATS) room's payload is readable by anyone who knows the subject.

A complete per-subject ACL derived from membership does not fit here: NATS
evaluates a connection's permissions once at connect time and never re-evaluates
them, but unibus clients connect-then-create/join-then-publish on one connection
(TestSecureBusEndToEnd). Static permissions would forbid the owner from
publishing to a room it just created; the dynamic reconnection model belongs to
the 0003 decentralization redesign. See dev/0004d-dataplane-acl.md.

Minimum defense implemented: Server.RequireEncryptedRooms (set by membershipd on
any non-loopback bind) refuses to create cleartext rooms, so every room on a
public deployment is end-to-end encrypted. Message CONTENT stays confidential
even with no subject isolation; residual traffic-metadata exposure is documented
and tracked for 0003.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:26:45 +02:00
agent bcd02716d5 docs(issues): encolar 0002 (media v2), 0003 (descentralización HA), 0004 (hardening seguridad)
Specs de los tres issues siguientes del bus, derivados de esta sesión:
- 0002 media v2: chunking, mimetype, GC del object store, exponer en clientes.
- 0003 descentralización/HA: cluster NATS magnus+homer (R1→R3), control plane
  SQLite→JetStream KV, quorum, failover. Tercer nodo = homer (141.94.69.66).
- 0004 hardening: cierra los hallazgos de la auditoría red-team (report 0004):
  DoS pre-auth, fail-open, autorización por pertenencia, ACL NATS, TLS control plane.
2026-06-07 14:04:33 +02:00
egutierrez 6b0916f1fa docs(0001e): note remaining gateway and unibots migration
The web gateway (playground) and unibots (in the agents repo) are not migrated
here: the gateway stays a local dev tool at AuthOff, and the bot transport
lives outside this sub-repo. dev/0001e-remaining-clients.md records exactly
what each needs (client.Connect with ca.crt, identity registered in the
allowlist) and the operator server flags for phase 0001f.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 87dbc421cd chore(flags): flip bus-auth to enforce and bus-tls on (target state)
Declares the project's target rollout: bus-auth enforce, bus-tls enabled.
Flags are declarative; the operator activates them at deploy via membershipd
--bus-auth/--tls-cert/--tls-key. CLI defaults stay off so dev and tests run
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez ddc6cabc24 feat(flags): declare bus-auth and bus-tls feature flags (off)
bus-auth carries the off -> soft -> enforce rollout state; bus-tls is a
boolean. Both start disabled so master keeps compiling and passing tests
while the auth/TLS code lands behind them across phases 0001a-0001e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:23 +02:00
agent 7de05c8591 docs(security): fijar exposición pública en el spec 0001
Decisión del operador: el bus se expone a internet protegido por auth+TLS
(WireGuard pasa a ser una vía más, no la barrera). ufw en om abre 8470/4250;
el server cert lleva SAN con la IP pública 135.125.201.30 + la IP WG 10.42.0.1
+ hostname; los clientes controlados embeben el ca.crt propio (sin Let's Encrypt).
La fase de despliegue 0001f la ejecuta el humano; el agente entrega 0001a-0001e.
2026-06-07 12:15:32 +02:00
agent 9a915839c8 docs(security): spec issue 0001 — sistema de usuarios + auth firmada del control plane + NATS nkey + TLS
Diseño de las tres capas de seguridad del bus para que WireGuard pase a ser
opcional: tabla users (allowlist Ed25519 con roles/revocación), middleware de
firma Ed25519 + anti-replay en el control plane (generaliza signRequest/
verifyOwnerSig ya existentes), y NATS endurecido con CustomClientAuthentication
(nkey sobre la identidad del peer, revocación dinámica) + TLS con CA propia.
Incluye 6 fases TBD con feature flag bus-auth (off->soft->enforce), migración de
clientes (pkg/client centraliza el cambio), plan de despliegue a om y matriz de
tests (golden/edge/error).
2026-06-07 12:12:19 +02:00