Commit Graph

75 Commits

Author SHA1 Message Date
agent 9013ea5e33 feat(0003c): membershipd migrate-to-kv (idempotent SQLite -> JetStream KV)
The one-time data move decentralization needs (issue 0003c): copy the
entire control-plane state from the local SQLite database into the
replicated JetStream KV buckets, with a backup taken first.

pkg/membership:
- Snapshot / SealedKeyRecord: a backend-agnostic dump of the whole
  control plane (rooms with their real epoch, members, every sealed-key
  row across epochs, users with status).
- (*sqliteStore).ExportSnapshot and (*jetstreamStore).ExportSnapshot read
  a full Snapshot from each backend; (*jetstreamStore).importSnapshot
  writes one with raw Puts (preserving epoch/status, not resetting to
  defaults) so the migration is faithful and idempotent (every write is
  an overwrite, so re-running converges).
- MigrateSQLiteToKV orchestrates export -> import; BackupSQLite makes a
  consistent copy via SQLite's VACUUM INTO before any migration.

cmd/membershipd:
- `membershipd migrate-to-kv --db <path> --nats-url <url> [--replicas N]
  [--ca <cert>] [--no-backup]` backs up the SQLite file, connects to the
  cluster's NATS, and migrates. Dispatched on the host like `user`.

Tests (DoD: golden + edge + parity):
- TestMigrateSQLiteToKVParity: seed a representative SQLite (two rooms,
  one rekeyed to epoch 2, members, a revoked user); after migration the
  KV ExportSnapshot equals the SQLite ExportSnapshot.
- TestMigrateSQLiteToKVIdempotent: running the migration twice yields the
  same KV state.
- TestBackupSQLiteCreatesConsistentCopy: the backup reopens with
  identical data.
Plus a binary smoke (seed user -> run server -> migrate-to-kv -> re-run):
backup written, 1 user migrated, second run identical.
2026-06-07 15:09:56 +02:00
egutierrez b8c9b2b652 Merge issue/0003b-jetstream-store: Store interface + JetStream KV backend (fail-closed) 2026-06-07 15:04:52 +02:00
agent 6b3ace1d39 feat(0003b): membership.Store interface + JetStream KV implementation
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.

pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
  Close). The existing SQLite implementation is renamed sqliteStore and
  stays the default: Open(path) still returns it. openSQLite keeps the
  concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
  backends return it (the SQLite store maps sql.ErrNoRows to it). The
  control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
  server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
  rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
  room_keys, users. Replication factor is configurable (R1..R5) for the
  R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
  HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
  never admits), per the audit's requirement for the decentralized store.

dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
  jetstreamStore behind it. The membershipd boot wiring that selects the
  KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
  authenticator<->store bootstrap is part of the session/deploy redesign);
  OFF keeps the single-node SQLite control plane unchanged.

Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
  round-trip through every room/member/key method, including latest-epoch
  resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
  with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
  IsAuthorized and HasAdmin both DENY within the bounded timeout.

The full existing suite stays green: sqliteStore is unchanged behavior.
2026-06-07 15:04:52 +02:00
egutierrez 3230b31ade Merge issue/0003a-cluster: NATS cluster routes (auth + mutual TLS) 2026-06-07 14:54:53 +02:00
agent c90f145a05 feat(0003a): NATS cluster routes with shared-secret auth + mutual route TLS
Add high-availability cluster support to the embedded NATS server
(issue 0003a, first phase of decentralization).

pkg/embeddednats:
- ServerConfig gains ServerName (unique per node, required by JetStream
  RAFT) and an optional *ClusterConfig (cluster name, route host/port,
  peer route URLs, shared-secret Username/Password, and a mutual-TLS
  *tls.Config). applyClusterOpts maps it onto server.Options.Cluster +
  Routes. Nil Cluster keeps the legacy standalone server.

pkg/busauth:
- RouteTLSConfig builds the route layer's mutual-TLS config: the node
  presents its CA-signed certificate AND verifies the peer's certificate
  against the bus CA (RequireAndVerifyClientCert), reusing the issue-0001
  CA. Routes authenticate NODES, never the client nkey authenticator.

cmd/membershipd:
- Cluster flags (--cluster-name/--server-name/--cluster-port/--routes/
  --cluster-user/--cluster-pass/--route-tls-cert/-key/-ca) wire a node
  into the cluster. validateClusterConfig refuses a public cluster
  without a route secret and complete mutual route TLS, and rejects
  partial route-TLS flags (all-or-nothing). splitRoutes parses the CSV.

Tests (DoD: golden + 2 edge + error path):
- TestClusterForwardsAcrossNodes: 2-node cluster forwards a client
  subject from one node to a subscriber on the other.
- TestClusterThreeNodesForward: 3-node (HA shape) cross-node forwarding.
- TestClusterMutualTLSForwards: forwarding over mutual-TLS routes.
- TestClusterRejectsBadRouteAuth: wrong cluster password -> no route.
- TestClusterRejectsUnsignedNode: cert not signed by the bus CA -> no route.
- TestClusterConfigPolicy / TestSplitRoutes: boot-guard + CSV parsing.

Master stays green: standalone (no --cluster-name) is unchanged.
2026-06-07 14:54:53 +02:00
egutierrez 618f6b61da chore(0004): close issue, bump unibus to 0.5.0, record dataplane-acl decision
Issue 0004 (security hardening) done across 0004a-0004f. app.md version 0.5.0
with the capability growth log entry; dev/0004d-dataplane-acl.md documents the
chosen minimum-defense strategy for the NATS data plane and its residual limit
(per-subject ACL deferred to 0003). Full work report in
projects/message_bus/reports/0005-2026-06-07-unibus-security-hardening.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:40:39 +02:00
egutierrez d483c90356 Merge issue/0004f-medium-fixes: owner binding, nonce-cache pre-auth, error leak (H6/H7/H12)
Owner of a created room must be the signer; the replay cache is populated only
after authorization (with bounded, O(expired) pruning); internal errors no
longer leak to clients.
2026-06-07 14:36:22 +02:00
egutierrez 1bcca987a4 test(membership): regression for H6 owner spoof and H7 nonce-cache poison
TestAudit_OwnerSpoof: a body declaring a foreign owner endpoint or signing key
is 403; a self-owned create is 201.
TestAudit_NonceCachePoisonPreAuth: an unregistered identity's repeated nonce
still fails 'not authorized' (never 'replayed'), proving it was not cached, while
an authorized identity's replay is still rejected.
Nonce cache unit tests: prune-after-TTL and cap-bounded memory.
2026-06-07 14:36:22 +02:00
egutierrez 0aa2caae43 feat(membership): owner binding, pre-auth nonce-cache fix, generic errors
Three medium audit findings.

H6 (owner spoof): handleCreateRoom now binds the body's declared owner to the
authenticated signer — both the endpoint id and the signing key must be the
signer's — so a registered peer cannot create rooms in another identity's name.
Enforced only when an authenticated signer is present.

H7 (nonce-cache poison pre-auth): IsAuthorized now runs BEFORE the replay cache
is touched, so an unregistered identity (Ed25519 keys are free) can no longer
seed nonces into it. The cache is rewritten with O(expired) pruning (insertion
order equals expiry order under a constant TTL) instead of the previous O(n)
full-map scan under the mutex, plus a size cap with oldest-eviction. This is the
prerequisite the 0003 replicated nonce store builds on.

H12 (error leak): internal store/blob errors are logged and replaced with a
generic client message via writeServerErr, so SQL fragments and filesystem paths
no longer reach the caller. Crafted 4xx messages (owner-sig, validation) are kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:36:22 +02:00
egutierrez 957b728160 Merge issue/0004e-control-tls: TLS on the HTTP control plane (H5)
membershipd serves https with the bus CA cert; the client pins the CA and
refuses a plaintext control plane when a CA is provided.
2026-06-07 14:30:15 +02:00
egutierrez 07f4af817e feat(client,membershipd): TLS on the HTTP control plane (H5)
Audit H5 (Alto, public). The control plane was signed but plaintext, so a
network MITM could read all metadata (subjects, endpoints, public keys, sealed
keys, blob hashes, the social graph) and drop requests. Signing gives integrity,
not confidentiality.

- membershipd serves the control plane over TLS (ListenAndServeTLS, MinVersion
  1.2) with the same CA-signed cert as the data plane when --tls-cert is set; the
  fail-open guard already requires --bus-auth enforce alongside it.
- The client gets a separate Options.CtrlTLS so the HTTP client pins the bus CA,
  independent of the NATS data-plane TLS. Connect now sets both planes' TLS from
  the one CA and REFUSES a plaintext http:// control-plane URL when a CA is
  provided, so metadata is never sent in the clear when TLS is expected.

Connect's signature is unchanged; callers (worker/chat --ca, mobile NewSession)
must pass an https:// control-plane URL when they pass a CA. Documented for the
deploy step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:30:15 +02:00
egutierrez 0d56c3c81d Merge issue/0004d-dataplane-acl: data-plane content confidentiality (H4)
Public deployments refuse cleartext rooms, forcing E2E so a data-plane sniffer
gets only ciphertext. Per-subject ACL deferred to 0003 (documented).
2026-06-07 14:26:45 +02:00
egutierrez fb6c796059 test: regression for H4 data-plane content confidentiality
pkg/membership TestRequireEncryptedRoomsRejectsCleartext: cleartext create ->
403, encrypted -> 201, flag off -> cleartext allowed again.

pkg/client TestAudit_NoSubjectACL: under the public posture a ModeNATS room is
refused; bob (member) decrypts the secret; eve raw-subscribes to the subject off
the data plane and receives only ciphertext (non-empty AEAD nonce, no plaintext
substring) — closing the auditor's 'eve reads internal: salary numbers'.
2026-06-07 14:26:45 +02:00
egutierrez e502b16675 feat(membership): forbid cleartext rooms on public deployments (H4 min defense)
Audit H4 (Alto). The embedded NATS has a single account with no per-subject
permissions, so any registered peer can subscribe to any subject — a cleartext
(ModeNATS) room's payload is readable by anyone who knows the subject.

A complete per-subject ACL derived from membership does not fit here: NATS
evaluates a connection's permissions once at connect time and never re-evaluates
them, but unibus clients connect-then-create/join-then-publish on one connection
(TestSecureBusEndToEnd). Static permissions would forbid the owner from
publishing to a room it just created; the dynamic reconnection model belongs to
the 0003 decentralization redesign. See dev/0004d-dataplane-acl.md.

Minimum defense implemented: Server.RequireEncryptedRooms (set by membershipd on
any non-loopback bind) refuses to create cleartext rooms, so every room on a
public deployment is end-to-end encrypted. Message CONTENT stays confidential
even with no subject isolation; residual traffic-metadata exposure is documented
and tracked for 0003.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:26:45 +02:00
egutierrez 47ff74d837 Merge issue/0004c-membership-authz: membership-based authorization (H3)
Room metadata, member lists, room directories and sealed keys are now served
only to members of the room (and a sealed key only to its own endpoint),
closing the horizontal metadata leak.
2026-06-07 14:21:55 +02:00
egutierrez b81e5f26f1 feat(membership): authorize room reads by membership, not registration
Audit H3 (Alto). 'Authorized' meant 'registered in the allowlist', not 'member
of the room', so any registered peer could read another room's subject, its
full member list (every member's sign_pub + kex_pub), any endpoint's room
directory, and even another member's sealed key.

The middleware now carries the authenticated signer's endpoint id into the
handler via request context. Room handlers enforce membership:
  - GET /rooms/{id} and /rooms/{id}/members require the signer to be a member;
  - GET /rooms/{id}/key serves the sealed key only to its own endpoint
    (endpoint == signer) and only to a member;
  - GET /members/{endpoint}/rooms is restricted to the signer's own endpoint.

Authorization is skipped only when no authenticated signer is present (AuthOff
dev, or a soft-mode pass-through), preserving legacy/dev behavior. Internal
errors no longer echo store messages to the client on these paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:21:55 +02:00
egutierrez d742f91881 Merge issue/0004b-failopen-guard: close the fail-open startup (H2)
A non-loopback bind, or any TLS flag, now requires --bus-auth enforce or the
service refuses to start.
2026-06-07 14:17:37 +02:00
egutierrez 30577145ce feat(membershipd): refuse fail-open startup configs
Audit H2 (Alto). The binary defaulted to --bus-auth off, the NATS nkey
authenticator only turned on under enforce, and TLS was an independent flag.
Booting --bind 0.0.0.0 --tls-cert … without --bus-auth enforce left both
planes open while looking secure.

validateBootConfig is a pure guard, called right after flag parsing, that
log.Fatals on two insecure shapes:
  - a non-loopback --bind without --bus-auth enforce, and
  - --tls-cert/--tls-key without --bus-auth enforce.

An insecure public startup is now impossible (the process exits), so a
fail-open data plane never comes up for an unregistered client to reach.
TestAudit_FailOpenTLSWithoutAuth plus a full policy table cover golden
(public+enforce, dev loopback) and every refused shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:17:37 +02:00
egutierrez 01e2ee1aa0 Merge issue/0004a-dos-limit: pre-auth DoS hardening (H1)
Body-size ceilings (MaxBytesReader, per-route + Content-Length pre-reject),
Server.MaxHeaderBytes, and a per-IP token-bucket rate limit shut the critical
pre-auth memory-exhaustion vector. Regression test asserts a bounded RSS.
2026-06-07 14:16:13 +02:00
egutierrez e7bdcc978c test(membership): regression for H1 pre-auth DoS body limit
Ports the auditor's TestAudit_DoSBodyLimitNoAuth: an unsigned oversized POST
to /blobs is now rejected 413 without the resident set spiking (measured via
/proc/self/status, delta bounded to <96 MiB vs the attack's 400 MB+). Covers
both a truthful over-ceiling Content-Length (rejected pre-read) and a chunked
unknown-length sender (MaxBytesReader caps the read). Plus golden (normal blob
stored), boundary (exactly at the ceiling accepted), the 1 MiB control-plane
ceiling, and the per-IP rate limit (flood -> 429, distinct IPs not throttled).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:16:04 +02:00
egutierrez 60d6a86655 feat(membership): bound request bodies and add per-IP rate limit
Pre-auth DoS hardening (audit H1, Critical). The control-plane middleware
read the request body with io.ReadAll before authenticating and with no size
cap, so an unauthenticated peer could force the server to buffer an arbitrary
body in RAM (the auditor sent 400 MB and watched RSS climb to ~898 MB).

- ServeHTTP now caps the buffered body before reading: a per-route ceiling
  (1 MiB JSON, 16 MiB /blobs) rejects an over-declared Content-Length outright
  and wraps the body in http.MaxBytesReader so a lying/chunked sender trips at
  the ceiling instead of unbounded.
- handlePutBlob maps the MaxBytesReader cutoff to 413 in every auth mode.
- Per-IP token-bucket rate limiter (golang.org/x/time/rate, already in the
  module graph) sheds floods before auth or body reads. Loopback dev stacks are
  unaffected (burst >> any single client's rate). Kept in-package as transport
  glue, not promoted to the registry, mirroring the nonceCache decision in 0003.
- membershipd sets http.Server.MaxHeaderBytes and ReadHeaderTimeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 14:16:04 +02:00
agent bcd02716d5 docs(issues): encolar 0002 (media v2), 0003 (descentralización HA), 0004 (hardening seguridad)
Specs de los tres issues siguientes del bus, derivados de esta sesión:
- 0002 media v2: chunking, mimetype, GC del object store, exponer en clientes.
- 0003 descentralización/HA: cluster NATS magnus+homer (R1→R3), control plane
  SQLite→JetStream KV, quorum, failover. Tercer nodo = homer (141.94.69.66).
- 0004 hardening: cierra los hallazgos de la auditoría red-team (report 0004):
  DoS pre-auth, fail-open, autorización por pertenencia, ACL NATS, TLS control plane.
2026-06-07 14:04:33 +02:00
egutierrez 484a07d6fd Merge issue/0001e-migrate-clients: secure-by-default clients, bus-auth enforce
Phase 0001e of issue 0001. client.Connect(caPath) is the single seam every
peer uses: with the bundled ca.crt it connects with TLS + nkey and signs the
control plane (enforce); without it, legacy plaintext dev. worker/chat gain
--ca, the mobile NewSession gains caPath, membershipd gains --tls-cert/--tls-key
and turns on the nkey authenticator under enforce. dev/feature_flags.json
declares the target state (bus-auth enforce, bus-tls on); the gateway and
unibots migrations are documented as notes (dev/0001e-remaining-clients.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:42 +02:00
egutierrez 04e27518af test(client): secure bus end-to-end (auth + TLS + E2E together)
TestSecureBusEndToEnd boots the server with control-plane enforce, NATS nkey
auth, and TLS all on; two registered peers connect with nkey+TLS, A creates a
Matrix room, invites B, publishes, and B decrypts — proving the three layers
compose. This is the headline golden of issue 0001.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 6b0916f1fa docs(0001e): note remaining gateway and unibots migration
The web gateway (playground) and unibots (in the agents repo) are not migrated
here: the gateway stays a local dev tool at AuthOff, and the bot transport
lives outside this sub-repo. dev/0001e-remaining-clients.md records exactly
what each needs (client.Connect with ca.crt, identity registered in the
allowlist) and the operator server flags for phase 0001f.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 87dbc421cd chore(flags): flip bus-auth to enforce and bus-tls on (target state)
Declares the project's target rollout: bus-auth enforce, bus-tls enabled.
Flags are declarative; the operator activates them at deploy via membershipd
--bus-auth/--tls-cert/--tls-key. CLI defaults stay off so dev and tests run
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez b647779521 feat(membershipd): enable NATS nkey auth (enforce) and TLS flags
Opens the store before NATS so the authenticator can consult IsAuthorized.
Under --bus-auth enforce the embedded NATS gets the nkey authenticator (only
allowlisted identities connect); --tls-cert/--tls-key make it present the
server certificate and require TLS. External NATS manages its own auth/TLS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 74c8d4f941 feat(client,cmd,mobile): connect securely via client.Connect(caPath)
client.Connect is the single migration seam: a non-empty caPath connects with
TLS pinned to the bus CA plus nkey auth (matching enforce + bus-tls), an empty
caPath keeps the legacy plaintext dev connection; control-plane requests are
signed either way. worker and chat gain a --ca flag; the gomobile NewSession
gains a caPath parameter so the Android app bundles ca.crt and connects
securely. Every peer now flows through one code path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:49:19 +02:00
egutierrez 2ccd11b68c Merge issue/0001d-tls: TLS on the NATS data plane (self-signed CA)
Phase 0001d of issue 0001. embeddednats grows a ServerConfig with an optional
TLS config; the client can pin the bus's self-signed CA via Options.TLS built
from busauth.LoadCATLSConfig. deploy/tls/generate-certs.sh mints the CA and a
server cert (SAN: public IP, WG IP, om, localhost) — only the public ca.crt is
versioned, private keys are gitignored. A client trusting the CA completes the
handshake; one without it fails. TLS stays off until phase 0001e wires it in.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:44:22 +02:00
egutierrez 75939a192c test: TLS data plane end to end + CA/keypair loaders
client/tls_test: mints a throwaway CA + server cert in-memory; a client
pinning the CA completes the handshake and operates (golden), a client
without the CA fails the handshake (error path). busauth/tls_test: golden
load of a CA PEM and a server keypair, plus error paths (missing file,
non-PEM). Harness body extracted to bootHarness(ctrlMode, natsAuth, natsTLS).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:44:13 +02:00
egutierrez 1b56f14c20 feat(deploy/tls): self-signed CA + server cert generator
generate-certs.sh mints the bus CA and a NATS server certificate whose SANs
cover the public IP (135.125.201.30), the WireGuard IP (10.42.0.1), the om
hostname, and localhost/127.0.0.1 for on-host smoke tests (all overridable via
env). Only the public ca.crt is committed; ca.key, server.key and server.crt
are gitignored and distributed out of band. README documents generation, use
and rotation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:44:13 +02:00
egutierrez 2786ae2dde feat(busauth,client): pin the bus CA over TLS
busauth.LoadCATLSConfig turns a ca.crt path into a *tls.Config trusting only
that private CA (clients must pin it; the system roots would reject a
self-signed server cert). busauth.ServerTLSConfig loads the server keypair.
client.Options gains TLS; NewWithOptions calls nats.Secure when set, so the
data-plane connection is encrypted and the server pinned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:44:13 +02:00
egutierrez 6d3d6d2562 refactor(embeddednats): ServerConfig with optional TLS
Collapses Start/StartHost/StartHostAuth onto StartServer(ServerConfig) so
auth and a TLS config can be set without growing the parameter list further.
When TLS is set the server presents the certificate and requires TLS on the
data plane; the wrappers preserve the existing no-auth/no-TLS behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:44:13 +02:00
egutierrez 217daae472 Merge issue/0001c-nats-nkey: NATS nkey authentication
Phase 0001c of issue 0001. The data plane now authenticates with each peer's
Ed25519 identity reused as a NATS nkey: busauth converts the identity to an
nkey and back, embeddednats installs a CustomClientAuthentication that verifies
the nkey signature and checks the user allowlist on every connection (live
revocation, no restart), and the client opts into nkey via NewWithOptions. The
embedded server stays open by default so dev stacks and existing tests are
unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:37:59 +02:00
egutierrez 00058ea0af test(client): NATS nkey auth end to end
Harness gains newHarnessFull(ctrlMode, natsAuth) wiring the nkey authenticator
to the user allowlist; NATS auth and HTTP auth are independent so each plane
is tested in isolation. TestNatsNkeyAuth: registered peer connects with nkey
and operates (golden); unregistered peer and no-nkey client refused at connect
(error paths); peer revoked at runtime refused on its next connection without
a restart (edge).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:37:59 +02:00
egutierrez 1630f6f163 feat(client): opt-in nkey NATS connection via NewWithOptions
nats.go refuses to connect with an nkey to a server that does not advertise
nkey auth, so the connection cannot blindly always present one. New keeps the
legacy plain connection; NewWithOptions(Options{UseNkey:true}) presents the
peer's identity-derived nkey. NewWithOptions is the single place the data-plane
connection is built, so every peer gets identical behavior from the same
Options (TLS fields arrive in phase 0001d).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:37:59 +02:00
egutierrez b09bafe242 feat(embeddednats): nkey CustomClientAuthentication against the allowlist
busauth.NewNkeyAuthenticator verifies a client's nkey signature over the
server nonce (decoding like nats-server: raw-url then std base64), maps the
nkey to its Ed25519 hex, and consults an injected IsAuthorized predicate.
Checking on every connection (rather than a static Options.Nkeys map) means
revoking a user denies its next connection with no restart. embeddednats
gains StartHostAuth(auth) and sets AlwaysEnableNonce so the server advertises
the nonce nkey clients need; Start/StartHost stay open (auth=nil) for dev.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:37:46 +02:00
egutierrez 413dd61041 feat(busauth): Ed25519<->NATS nkey conversion with round-trip test
A NATS nkey is an Ed25519 keypair, so the bus reuses each peer's signing
identity for the data plane instead of minting new key material. ClientNkey
derives the user nkey public string and a nonce-signing callback from the
peer's Ed25519 private key (its first 32 bytes are the nkey seed);
SignPubHexFromNkey maps a presented nkey back to the allowlist's hex key;
NkeyPublicFromSignPub is the public-only derivation.

This is NATS-specific transport glue kept in the app, not promoted to the
registry, to avoid pulling nats-io/nkeys into the multi-domain registry
module. The dedicated round-trip test runs first (spec requirement): it
proves the nkey signature equals the identity's raw Ed25519 signature and
that the nkey maps back to the identity's hex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:37:46 +02:00
egutierrez 89e0d0e64a Merge issue/0001b-ctrl-auth: signed control-plane auth + anti-replay
Phase 0001b of issue 0001. The control plane (membershipd HTTP) now supports
the bus-auth rollout off->soft->enforce: clients sign every request with their
Ed25519 identity (headers over method/path/ts/nonce/sha256(body)); the server
verifies the signature, clock skew (+/-30s), nonce replay (60s TTL cache), and
the user allowlist. Revocation denies access on the next request without a
restart. Default stays off so master keeps working.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:32:06 +02:00
egutierrez 2130eaa44d test: control-plane auth middleware + end-to-end enforce
membership/auth_test: golden (signed+registered accepted), error paths
(unregistered 401, replayed nonce 401, clock skew 401, tampered body 401,
missing headers 401), exemptions (healthz, soft allows, off no-op).
client_test: end-to-end with the real client against an enforce server —
registered peer accepted, unregistered rejected, revoked peer denied without
a server restart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:31:58 +02:00
egutierrez 567e604fc7 chore(playground): pass AuthOff to NewServer (gateway not yet migrated)
The local dev gateway has not adopted signed requests; tracked for phase
0001e. Keeps it working while the NewServer signature gains the auth mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:31:58 +02:00
egutierrez 0f8a38d62b feat(membershipd): --bus-auth flag selects control-plane auth mode
Maps off|soft|enforce to membership.AuthMode and wires it into NewServer.
Defaults to off so existing deployments are unaffected until the operator
opts into the rollout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:31:58 +02:00
egutierrez e0ef3a27cc feat(client): sign every control-plane request (transport auth headers)
doJSON, putBlob and getBlob now go through newSignedRequest, which attaches
X-Unibus-Pub/Ts/Nonce/Sig signing membership.CanonicalRequest with the peer's
Ed25519 key. GETs are signed too so the server can authenticate the caller
uniformly under enforce. The payload-level owner signature (invite/rekey)
is unchanged and coexists with this transport-level signature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:31:50 +02:00
egutierrez 3e39e23fe0 feat(membership): signed control-plane auth middleware + anti-replay
Adds the bus-auth rollout (off|soft|enforce) to the control plane. The
middleware verifies an Ed25519 request signature over CanonicalRequest
(method, request-URI, ts, nonce, sha256(body)), checks the timestamp is
within +/-30s, rejects replayed nonces via an in-memory TTL cache (60s), and
requires the signer to be an active user in the allowlist. soft logs
rejections but lets requests through so clients can migrate without an
outage; off is the legacy no-op default. /healthz is exempt so health probes
work before any identity exists. CanonicalRequest is exported as the single
source of truth shared with the client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:31:50 +02:00
egutierrez e9711bf74b Merge issue/0001a-users: bus user allowlist (store + CLI + migration 002)
Phase 0001a of issue 0001 (bus auth + TLS). Adds the users table, store CRUD
(AddUser/GetUser/ListUsers/RevokeUser/IsAuthorized/HasAdmin), the local
'membershipd user' admin CLI for seeding the first admin, and the bus-auth /
bus-tls feature flags (both off). No behavior change yet: the allowlist is
not consulted until phase 0001b wires the control-plane middleware.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:29 +02:00
egutierrez 822982b71b test(membership): cover user store golden/edge/error paths
Golden: add -> get -> IsAuthorized true, admin seeded. Edge: empty role
defaults to member, case-insensitive hex lookup, list ordering, revoke
denies authorization and stamps revoked_at. Error: duplicate key
(ErrUserExists), invalid role, empty sign_pub, unknown user not authorized,
revoke of unknown/already-revoked. Plus users-table migration idempotency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:23 +02:00
egutierrez ddc6cabc24 feat(flags): declare bus-auth and bus-tls feature flags (off)
bus-auth carries the off -> soft -> enforce rollout state; bus-tls is a
boolean. Both start disabled so master keeps compiling and passing tests
while the auth/TLS code lands behind them across phases 0001a-0001e.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:23 +02:00
egutierrez 0d7ab22d4a feat(membershipd): add 'user add/list/revoke' local admin CLI
Local administration surface for the user allowlist, dispatched before the
server flag set parses os.Args. It opens the SQLite store directly with no
network or auth: running on the bus host is trusted by design, which is how
the first admin is seeded (breaking the chicken-egg of needing an admin to
add an admin). Validates that sign-pub is a 32-byte Ed25519 key in hex and
tolerates the sign-pub positional appearing before or after --db.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:16 +02:00
egutierrez c5387028e0 feat(membership): add 002_users.sql migration and user CRUD store
Bus-level user allowlist (issue 0001a): the authoritative directory of
Ed25519 signing identities permitted to use the bus, independent of room
membership. Migration is additive and mirrored byte-for-byte between the
module-root migrations/ and the embedded pkg/membership/migrations/.

Store adds AddUser/GetUser/ListUsers/RevokeUser/IsAuthorized/HasAdmin.
IsAuthorized is the single fail-closed predicate both the control plane and
the NATS data plane will consult, so revocation is a status flip that denies
access on both without a restart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:23:11 +02:00
agent 7de05c8591 docs(security): fijar exposición pública en el spec 0001
Decisión del operador: el bus se expone a internet protegido por auth+TLS
(WireGuard pasa a ser una vía más, no la barrera). ufw en om abre 8470/4250;
el server cert lleva SAN con la IP pública 135.125.201.30 + la IP WG 10.42.0.1
+ hostname; los clientes controlados embeben el ca.crt propio (sin Let's Encrypt).
La fase de despliegue 0001f la ejecuta el humano; el agente entrega 0001a-0001e.
2026-06-07 12:15:32 +02:00