Close the last control-plane asymmetry: rooms had a signed HTTP surface
but users were only manageable via the local CLI or direct store access.
Add admin-only HTTP endpoints, symmetric with rooms, executed against the
same privileged store the server already serves (SQLite single-node, the
replicated JetStream KV in cluster) — no new KV connection, no internal
identity, so the admin panel can manage the allowlist by signing as an
admin instead of needing --db / direct KV access.
Endpoints (all behind requireAdmin, on top of the existing
signature+nonce+TLS+enforce middleware):
- GET /users list the full allowlist (incl. revoked)
- POST /users add {sign_pub, handle, role}
- POST /users/{signpub}/revoke revoke (status flip, no hard delete)
requireAdmin is default-deny with no dev relaxation: it allows a request
only when the authenticated signer is confirmed by the store as an active
admin; any other case (no signer, non-admin, revoked, store error) is 403,
fail-closed. The request context now also carries the signer's sign_pub
hex, because the endpoint id is a one-way hash of the key and cannot be
reversed to look the signer up in the allowlist.
Validation/idempotency mirror the CLL: sign_pub must be 64-hex, role must
be admin|member (empty defaults to member), re-adding an existing key is a
409 that leaves the row untouched. The hex check is unified into
membership.ValidateSignPubHex, reused by the CLI and the handlers.
pkg/client gains ListUsers/AddUser/RevokeUser (flat UserInfo type) signed
via doJSON, so the panel plugs in directly.
Tests: non-admin -> 403 on all three endpoints; admin add->list->revoke
roundtrip; validation (400 hex, 400 role, 409 re-add, row untouched); plus
a client test against an embedded membershipd under enforce.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the most valuable 0011 deploy gap: adding users to the running
cluster's replicated allowlist with no stop-seed-restart. Under enforce the
per-subject ACL confines every bus user to its own rooms, so no ordinary
identity may write the control-plane KV buckets; the only identity the
authenticator grants full JetStream permissions is membershipd's internal
service identity.
- main.go: --internal-id-file persists that identity (load-or-create, 0600)
instead of a fresh ephemeral key, so the same nkey is available out of
process. Empty keeps the ephemeral default (single-node/dev unchanged).
- users_kv.go: connectKVStore loads the persisted identity, presents its
nkey (recognized as internal -> full perms), opens the KV store and
writes. Defaults assume an on-node loopback invocation; a remote target
without --ca is refused (allowlist must not travel cleartext, audit N6).
Prints KV_UNIBUS_users replication (followers_current) after a write.
- users_cli.go: --store kv on add/list/revoke. Re-adding a key is an explicit
ErrUserExists (no silent overwrite / role flip); revoke is a status flip.
- pkg/client: LoadIdentity (load-only) extracted from LoadOrCreateIdentity,
preserving its "corrupt file is an error, not silently regenerated" guard.
- kv_useradd_test.go: golden write under enforce, idempotency, unreachable
endpoint, and remote-without-CA refusal against an embedded node.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
client.processFrame verified a frame's signature only when one was present
(`info.Policy.SignMsgs && f.Sig != nil`). In a room whose policy REQUIRES
per-message signatures, an attacker with data-plane access could publish a raw
frame with Sig==nil and a forged Sender, and the receiver accepted it as
authentic because the verification block was skipped (audit N3, report 0006).
On a signed-but-cleartext room any peer that knows the subject could thus
impersonate any sender.
Fix: in a SignMsgs room a missing signature is itself a rejection. processFrame
now drops any frame with Sig==nil before attempting verification:
if info.Policy.SignMsgs {
if f.Sig == nil { return } // signature required but absent: drop
// verify ...
}
Non-signed rooms (ModeNATS) are unaffected: unsigned frames there are still
delivered, so the plain-NATS path is unchanged.
Verification (pkg/client/sig_nil_spoof_test.go, TestReaudit_SigNilSpoof):
- golden: a properly signed frame from a member is delivered.
- error : an unsigned frame with a forged Sender in a SignMsgs room is dropped
(the test fails with "SIG-NIL SPOOF: receiver accepted ..." when the fix is
reverted, confirming it is a real regression guard).
- edge : a non-signed room still delivers an unsigned frame.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green.
Refs: report 0006 N3, issue 0005b.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the residual the 0004 hardening deferred: the NATS authenticator
can now confine a registered peer to the subjects of the rooms it
belongs to, instead of letting any registered identity sub/pub on any
subject. The dynamic-membership reconnection model the audit named is
provided by client.RefreshSession.
pkg/busauth:
- verifyNkey factors out the shared nkey verification.
- NewNkeyAuthenticatorACL + PermissionsFunc: an authenticator that, after
authorizing, derives and RegisterUser()s per-subject permissions. A
derivation error denies the connection (fail closed).
pkg/membership:
- SubjectACLFor(store) maps a signing pubkey to the subjects it may use:
the subject of every room it belongs to, plus the client infrastructure
subjects (_INBOX.>, $JS.API.> for request/reply and the persisted plane).
pkg/client:
- RefreshSession() rebuilds the data-plane connection so the authenticator
re-derives permissions after a membership change (NATS freezes
permissions at connect time). It retains the seeds/options to reconnect;
active subscriptions are dropped and must be re-made (documented).
Tests (DoD: isolation + refresh):
- TestSubjectACLIsolation: alice (member of room.A) may sub/pub room.A but
is DENIED sub and pub on room.B (permissions violation), and never reads
bob's room.B traffic; bob never receives alice's cross-room publish.
- TestRefreshSessionGainsNewRoom: alice has no permission for room B until
she is added and calls RefreshSession; the reconnect grants the subject
and she then receives room B traffic.
Scope note: the per-subject ACL authenticator is opt-in (NewServer/
membershipd keep the open authenticator by default) and is wired in with
the decentralized boot path; auto-RefreshSession on every membership
change (fully transparent) remains for 0003f. Master behavior unchanged.
The client (issue 0003e, part 1) accepts a LIST of NATS seeds and a LIST
of control-plane URLs so a node loss is transparent.
pkg/client:
- Options.NatsServers: extra NATS seeds beyond the primary. The client
connects to the joined seed list with MaxReconnects(-1) +
RetryOnFailedConnect, so nats.go fails over to a surviving node when the
one a client is attached to dies and rejoins a node that comes back.
- Options.CtrlURLs: extra control-plane endpoints. doJSON/putBlob/getBlob
now try each endpoint in order, falling over on a transport error to the
next (an HTTP response from any node is authoritative — every node
serves the same state under the KV store). newSignedRequest becomes
newSignedRequestTo(base, ...); each failover attempt mints a fresh nonce
(the signature covers method+path+ts+nonce+body, not the host), so a
retried request is never seen as a replay.
- ConnectedServer()/IsConnected(): observability for which node the data
plane is attached to, for ops and failover tests.
- New/Connect/NewWithOptions keep their signatures (a single URL = a
one-element list), so worker/chat/mobile/playground are unchanged.
Test (DoD edge — the issue's "kill node A" case):
- TestClientFailoverAcrossNodes: A seeds two clustered nodes, subscribes,
receives a cross-node message; the node A is attached to is KILLED; A
reconnects to the survivor and still receives messages — session intact.
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.
pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
Close). The existing SQLite implementation is renamed sqliteStore and
stays the default: Open(path) still returns it. openSQLite keeps the
concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
backends return it (the SQLite store maps sql.ErrNoRows to it). The
control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
room_keys, users. Replication factor is configurable (R1..R5) for the
R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
never admits), per the audit's requirement for the decentralized store.
dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
jetstreamStore behind it. The membershipd boot wiring that selects the
KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
authenticator<->store bootstrap is part of the session/deploy redesign);
OFF keeps the single-node SQLite control plane unchanged.
Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
round-trip through every room/member/key method, including latest-epoch
resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
IsAuthorized and HasAdmin both DENY within the bounded timeout.
The full existing suite stays green: sqliteStore is unchanged behavior.
Audit H5 (Alto, public). The control plane was signed but plaintext, so a
network MITM could read all metadata (subjects, endpoints, public keys, sealed
keys, blob hashes, the social graph) and drop requests. Signing gives integrity,
not confidentiality.
- membershipd serves the control plane over TLS (ListenAndServeTLS, MinVersion
1.2) with the same CA-signed cert as the data plane when --tls-cert is set; the
fail-open guard already requires --bus-auth enforce alongside it.
- The client gets a separate Options.CtrlTLS so the HTTP client pins the bus CA,
independent of the NATS data-plane TLS. Connect now sets both planes' TLS from
the one CA and REFUSES a plaintext http:// control-plane URL when a CA is
provided, so metadata is never sent in the clear when TLS is expected.
Connect's signature is unchanged; callers (worker/chat --ca, mobile NewSession)
must pass an https:// control-plane URL when they pass a CA. Documented for the
deploy step.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pkg/membership TestRequireEncryptedRoomsRejectsCleartext: cleartext create ->
403, encrypted -> 201, flag off -> cleartext allowed again.
pkg/client TestAudit_NoSubjectACL: under the public posture a ModeNATS room is
refused; bob (member) decrypts the secret; eve raw-subscribes to the subject off
the data plane and receives only ciphertext (non-empty AEAD nonce, no plaintext
substring) — closing the auditor's 'eve reads internal: salary numbers'.
TestSecureBusEndToEnd boots the server with control-plane enforce, NATS nkey
auth, and TLS all on; two registered peers connect with nkey+TLS, A creates a
Matrix room, invites B, publishes, and B decrypts — proving the three layers
compose. This is the headline golden of issue 0001.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
client.Connect is the single migration seam: a non-empty caPath connects with
TLS pinned to the bus CA plus nkey auth (matching enforce + bus-tls), an empty
caPath keeps the legacy plaintext dev connection; control-plane requests are
signed either way. worker and chat gain a --ca flag; the gomobile NewSession
gains a caPath parameter so the Android app bundles ca.crt and connects
securely. Every peer now flows through one code path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
client/tls_test: mints a throwaway CA + server cert in-memory; a client
pinning the CA completes the handshake and operates (golden), a client
without the CA fails the handshake (error path). busauth/tls_test: golden
load of a CA PEM and a server keypair, plus error paths (missing file,
non-PEM). Harness body extracted to bootHarness(ctrlMode, natsAuth, natsTLS).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
busauth.LoadCATLSConfig turns a ca.crt path into a *tls.Config trusting only
that private CA (clients must pin it; the system roots would reject a
self-signed server cert). busauth.ServerTLSConfig loads the server keypair.
client.Options gains TLS; NewWithOptions calls nats.Secure when set, so the
data-plane connection is encrypted and the server pinned.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness gains newHarnessFull(ctrlMode, natsAuth) wiring the nkey authenticator
to the user allowlist; NATS auth and HTTP auth are independent so each plane
is tested in isolation. TestNatsNkeyAuth: registered peer connects with nkey
and operates (golden); unregistered peer and no-nkey client refused at connect
(error paths); peer revoked at runtime refused on its next connection without
a restart (edge).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
nats.go refuses to connect with an nkey to a server that does not advertise
nkey auth, so the connection cannot blindly always present one. New keeps the
legacy plain connection; NewWithOptions(Options{UseNkey:true}) presents the
peer's identity-derived nkey. NewWithOptions is the single place the data-plane
connection is built, so every peer gets identical behavior from the same
Options (TLS fields arrive in phase 0001d).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
membership/auth_test: golden (signed+registered accepted), error paths
(unregistered 401, replayed nonce 401, clock skew 401, tampered body 401,
missing headers 401), exemptions (healthz, soft allows, off no-op).
client_test: end-to-end with the real client against an enforce server —
registered peer accepted, unregistered rejected, revoked peer denied without
a server restart.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
doJSON, putBlob and getBlob now go through newSignedRequest, which attaches
X-Unibus-Pub/Ts/Nonce/Sig signing membership.CanonicalRequest with the peer's
Ed25519 key. GETs are signed too so the server can authenticate the caller
uniformly under enforce. The payload-level owner signature (invite/rekey)
is unchanged and coexists with this transport-level signature.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A peer invited to an encrypted room needs to find it: the control plane is
pull-based (no server push of invitations), so add a discovery endpoint that
lists every room an endpoint belongs to, with the room's metadata and the
endpoint's role.
- store.ListRoomsForEndpoint: JOIN members+rooms, ordered by room id, empty
slice (not error) for an endpoint in no rooms.
- membershipd: GET /members/{endpoint}/rooms returns {room_id, subject, epoch,
policy, role}[].
- client.ListMyRooms + RoomRef: a bot polls this to discover and then Join +
Subscribe rooms it was invited to.
Tests: store-level (owner in N rooms, member in one, unknown endpoint → []) and
client-level e2e through the embedded harness (B discovers a room A invited it
to, without prior knowledge of the room id; owner sees role=owner).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chat bots need replies, threads and reactions. Add two optional, omitempty
envelope fields (ThreadID, ReplyTo) plus a REACT frame type. The fields ride the
cleartext envelope (message-id references, not secret content) and are omitted
when unset, so non-threaded frames are byte-for-byte identical on the wire and
their signatures unchanged — a non-breaking, additive change.
Client gains PublishReply (threaded reply) and React (emoji reaction). The
reaction content travels in the payload, so it is sealed like any message and
stays confidential in E2E rooms; receivers dispatch on Frame.Type == REACT and
read Frame.ReplyTo for the target. Publish is refactored to share one
publishFrame path with the new helpers; its behavior is unchanged.
Tests: frame round-trip of a threaded REACT frame (golden), non-threaded
wire/sig back-compat asserting thr/re keys are absent (edge), Unmarshal of
garbage errors (error path), and an end-to-end reply+reaction round-trip in an
encrypted ModeMatrix room.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- membership server returns 403 + human-readable message on missing sealed key (was leaking 'sql: no rows in result set')
- client doJSON unwraps the server's {"error"} field instead of pasting the raw HTTP envelope