Bringing up the 3-node cluster from clean stores never converged: every node
looped on `open KV bucket "UNIBUS_rooms" (replicas=1): context deadline exceeded`.
Three independent defects in the clustered bootstrap path, none of which surface
on a single node (where JetStream is ready instantly), caused it:
1. embeddednats: route connection pooling (nats-server 2.10 default pool of 3)
churned with "duplicate route"/"client closed" reconnects on the small cluster,
interrupting the meta-group RAFT heartbeats and forcing perpetual leader
re-elections. Set Cluster.PoolSize = -1 (single route per peer).
2. embeddednats: the cluster nodes are Docker hosts, so NATS advertised the docker
bridge IPs (172.x / 10.0.x) to peers, which then tried to dial those private,
mutually-unreachable addresses. Set Cluster.NoAdvertise = true so only the
explicit public-IP routes are used. Also added a UNIBUS_NATS_DEBUG env toggle
(off by default) that enables the embedded server's logger and loopback
monitoring port for debugging the route/meta layer.
3. membership.OpenJetStream: a KV op is a NATS request/reply; on a cold cluster the
op was published once, before the node had contact with the meta leader, so the
request was dropped and the single long-context call just blocked until timeout.
Retry each bucket op with short per-attempt contexts until it succeeds or an
overall bootstrap budget (120s) is exhausted, so it lands once the meta settles.
With these the cluster forms cleanly, creates the KV buckets, scales R1->R3 in
place, and survives loss of one node (quorum 2/3). Verified on magnus+homer+datardos.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A secured bus freezes per-subject permissions at connect time, so a peer that
creates or joins a room after connecting cannot pub/sub on it until it reconnects
(RefreshSession). No client called it, so under enforce+ACL the demos failed
closed — pushing the operator to disable the ACL (a security regression at the
operator's discretion).
Wire the membership-change contract into every client:
- cmd/worker: RefreshSession after CreateRoom, before publishing.
- cmd/chat (simple): RefreshSession after CreateRoom+Join, before Subscribe.
- cmd/chat (encrypted demo): A refreshes after CreateRoom; B refreshes after the
invite+join, both before pub/sub.
- local_files/bridge (gateway): RefreshSession after CreateRoom+Join, before Subscribe.
- mobile: new Session.RefreshSession wrapper + the contract documented for callers.
Contract (documented on the wrappers): after ANY membership change, call
RefreshSession BEFORE pub/sub on the new room (it drops active subs, so it must
precede Subscribe). On an unsecured/dev bus it is a harmless reconnect.
Test:
- TestClientCreateRoomRefreshPublishFlow: end-to-end under enforce+ACL, a peer
creates a room, refreshes, invites a second peer who joins+refreshes+subscribes,
and the publish is received — no manual intervention, the ACL stays on.
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A cluster is only as secure as its weakest node: the data plane forwards every
subject between nodes, so one node running without enforced auth lets an
unauthenticated peer Subscribe(">") on it and harvest the traffic forwarded from
the ACL'd nodes.
- validateClusterConfig now takes the auth mode and REFUSES to join a cluster
unless --bus-auth enforce, regardless of bind (a clustered node is a production
node; there is no safe dev cluster without auth). This binary therefore cannot
BE the weak node.
- Server.Posture {enforce,acl,tls,cluster,store} is published on /healthz (non
secret operational metadata, probe stays unauthenticated) so a monitor or peer
can detect a cluster member not running enforce+ACL+TLS — covering a peer that
runs a tampered/old binary outside this node's control.
Tests:
- TestAttack0008_N1: a clustered node with --bus-auth off is refused; the same
node with enforce + full route security is allowed.
- TestClusterConfigPolicy: extended with off/soft clustered cases (refused) and
the mode parameter throughout.
- TestHealthExposesPosture: /healthz returns the posture booleans + store backend.
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The client-infra grant was {"_INBOX.>", "$JS.API.>"}. The broad "$JS.API.>" let
any registered peer drive the whole JetStream API and read the control-plane KV
buckets (KV_UNIBUS_users/rooms/members/room_keys) and the object store directly
over NATS, bypassing the HTTP authorization (requireMember + own-endpoint
checks): a full leak of the allowlist, room graph and sealed-key metadata once the
decentralized control plane is active.
Fix: replace the broad grant with a CLOSED, per-room allow set.
- clientInfraSubjects shrinks to {"_INBOX.>", "$JS.API.INFO"} ($JS.API.INFO is
account counters only — no room/user/key contents).
- SubjectACLFor now grants, per room the peer belongs to, the room subject plus
the minimal JetStream API subjects of THAT room's stream (jsSubjectsFor:
STREAM.*, CONSUMER.*, $JS.ACK scoped to UNIBUS_<roomID>).
- Because KV_UNIBUS_* and OBJ_UNIBUS_* are never a room stream, they fall outside
the closed allow set and are denied by default. Clients reach blobs over the
HTTP control plane, not the NATS object store, so OBJ needs no client grant.
roomStreamName mirrors pkg/client.streamName so the authorizer and the producer
never drift.
Tests:
- TestAttack0008_N2: eve (registered, member of no room) cannot bind the KV users
bucket nor subscribe $KV.UNIBUS_users.> (permissions violation); golden: the
room owner can still drive her OWN room stream's JetStream API; edge: eve cannot
reach a foreign room's stream.
- TestReaudit_H4 residual note updated: the $JS.API.> leak it deferred is closed.
CGO_ENABLED=0 go build/vet/test green; govulncheck 0 reachable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-subject data-plane ACL existed since 0003e (membership.SubjectACLFor +
busauth.NewNkeyAuthenticatorACL, unit-tested in TestSubjectACLIsolation) but the
binary never used it: cmd/membershipd installed the plain NewNkeyAuthenticator, so
in production a registered NON-member could open a raw NATS connection,
Subscribe(">"), and harvest every room's subject plus JetStream stream/advisory
activity (payload stayed E2E ciphertext, metadata leaked) — the re-audit's H4
vector (report 0006).
Fix:
- New busauth.PermissionsFromSubjects adapts a subject-deriving function into the
PermissionsFunc the ACL authenticator expects (subjects granted as both the
publish and subscribe allow set; a derivation error fails closed). It lives in
busauth so membership stays free of the nats-server dependency.
- cmd/membershipd, under enforce, now installs
NewNkeyAuthenticatorACL(store.IsAuthorized,
PermissionsFromSubjects(membership.SubjectACLFor(store)))
so every connection is confined to the subjects of the rooms it belongs to plus
the client-infra subjects.
- pkg/membership/acl_test.go's helper now delegates to the production wiring
(PermissionsFromSubjects) instead of a test-only reimplementation, so the tests
exercise the real path.
Verification (pkg/membership/acl_test.go):
- TestReaudit_H4_WildcardMetadataLeak: a non-member's Subscribe(">") and any
foreign-subject subscribe raise permission violations; the member still pub/subs
her own room and the non-member captures nothing. With the plain authenticator
(the pre-0005e wiring) the test fails ("wildcard metadata leak still open"),
confirming the wiring is what closes it.
- TestSubjectACLIsolation / TestRefreshSessionGainsNewRoom still green.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green.
Residual (documented): the client-infra grant includes "$JS.API.>", shared by all
peers so per-connection JetStream works; a peer that subscribes specifically to
"$JS.API.>" can still observe stream-management requests whose subjects embed the
room-derived stream name. Fully closing that needs NATS accounts/permissions per
identity (deferred to the 0003 decentralization line). Operational note: NATS
freezes permissions at connect time, so clients must client.RefreshSession after a
membership change to gain a new room's subject; cmd/chat and cmd/worker do not yet
call it, a functional gap to close before an enforce+ACL deployment.
Refs: report 0006 H4, issue 0005e.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The H1 fix bounds each request (1 MiB control / 16 MiB blob) and the per-IP rate
limiter throttles a single source, but neither bounds the AGGREGATE memory across
concurrent requests. The re-audit (report 0006, N2) drove RSS to ~1.42 GB with 40
concurrent 16 MiB uploads, and noted that a multi-IP (botnet) flood scales without
a ceiling because the rate limit is per-IP.
Fix: a global, non-blocking, byte-counting limiter (pkg/membership/inflight.go).
ServeHTTP reserves a POST's worst-case buffered size (its route ceiling) from the
limiter before reading the body, and releases it when the request finishes. When
the global cap (maxInflightBytes = 128 MiB) is reached, further POSTs are shed
with 503 (backpressure) rather than parking goroutines, so total bytes buffered
in flight stays bounded regardless of connection count or source-IP spread. GETs
carry no body and do not consume the budget.
The limiter is implemented inside unibus (not delegated to the fn-registry, where
a generic concurrency primitive would normally live) because functions/core pulls
transitive deps requiring CGO (mattn/go-sqlite3) and external modules that are
incompatible with unibus's CGO_ENABLED=0 build, and because this work is scoped
to the unibus sub-repo. The type/method comments document this.
Verification:
- pkg/membership/inflight_test.go: TestInflightLimiter{Basics,Disabled,Concurrent}
cover golden/edge/error/disabled/over-release and a -race concurrency invariant
(inFlight returns to 0, never exceeds cap).
- pkg/membership/dos_concurrency_test.go: TestReaudit_DoSConcurrency fires 40
concurrent 16 MiB uploads from distinct IPs (the multi-IP shape) against a 48 MiB
test cap -> 200=3 503=37, RSS delta ~93 MiB (bound 256 MiB), inFlight()==0, and a
fresh upload still 200. With the limiter disabled the test fails (200=40 503=0),
confirming it is a real regression guard.
- CGO_ENABLED=0 go build ./... && go vet ./... && go test -count=1 ./... green;
CGO_ENABLED=1 go test -race ./pkg/membership/ green.
Residual (documented): under enforce the body is buffered twice (auth verify +
handler), so real RSS is ~2x the reserved bytes; closing that fully means
streaming blobs to disk (overlaps H9 / issue 0002).
Refs: report 0006 N2, issue 0005c.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the residual the 0004 hardening deferred: the NATS authenticator
can now confine a registered peer to the subjects of the rooms it
belongs to, instead of letting any registered identity sub/pub on any
subject. The dynamic-membership reconnection model the audit named is
provided by client.RefreshSession.
pkg/busauth:
- verifyNkey factors out the shared nkey verification.
- NewNkeyAuthenticatorACL + PermissionsFunc: an authenticator that, after
authorizing, derives and RegisterUser()s per-subject permissions. A
derivation error denies the connection (fail closed).
pkg/membership:
- SubjectACLFor(store) maps a signing pubkey to the subjects it may use:
the subject of every room it belongs to, plus the client infrastructure
subjects (_INBOX.>, $JS.API.> for request/reply and the persisted plane).
pkg/client:
- RefreshSession() rebuilds the data-plane connection so the authenticator
re-derives permissions after a membership change (NATS freezes
permissions at connect time). It retains the seeds/options to reconnect;
active subscriptions are dropped and must be re-made (documented).
Tests (DoD: isolation + refresh):
- TestSubjectACLIsolation: alice (member of room.A) may sub/pub room.A but
is DENIED sub and pub on room.B (permissions violation), and never reads
bob's room.B traffic; bob never receives alice's cross-room publish.
- TestRefreshSessionGainsNewRoom: alice has no permission for room B until
she is added and calls RefreshSession; the reconnect grants the subject
and she then receives room B traffic.
Scope note: the per-subject ACL authenticator is opt-in (NewServer/
membershipd keep the open authenticator by default) and is wired in with
the decentralized boot path; auto-RefreshSession on every membership
change (fully transparent) remains for 0003f. Master behavior unchanged.
The per-process nonce cache breaks anti-replay under multi-node failover
(audit 0004): a request captured on one node can be replayed to a
DIFFERENT node whose local cache never saw the nonce, and is accepted.
This makes the nonce state shared so a replay is rejected cluster-wide.
pkg/membership:
- nonceStore is now an interface. The in-memory cache is renamed
memNonceCache (still the default, single-node behavior).
- kvNonceStore (new) claims each nonce with an atomic KV Create on a
shared bucket: first sight wins (accept), any later sight on any node
rejects (replay). A backend error fails CLOSED (reject), so a KV outage
never silently disables anti-replay. The bucket carries a TTL =
nonceTTL (2*clockSkew) so a key expires exactly when its replay window
closes; raw base64 nonces are mapped to KV-safe keys via sha256-hex.
- Server.UseReplicatedNonces(js, replicas) swaps the store on a node;
every node in a cluster calls it. NewServer still defaults to the
in-memory cache (master behavior unchanged).
Test (DoD error path — the issue's cross-node replay case):
- TestReplicatedNonceRejectsCrossNodeReplay: two membershipd nodes share
one KV bucket; a request accepted (200) on node A, replayed with the
same ts+nonce to node B, is rejected (401) — and replaying to A again
is rejected too.
Branch-by-abstraction for the blob store (issue 0003d): media ciphertext
can live in a replicated JetStream Object Store instead of local disk, so
a blob uploaded to one node survives a node loss and is reachable from
any node.
pkg/blobstore:
- Store is now an interface (Put/Get/Has). The filesystem backend is
renamed diskStore and stays the default: New(dir) returns it.
- objectStore (new) implements Store over a NATS Object Store bucket with
a configurable replication factor (R1..R5), matching the KV store's
R1->R3 rollout. Content-addressing (sha256-hex) is identical, so the
wire contract is unchanged.
pkg/membership:
- Server.blobs and NewServer take the blobstore.Store interface instead
of the concrete type; no behavior change with the disk default.
Tests (DoD: golden + edge + contract):
- TestObjectStoreRoundTrip: put/get/has + content-addressed dedup.
- TestObjectStoreMissing: unknown hash is absent and unreadable.
- TestObjectStoreAddressMatchesDisk: the Object Store and disk backends
address identical bytes to the IDENTICAL hash (portable blob refs).
Like the KV store (0003b), wiring membershipd to select the Object Store
is deferred to the decentralized boot path (flag off); disk stays default.
The one-time data move decentralization needs (issue 0003c): copy the
entire control-plane state from the local SQLite database into the
replicated JetStream KV buckets, with a backup taken first.
pkg/membership:
- Snapshot / SealedKeyRecord: a backend-agnostic dump of the whole
control plane (rooms with their real epoch, members, every sealed-key
row across epochs, users with status).
- (*sqliteStore).ExportSnapshot and (*jetstreamStore).ExportSnapshot read
a full Snapshot from each backend; (*jetstreamStore).importSnapshot
writes one with raw Puts (preserving epoch/status, not resetting to
defaults) so the migration is faithful and idempotent (every write is
an overwrite, so re-running converges).
- MigrateSQLiteToKV orchestrates export -> import; BackupSQLite makes a
consistent copy via SQLite's VACUUM INTO before any migration.
cmd/membershipd:
- `membershipd migrate-to-kv --db <path> --nats-url <url> [--replicas N]
[--ca <cert>] [--no-backup]` backs up the SQLite file, connects to the
cluster's NATS, and migrates. Dispatched on the host like `user`.
Tests (DoD: golden + edge + parity):
- TestMigrateSQLiteToKVParity: seed a representative SQLite (two rooms,
one rekeyed to epoch 2, members, a revoked user); after migration the
KV ExportSnapshot equals the SQLite ExportSnapshot.
- TestMigrateSQLiteToKVIdempotent: running the migration twice yields the
same KV state.
- TestBackupSQLiteCreatesConsistentCopy: the backup reopens with
identical data.
Plus a binary smoke (seed user -> run server -> migrate-to-kv -> re-run):
backup written, 1 user migrated, second run identical.
Branch-by-abstraction for the control-plane store (issue 0003b), so the
membership state can move off process-local SQLite onto replicated
JetStream KV without rewriting callers and without breaking master.
pkg/membership:
- Store is now an interface (rooms/members/keys + user allowlist +
Close). The existing SQLite implementation is renamed sqliteStore and
stays the default: Open(path) still returns it. openSQLite keeps the
concrete type for internal callers (the 0003c migration).
- ErrNotFound is a storage-agnostic "no such record" sentinel; both
backends return it (the SQLite store maps sql.ErrNoRows to it). The
control plane now branches on ErrNotFound instead of sql.ErrNoRows, so
server.go no longer imports database/sql.
- jetstreamStore (new) implements Store over five replicated KV buckets:
rooms, members, rooms_by_member (reverse index for ListRoomsForEndpoint),
room_keys, users. Replication factor is configurable (R1..R5) for the
R1->R3 rollout. Every read is bounded by OpTimeout and IsAuthorized /
HasAdmin FAIL CLOSED on any backend error (a KV quorum loss denies,
never admits), per the audit's requirement for the decentralized store.
dev/feature_flags.json:
- Add the `decentralized` flag (OFF): sqliteStore default while off,
jetstreamStore behind it. The membershipd boot wiring that selects the
KV store is deliberately deferred to 0003e/0003f (the embedded-NATS
authenticator<->store bootstrap is part of the session/deploy redesign);
OFF keeps the single-node SQLite control plane unchanged.
Tests (DoD: golden + edges + error path):
- TestJetStreamStoreRoomsCRUD: encrypted room + owner + invited member
round-trip through every room/member/key method, including latest-epoch
resolution and rekey.
- TestJetStreamStoreUsers: add/get/authorize/list/revoke + admin gate,
with case-insensitive key normalization and duplicate rejection.
- TestJetStreamStoreNotFound: ErrNotFound mapping for misses.
- TestJetStreamStoreIsAuthorizedFailClosed: NATS backend shut down ->
IsAuthorized and HasAdmin both DENY within the bounded timeout.
The full existing suite stays green: sqliteStore is unchanged behavior.
TestAudit_OwnerSpoof: a body declaring a foreign owner endpoint or signing key
is 403; a self-owned create is 201.
TestAudit_NonceCachePoisonPreAuth: an unregistered identity's repeated nonce
still fails 'not authorized' (never 'replayed'), proving it was not cached, while
an authorized identity's replay is still rejected.
Nonce cache unit tests: prune-after-TTL and cap-bounded memory.
Three medium audit findings.
H6 (owner spoof): handleCreateRoom now binds the body's declared owner to the
authenticated signer — both the endpoint id and the signing key must be the
signer's — so a registered peer cannot create rooms in another identity's name.
Enforced only when an authenticated signer is present.
H7 (nonce-cache poison pre-auth): IsAuthorized now runs BEFORE the replay cache
is touched, so an unregistered identity (Ed25519 keys are free) can no longer
seed nonces into it. The cache is rewritten with O(expired) pruning (insertion
order equals expiry order under a constant TTL) instead of the previous O(n)
full-map scan under the mutex, plus a size cap with oldest-eviction. This is the
prerequisite the 0003 replicated nonce store builds on.
H12 (error leak): internal store/blob errors are logged and replaced with a
generic client message via writeServerErr, so SQL fragments and filesystem paths
no longer reach the caller. Crafted 4xx messages (owner-sig, validation) are kept.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pkg/membership TestRequireEncryptedRoomsRejectsCleartext: cleartext create ->
403, encrypted -> 201, flag off -> cleartext allowed again.
pkg/client TestAudit_NoSubjectACL: under the public posture a ModeNATS room is
refused; bob (member) decrypts the secret; eve raw-subscribes to the subject off
the data plane and receives only ciphertext (non-empty AEAD nonce, no plaintext
substring) — closing the auditor's 'eve reads internal: salary numbers'.
Audit H4 (Alto). The embedded NATS has a single account with no per-subject
permissions, so any registered peer can subscribe to any subject — a cleartext
(ModeNATS) room's payload is readable by anyone who knows the subject.
A complete per-subject ACL derived from membership does not fit here: NATS
evaluates a connection's permissions once at connect time and never re-evaluates
them, but unibus clients connect-then-create/join-then-publish on one connection
(TestSecureBusEndToEnd). Static permissions would forbid the owner from
publishing to a room it just created; the dynamic reconnection model belongs to
the 0003 decentralization redesign. See dev/0004d-dataplane-acl.md.
Minimum defense implemented: Server.RequireEncryptedRooms (set by membershipd on
any non-loopback bind) refuses to create cleartext rooms, so every room on a
public deployment is end-to-end encrypted. Message CONTENT stays confidential
even with no subject isolation; residual traffic-metadata exposure is documented
and tracked for 0003.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit H3 (Alto). 'Authorized' meant 'registered in the allowlist', not 'member
of the room', so any registered peer could read another room's subject, its
full member list (every member's sign_pub + kex_pub), any endpoint's room
directory, and even another member's sealed key.
The middleware now carries the authenticated signer's endpoint id into the
handler via request context. Room handlers enforce membership:
- GET /rooms/{id} and /rooms/{id}/members require the signer to be a member;
- GET /rooms/{id}/key serves the sealed key only to its own endpoint
(endpoint == signer) and only to a member;
- GET /members/{endpoint}/rooms is restricted to the signer's own endpoint.
Authorization is skipped only when no authenticated signer is present (AuthOff
dev, or a soft-mode pass-through), preserving legacy/dev behavior. Internal
errors no longer echo store messages to the client on these paths.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ports the auditor's TestAudit_DoSBodyLimitNoAuth: an unsigned oversized POST
to /blobs is now rejected 413 without the resident set spiking (measured via
/proc/self/status, delta bounded to <96 MiB vs the attack's 400 MB+). Covers
both a truthful over-ceiling Content-Length (rejected pre-read) and a chunked
unknown-length sender (MaxBytesReader caps the read). Plus golden (normal blob
stored), boundary (exactly at the ceiling accepted), the 1 MiB control-plane
ceiling, and the per-IP rate limit (flood -> 429, distinct IPs not throttled).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-auth DoS hardening (audit H1, Critical). The control-plane middleware
read the request body with io.ReadAll before authenticating and with no size
cap, so an unauthenticated peer could force the server to buffer an arbitrary
body in RAM (the auditor sent 400 MB and watched RSS climb to ~898 MB).
- ServeHTTP now caps the buffered body before reading: a per-route ceiling
(1 MiB JSON, 16 MiB /blobs) rejects an over-declared Content-Length outright
and wraps the body in http.MaxBytesReader so a lying/chunked sender trips at
the ceiling instead of unbounded.
- handlePutBlob maps the MaxBytesReader cutoff to 413 in every auth mode.
- Per-IP token-bucket rate limiter (golang.org/x/time/rate, already in the
module graph) sheds floods before auth or body reads. Loopback dev stacks are
unaffected (burst >> any single client's rate). Kept in-package as transport
glue, not promoted to the registry, mirroring the nonceCache decision in 0003.
- membershipd sets http.Server.MaxHeaderBytes and ReadHeaderTimeout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
membership/auth_test: golden (signed+registered accepted), error paths
(unregistered 401, replayed nonce 401, clock skew 401, tampered body 401,
missing headers 401), exemptions (healthz, soft allows, off no-op).
client_test: end-to-end with the real client against an enforce server —
registered peer accepted, unregistered rejected, revoked peer denied without
a server restart.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds the bus-auth rollout (off|soft|enforce) to the control plane. The
middleware verifies an Ed25519 request signature over CanonicalRequest
(method, request-URI, ts, nonce, sha256(body)), checks the timestamp is
within +/-30s, rejects replayed nonces via an in-memory TTL cache (60s), and
requires the signer to be an active user in the allowlist. soft logs
rejections but lets requests through so clients can migrate without an
outage; off is the legacy no-op default. /healthz is exempt so health probes
work before any identity exists. CanonicalRequest is exported as the single
source of truth shared with the client.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Golden: add -> get -> IsAuthorized true, admin seeded. Edge: empty role
defaults to member, case-insensitive hex lookup, list ordering, revoke
denies authorization and stamps revoked_at. Error: duplicate key
(ErrUserExists), invalid role, empty sign_pub, unknown user not authorized,
revoke of unknown/already-revoked. Plus users-table migration idempotency.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bus-level user allowlist (issue 0001a): the authoritative directory of
Ed25519 signing identities permitted to use the bus, independent of room
membership. Migration is additive and mirrored byte-for-byte between the
module-root migrations/ and the embedded pkg/membership/migrations/.
Store adds AddUser/GetUser/ListUsers/RevokeUser/IsAuthorized/HasAdmin.
IsAuthorized is the single fail-closed predicate both the control plane and
the NATS data plane will consult, so revocation is a status flip that denies
access on both without a restart.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A peer invited to an encrypted room needs to find it: the control plane is
pull-based (no server push of invitations), so add a discovery endpoint that
lists every room an endpoint belongs to, with the room's metadata and the
endpoint's role.
- store.ListRoomsForEndpoint: JOIN members+rooms, ordered by room id, empty
slice (not error) for an endpoint in no rooms.
- membershipd: GET /members/{endpoint}/rooms returns {room_id, subject, epoch,
policy, role}[].
- client.ListMyRooms + RoomRef: a bot polls this to discover and then Join +
Subscribe rooms it was invited to.
Tests: store-level (owner in N rooms, member in one, unknown endpoint → []) and
client-level e2e through the embedded harness (B discovers a room A invited it
to, without prior knowledge of the room id; owner sees role=owner).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- membership server returns 403 + human-readable message on missing sealed key (was leaking 'sql: no rows in result set')
- client doJSON unwraps the server's {"error"} field instead of pasting the raw HTTP envelope