feat(embeddednats): UNIBUS_NATS_MONITOR flag decoupled from debug log

Add a dedicated UNIBUS_NATS_MONITOR=1 toggle that opens the embedded
nats-server monitoring HTTP endpoint (127.0.0.1:8222, loopback only) so a
local metrics scraper can read /varz, /connz and /jsz for server-level
metrics (msgs/s, connections, KV bucket msgs, RAFT leader per stream,
restarts).

Previously the monitoring endpoint was only reachable via UNIBUS_NATS_DEBUG=1,
which is coupled to the verbose nats-server debug log: enabling the endpoint
also wrote routes/RAFT/room subjects to journald in clear, which regresses the
hardened posture (issue 0007). The two concerns are now decoupled.

The toggle computation is extracted to a pure function
natsLogOpts(debugEnv, monitorEnv) (noLog, debug, trace, monitor): MONITOR=1
opens the endpoint while keeping the log quiet (NoLog true / Debug false). The
inverse coupling is preserved for backward compatibility (DEBUG still implies
MONITOR). The 127.0.0.1 bind stays hardcoded — the monitoring endpoint has no
auth and must never be reachable from the network.

Deploy wiring versioned: additive systemd drop-in
membershipd-cluster.service.d/nats-monitor.conf (Environment=UNIBUS_NATS_MONITOR=1)
plus a "NATS server metrics" section in the cluster README with the rolling
activation runbook (magnus -> homer -> datardos) gated on R3 reconvergence
(followers 2/2) between nodes.

Tests: pure decoupling table (monitor on => log NOT debug; debug => monitor;
default closed) + a real embedded server with MONITOR=1 asserting /varz answers
200 on loopback:8222, and a server without the flag with the endpoint closed.
100% additive: behavior is identical without the flag. Bump app.md 0.10.0 ->
0.11.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
Egutierrez
2026-06-07 20:57:46 +02:00
parent b4f3118e85
commit 1c9325104c
5 changed files with 274 additions and 12 deletions
+21 -1
View File
@@ -2,7 +2,7 @@
name: unibus
lang: go
domain: infra
version: 0.10.0
version: 0.11.0
description: "Bus de mensajería unificado sobre NATS+JetStream con cifrado E2E por room (megolm/olm reducido): service de membresía/claves, librería cliente y peers demo."
tags: [service, messaging, nats, e2e]
uses_functions:
@@ -169,6 +169,26 @@ agent.<nombre>.{in,out} inbox/outbox de agente LLM (agent.scout.in)
## Capability growth log
- v0.11.0 (2026-06-07) — flag dedicado `UNIBUS_NATS_MONITOR` que abre el endpoint
de monitoring HTTP del nats-server embebido (`127.0.0.1:8222`, loopback only) de
forma DESACOPLADA del debug-log. Antes el monitoring solo se abría con
`UNIBUS_NATS_DEBUG=1`, que además encendía el log verboso del nats-server
(rutas/RAFT/subjects a journald en claro) — incompatible con el endurecimiento
del issue 0007. El cómputo de los toggles se extrae a una función pura
`natsLogOpts(debugEnv, monitorEnv) (noLog, debug, trace, monitor)`: `MONITOR=1`
abre el endpoint dejando el log en silencio (`NoLog` true / `Debug` false), y se
mantiene el acoplamiento inverso por compatibilidad (`DEBUG` sigue implicando
`MONITOR`). El bind loopback `127.0.0.1` queda hardcoded — el monitoring NUNCA es
público y no lleva auth; lo lee un scraper local que empuja a VictoriaMetrics
(dashboard `unibus-nats` en `fleet_monitoring`). Se versiona el cableado de
deploy: drop-in systemd aditivo `membershipd-cluster.service.d/nats-monitor.conf`
(`Environment=UNIBUS_NATS_MONITOR=1`) + sección "NATS server metrics" en el
README del cluster con el runbook de activación rolling (magnus→homer→datardos)
y gate de reconvergencia R3 (`followers 2/2`) entre nodos. Tests nuevos: tabla
pura del desacoplamiento (monitor on ⇒ log NO debug; debug ⇒ monitor; default
cerrado) + server real con `MONITOR=1` que confirma `/varz` 200 en loopback:8222
y server sin flag con el endpoint cerrado. Cambios 100% aditivos: sin el flag el
comportamiento es idéntico; build/test verdes.
- v0.10.0 (2026-06-07) — API HTTP admin-only de gestión de usuarios, cerrando la
última asimetría del control plane: las rooms tenían superficie HTTP firmada
(`POST /rooms`, etc.) pero los users solo se gestionaban por CLI local o acceso