feat(embeddednats): UNIBUS_NATS_MONITOR flag decoupled from debug log
Add a dedicated UNIBUS_NATS_MONITOR=1 toggle that opens the embedded nats-server monitoring HTTP endpoint (127.0.0.1:8222, loopback only) so a local metrics scraper can read /varz, /connz and /jsz for server-level metrics (msgs/s, connections, KV bucket msgs, RAFT leader per stream, restarts).

Previously the monitoring endpoint was only reachable via UNIBUS_NATS_DEBUG=1, which also turns on the verbose nats-server debug log: enabling the endpoint wrote routes/RAFT/room subjects to journald in cleartext, which regresses the hardened posture (issue 0007). The two concerns are now decoupled.

The toggle computation is extracted into a pure function, natsLogOpts(debugEnv, monitorEnv) (noLog, debug, trace, monitor): MONITOR=1 opens the endpoint while keeping the log quiet (NoLog true / Debug false). The inverse coupling is preserved for backward compatibility (DEBUG still implies MONITOR). The 127.0.0.1 bind stays hardcoded: the monitoring endpoint has no auth and must never be reachable from the network.

Deploy wiring (versioned): an additive systemd drop-in membershipd-cluster.service.d/nats-monitor.conf (Environment=UNIBUS_NATS_MONITOR=1), plus a "NATS server metrics" section in the cluster README with the rolling activation runbook (magnus -> homer -> datardos), gated on R3 reconvergence (followers 2/2) between nodes.

Tests: a pure decoupling table (monitor on => log NOT debug; debug => monitor; default closed), a real embedded server with MONITOR=1 asserting that /varz answers 200 on loopback:8222, and a server without the flag asserting that the endpoint is closed.

100% additive: behavior is identical without the flag.

Bump app.md 0.10.0 -> 0.11.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -103,17 +103,38 @@ func StartHostAuth(storeDir, host string, port int, auth server.Authentication)
 	return StartServer(ServerConfig{StoreDir: storeDir, Host: host, Port: port, Auth: auth})
 }
 
+// natsLogOpts maps the two independent environment toggles to the embedded
+// nats-server logging and monitoring flags. It is a pure function (no I/O) so the
+// decoupling between the two toggles can be unit-tested directly.
+//
+//   - UNIBUS_NATS_DEBUG="1" enables the nats-server logger (route/RAFT/JetStream
+//     errors); "2" additionally enables protocol tracing. Off by default so the
+//     server stays silent (NoLog) and production behavior is unchanged.
+//   - UNIBUS_NATS_MONITOR="1" opens the monitoring HTTP endpoint (loopback only)
+//     for a local metrics scraper to read /varz, /connz and /jsz.
+//
+// The two are DECOUPLED on purpose: enabling the monitoring endpoint must NOT turn
+// on the verbose debug log, which would write room subjects and routing metadata
+// to journald in clear and regress the hardened posture (issue 0007). The reverse
+// coupling is kept for backward compatibility: debug mode still exposes the
+// monitoring endpoint as well (debug implies monitor), so existing debugging
+// workflows are unchanged.
+func natsLogOpts(debugEnv, monitorEnv string) (noLog, debug, trace, monitor bool) {
+	debug = debugEnv == "1" || debugEnv == "2"
+	trace = debugEnv == "2"
+	monitor = monitorEnv == "1" || debug
+	noLog = !debug
+	return noLog, debug, trace, monitor
+}
+
 // StartServer launches an embedded nats-server with JetStream from cfg. It
 // blocks until the server is ready to accept connections (up to 5s) and returns
 // the running server; the caller must Shutdown it.
 func StartServer(cfg ServerConfig) (*server.Server, error) {
-	// Diagnostic toggle: UNIBUS_NATS_DEBUG=1 enables the embedded nats-server's own
-	// logger (route/RAFT/JetStream errors), which is otherwise silenced. Off by
-	// default so production behavior is unchanged; only set it when debugging the
-	// cluster route layer.
-	debugLevel := os.Getenv("UNIBUS_NATS_DEBUG")
-	debugNATS := debugLevel == "1" || debugLevel == "2"
-	traceNATS := debugLevel == "2"
+	// Map the two independent env toggles to the nats-server logging + monitoring
+	// flags. See natsLogOpts for the decoupling rationale (issue 0007).
+	noLog, debugNATS, traceNATS, monitorNATS := natsLogOpts(
+		os.Getenv("UNIBUS_NATS_DEBUG"), os.Getenv("UNIBUS_NATS_MONITOR"))
 	opts := &server.Options{
 		JetStream: true,
 		StoreDir: cfg.StoreDir,
@@ -122,15 +143,17 @@ func StartServer(cfg ServerConfig) (*server.Server, error) {
 		ServerName: cfg.ServerName,
 		DontListen: false,
 		// Keep the embedded server quiet by default; the host app logs the URLs.
-		NoLog: !debugNATS,
+		NoLog: noLog,
 		Debug: debugNATS,
 		Trace: traceNATS,
 		Logtime: true,
 		NoSigs: true,
 	}
-	if debugNATS {
-		// Expose the nats-server monitoring endpoint (loopback) so the operator can
-		// inspect /jsz, /routez, /varz while debugging the cluster meta-group.
+	if monitorNATS {
+		// Expose the nats-server monitoring endpoint on LOOPBACK ONLY (never public):
+		// the operator (or a local metrics scraper) inspects /varz, /connz, /jsz,
+		// /routez. The 127.0.0.1 bind is mandatory because this endpoint has no auth;
+		// it must stay unreachable from the network.
 		opts.HTTPHost = "127.0.0.1"
 		opts.HTTPPort = 8222
 	}