fleet_monitoring/hub/deploy_unibus_nats_exporter.sh
Egutierrez dfd55dc10a feat(hub): dashboard + deploy of the server-level NATS scraper for the unibus cluster
Add the unibus-nats Grafana dashboard and the per-node deploy script for the
unibus_exporter NATS-local mode, which surfaces the server-level NATS/JetStream
metrics that /healthz cannot (msgs/s, connections, KV bucket msgs, RAFT leader
per stream, memory, restarts).

- hub/dashboards/unibus-nats.json (uid unibus-nats, datasource victoriametrics):
  12 panels — NATS up, connections, msgs/s in, slow consumers, JetStream msgs,
  nats-server restarts (1h); msgs/s per node (in/out), connections per node,
  KV bucket msgs per bucket, nats-server memory per node; RAFT leader per stream
  table, JetStream per node table. Picked up automatically by the existing
  "fleet" dashboard provider (scans the dashboards path).
- hub/deploy_unibus_nats_exporter.sh: deploys unibus_exporter in NATS-local mode
  on a node. magnus pushes to the local VictoriaMetrics (no auth); homer/datardos
  push to the public ingest endpoint with basic auth (pass fleet/ingest-pass,
  never in argv). Installs unibus-exporter-nats.service and probes 8222 to warn if
  the monitoring endpoint is not yet open. Does NOT restart membershipd or touch
  the cluster — that is the consolidated 0.11.0 rollout.

The scraper code (unibus_exporter NATS-local mode) and the registry parser
(parse_nats_monitor_go_infra) ship in their own repos; see report
unibus-nats-metrics-clean for the integration order.
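
The deploy script pushes to a path ending in `/api/v1/import/prometheus`, the VictoriaMetrics endpoint that accepts samples in Prometheus text exposition format. A minimal sketch of what one pushed sample might look like: `nats_up` and the `job` label come from this commit, while the `node` label name and the `sample_line` helper are assumptions for illustration, not the exporter's actual output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: format one nats_up sample in Prometheus text format.
# The "node" label name is an assumption; "job" mirrors the config's labels.
sample_line() {
  local node="$1" value="$2"
  printf 'nats_up{job="unibus_exporter",node="%s"} %s\n' "$node" "$value"
}

sample_line magnus 1
# A manual push to the local hub could then look like this (not run here):
#   sample_line magnus 1 | curl -fsS --data-binary @- \
#     http://127.0.0.1:8428/api/v1/import/prometheus
```

Pushing a hand-made sample this way can help tell an ingest problem apart from an exporter problem when the dashboard stays empty.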

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 21:19:42 +02:00

95 lines
3.8 KiB
Bash
Executable File

#!/usr/bin/env bash
# Deploys unibus_exporter in NATS-LOCAL MODE on a node of the unibus cluster.
#
# Unlike healthz mode (a single instance on magnus that probes the 3 nodes over
# their public IPs), the NATS monitoring endpoint (8222) is loopback-only and
# unauthenticated, so this mode runs ON EACH node and reads its own 127.0.0.1:8222.
# That is why this script is run once per node.
#
# Usage: ./deploy_unibus_nats_exporter.sh <node_name> <ssh_host>
#   node_name : logical node label (magnus | homer | datardos)
#   ssh_host  : SSH alias of the node (~/.ssh/config). magnus is usually `om`.
#
# Requirements:
#   - Go installed locally (builds the amd64 binary).
#   - `pass fleet/ingest-pass` (ingest password, only for nodes other than magnus).
#   - sudo -n (passwordless) on the remote host.
#
# CRITICAL PRECONDITION: the node must run unibus 0.11.0+ started with
# UNIBUS_NATS_MONITOR=1 (drop-in membershipd-cluster.service.d/nats-monitor.conf);
# otherwise /varz will not respond and the exporter will emit nats_up=0. This script
# does NOT restart membershipd or touch the cluster; that is the consolidated
# rollout of the 0.11.0 binary.
set -euo pipefail
NODE="${1:?usage: deploy_unibus_nats_exporter.sh <node_name> <ssh_host>}"
HOST="${2:?usage: deploy_unibus_nats_exporter.sh <node_name> <ssh_host>}"
ROOT="$(cd "$(dirname "$0")/.." && pwd)" # projects/fleet_monitoring
APP="$ROOT/apps/unibus_exporter"
# magnus IS the hub: it pushes to the local VictoriaMetrics without auth. The other
# nodes push to the public ingest endpoint with basic auth (same as metrics_agent).
if [ "$NODE" = "magnus" ]; then
  HUB="http://127.0.0.1:8428/api/v1/import/prometheus"
  USER=""
  PW=""
else
  HUB="https://metrics-dxaqj3ina6eqd5pjt85wkrrj.organic-machine.com/api/v1/import/prometheus"
  USER="fleet"
  PW="$(pass show fleet/ingest-pass | head -1)"
fi
echo ">> building unibus_exporter (linux/amd64)"
mkdir -p "$APP/dist"
( cd "$APP" && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o dist/unibus_exporter_amd64 . )
BIN="$APP/dist/unibus_exporter_amd64"
echo ">> uploading binary to $HOST"
scp -q -o BatchMode=yes "$BIN" "$HOST:/tmp/unibus_exporter"
echo ">> installing NATS-local service on $NODE ($HOST)"
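# Send the variables as %q-quoted assignments on stdin, ahead of the script, so the
# ingest password never appears in argv (printf is a bash builtin) and values that
# contain quotes survive intact.
{ printf 'NODE=%q HUB=%q VMUSER=%q PW=%q\n' "$NODE" "$HUB" "$USER" "$PW"; cat; } <<'OUTER' | ssh -o BatchMode=yes "$HOST" bash -s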
set -e
sudo -n mkdir -p /opt/unibus-exporter /etc/unibus-exporter
sudo -n mv /tmp/unibus_exporter /opt/unibus-exporter/unibus_exporter
sudo -n chmod 755 /opt/unibus-exporter/unibus_exporter
sudo -n tee /etc/unibus-exporter/unibus.nats.json >/dev/null <<JSON
{
  "nats_monitor": { "enabled": true, "node": "${NODE}", "base_url": "http://127.0.0.1:8222" },
  "hub_url": "${HUB}",
  "user": "${VMUSER}",
  "pass": "${PW}",
  "interval_sec": 15,
  "timeout_sec": 8,
  "labels": { "job": "unibus_exporter" }
}
JSON
sudo -n chmod 600 /etc/unibus-exporter/unibus.nats.json
sudo -n tee /etc/systemd/system/unibus-exporter-nats.service >/dev/null <<'UNIT'
[Unit]
Description=unibus NATS local exporter (fleet_monitoring)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
ExecStart=/opt/unibus-exporter/unibus_exporter -config /etc/unibus-exporter/unibus.nats.json
Restart=always
RestartSec=10
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
UNIT
sudo -n systemctl daemon-reload
sudo -n systemctl enable unibus-exporter-nats >/dev/null 2>&1
sudo -n systemctl restart unibus-exporter-nats
sleep 3
echo -n "status: "; systemctl is-active unibus-exporter-nats || true
echo "== local nats_up =="
curl -fsS "http://127.0.0.1:8222/varz" >/dev/null 2>&1 && echo "8222 OK (monitoring open)" || echo "8222 not responding: UNIBUS_NATS_MONITOR=1 is missing on membershipd (nats_up=0 until the rollout)"
OUTER
echo ">> $NODE: unibus_exporter (NATS-local mode) deployed"
echo ">> reminder: nats_up=0 until the node runs unibus 0.11.0+ with UNIBUS_NATS_MONITOR=1"
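
The commit message says the ingest password is "never in argv". A minimal sketch of one way to hand a secret to a remote shell without putting it on any command line: send a `printf %q`-quoted assignment on stdin ahead of the script body. Here `bash -s` is run locally in place of `ssh ... bash -s` so the sketch works without a host; the sample secret and the `received` variable are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical secret with characters that break naive '...' interpolation.
secret="pa'ss \$word"

# printf is a bash builtin, so the secret never shows up in a process argv.
# %q quotes the value so the receiving bash reads it back unchanged.
# `bash -s` stands in for: ssh -o BatchMode=yes "$HOST" bash -s
received="$(
  {
    printf 'PW=%q\n' "$secret"
    cat <<'REMOTE'
printf '%s' "$PW"
REMOTE
  } | bash -s
)"

[ "$received" = "$secret" ] && echo "round-trip OK"
```

The assignment arrives as a plain shell variable rather than an exported one, which is enough when the remote script only expands it inside a heredoc.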