feat(hub): dashboard + deploy for the server-level NATS scraper of the unibus cluster
Add the unibus-nats Grafana dashboard and the per-node deploy script for the
unibus_exporter NATS-local mode, which surfaces the server-level NATS/JetStream
metrics that /healthz cannot (msgs/s, connections, KV bucket msgs, RAFT leader
per stream, memory, restarts).

- hub/dashboards/unibus-nats.json (uid unibus-nats, datasource victoriametrics):
  12 panels: NATS up, connections, msgs/s in, slow consumers, JetStream msgs,
  nats-server restarts (1h); msgs/s per node (in/out), connections per node,
  KV bucket msgs per bucket, nats-server memory per node; RAFT leader per
  stream table, JetStream per node table. Picked up automatically by the
  existing "fleet" dashboard provider (scans the dashboards path).

- hub/deploy_unibus_nats_exporter.sh: deploys unibus_exporter in NATS-local
  mode on a node. magnus pushes to the local VictoriaMetrics (no auth);
  homer/datardos push to the public ingest endpoint with basic auth
  (pass fleet/ingest-pass, never in argv). Installs
  unibus-exporter-nats.service and probes 8222 to warn if the monitoring
  endpoint is not yet open. Does NOT restart membershipd or touch the cluster;
  that is the consolidated 0.11.0 rollout.

The scraper code (unibus_exporter NATS-local mode) and the registry parser
(parse_nats_monitor_go_infra) ship in their own repos; see report
unibus-nats-metrics-clean for the integration order.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
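Because the deploy script handles one node per invocation, a full rollout means calling it three times. The loop below is only a sketch of how that could be driven. The `deploy_all` function and the `DRY_RUN` switch are hypothetical, and the SSH aliases for homer and datardos are placeholders assumed to match the node names; only `om` for magnus comes from the script's own usage notes.

```shell
# Sketch: run the per-node deploy once for each cluster node.
# Assumption: the ssh aliases for homer and datardos equal their node names.
deploy_all() {
  local script="${1:-./deploy_unibus_nats_exporter.sh}"
  local pair node host
  for pair in magnus:om homer:homer datardos:datardos; do
    node="${pair%%:*}"   # text before the colon: logical node name
    host="${pair#*:}"    # text after the colon: ssh alias
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "$script $node $host"
    else
      "$script" "$node" "$host"
    fi
  done
}
```

Running it with `DRY_RUN=1` first prints the three invocations without touching any host, which is a cheap way to confirm the node/alias pairs before the real run.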
hub/deploy_unibus_nats_exporter.sh (new file, executable, +94)
@@ -0,0 +1,94 @@
#!/usr/bin/env bash
# Deploys unibus_exporter in NATS-LOCAL MODE on one node of the unibus cluster.
#
# Unlike healthz mode (a single instance on magnus that probes the 3 nodes over
# their public IPs), the NATS monitoring endpoint (8222) is loopback-only and
# unauthenticated, so this mode runs ON EACH node and reads its own
# 127.0.0.1:8222. That is why this script is run once per node.
#
# Usage: ./deploy_unibus_nats_exporter.sh <node_name> <ssh_host>
#   node_name : logical node label (magnus | homer | datardos)
#   ssh_host  : SSH alias of the node (~/.ssh/config). magnus is usually `om`.
#
# Requirements:
#   - Go installed locally (builds the amd64 binary).
#   - `pass fleet/ingest-pass` (ingest password, only for nodes != magnus).
#   - sudo -n (passwordless) on the remote host.
#
# CRITICAL PRECONDITION: the node must run unibus 0.11.0+ started with
# UNIBUS_NATS_MONITOR=1 (drop-in membershipd-cluster.service.d/nats-monitor.conf),
# otherwise /varz will not respond and the exporter will emit nats_up=0. This
# script does NOT restart membershipd or touch the cluster; that belongs to the
# consolidated rollout of the 0.11.0 binary.
set -euo pipefail

NODE="${1:?usage: deploy_unibus_nats_exporter.sh <node_name> <ssh_host>}"
HOST="${2:?usage: deploy_unibus_nats_exporter.sh <node_name> <ssh_host>}"

ROOT="$(cd "$(dirname "$0")/.." && pwd)"  # projects/fleet_monitoring
APP="$ROOT/apps/unibus_exporter"

# magnus IS the hub: it pushes to the local VictoriaMetrics (VM) with no auth.
# The other nodes push to the public ingest endpoint with basic auth (same as
# metrics_agent).
if [ "$NODE" = "magnus" ]; then
  HUB="http://127.0.0.1:8428/api/v1/import/prometheus"
  USER=""
  PW=""
else
  HUB="https://metrics-dxaqj3ina6eqd5pjt85wkrrj.organic-machine.com/api/v1/import/prometheus"
  USER="fleet"
  PW="$(pass show fleet/ingest-pass | head -1)"
fi

echo ">> building unibus_exporter (linux/amd64)"
mkdir -p "$APP/dist"
( cd "$APP" && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o dist/unibus_exporter_amd64 . )
BIN="$APP/dist/unibus_exporter_amd64"

echo ">> uploading binary to $HOST"
scp -q -o BatchMode=yes "$BIN" "$HOST:/tmp/unibus_exporter"

echo ">> installing NATS-local service on $NODE ($HOST)"
# The values travel on stdin ahead of the remote script (printf is a shell
# builtin), so the ingest password never appears in the argv of ssh or of the
# remote shell. %q quotes each value so the remote bash re-reads it safely.
{
  printf 'NODE=%q HUB=%q VMUSER=%q PW=%q\n' "$NODE" "$HUB" "$USER" "$PW"
  cat <<'OUTER'
set -e
sudo -n mkdir -p /opt/unibus-exporter /etc/unibus-exporter
sudo -n mv /tmp/unibus_exporter /opt/unibus-exporter/unibus_exporter
sudo -n chmod 755 /opt/unibus-exporter/unibus_exporter
sudo -n tee /etc/unibus-exporter/unibus.nats.json >/dev/null <<JSON
{
  "nats_monitor": { "enabled": true, "node": "${NODE}", "base_url": "http://127.0.0.1:8222" },
  "hub_url": "${HUB}",
  "user": "${VMUSER}",
  "pass": "${PW}",
  "interval_sec": 15,
  "timeout_sec": 8,
  "labels": { "job": "unibus_exporter" }
}
JSON
sudo -n chmod 600 /etc/unibus-exporter/unibus.nats.json
sudo -n tee /etc/systemd/system/unibus-exporter-nats.service >/dev/null <<'UNIT'
[Unit]
Description=unibus NATS local exporter (fleet_monitoring)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/opt/unibus-exporter/unibus_exporter -config /etc/unibus-exporter/unibus.nats.json
Restart=always
RestartSec=10
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target
UNIT
sudo -n systemctl daemon-reload
sudo -n systemctl enable unibus-exporter-nats >/dev/null 2>&1
sudo -n systemctl restart unibus-exporter-nats
sleep 3
echo -n "status: "; systemctl is-active unibus-exporter-nats || true
echo "== local nats_up =="
curl -fsS "http://127.0.0.1:8222/varz" >/dev/null 2>&1 && echo "8222 OK (monitoring open)" || echo "8222 NOT responding: UNIBUS_NATS_MONITOR=1 is missing in membershipd (nats_up=0 until the rollout)"
OUTER
} | ssh -o BatchMode=yes "$HOST" 'bash -s'

echo ">> $NODE: unibus_exporter (NATS-local mode) deployed"
echo ">> reminder: nats_up=0 until the node runs unibus 0.11.0+ with UNIBUS_NATS_MONITOR=1"
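One caveat about the config step above: the remote heredoc places `${PW}` and the other values directly inside JSON string literals, so a password containing a double quote or a backslash would produce an invalid `unibus.nats.json`. The helper below is a minimal sketch of one way to guard against that. `json_escape` is a hypothetical name that is not part of the committed script, and it covers only backslashes and double quotes, not control characters.

```shell
# Hypothetical helper: escape a value before embedding it in a JSON string.
# Handles backslash and double quote only; control characters are not covered.
json_escape() {
  # printf is a builtin, so the value reaches sed through a pipe, not argv.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

If it were defined inside the remote script, the heredoc line could read `"pass": "$(json_escape "$PW")"`, since command substitution is expanded in an unquoted heredoc.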