| id | title | status | type | domain | scope | priority | depends | blocks | related | created | updated | tags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0106 | App services_monitor: cross-PC dashboard of active services | pending | app | | multi-app | high | | | | 2026-05-17 | 2026-05-17 | |
0106 — App services_monitor
Problem
`fn doctor services` only gives a point-in-time snapshot of the local PC. What is missing is a live cross-PC view:
- Which of my 10 services are alive on aurgi-pc?
- Which ones on organic-machine.com?
- Which ones died without me noticing (the sqlite_api case, 2026-05-17)?
Decision
ImGui app `services_monitor` consuming the Go backend `services_api` (port 8485). For each target PC it reconciles the expected state (service_targets + apps.* from the registry) against the actual state (systemd state + listening port + HTTP health). Historical persistence = transitions + hourly aggregate.
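A minimal sketch of the reconciliation step for one (app, PC) cell. It assumes a four-color traffic light (green/yellow/red/gray) and a `HealthOK` flag that is true when the health probe returned 2xx or the app has no health endpoint. `Light`, `Observation`, `Reconcile` and the color mapping are hypothetical names chosen for illustration, not part of `services_api`.

```go
package main

import "fmt"

// Light is the traffic-light color shown for one (app, pc) cell (hypothetical).
type Light string

const (
	Green  Light = "green"  // expected running and fully healthy
	Yellow Light = "yellow" // unit active but degraded, or running when not expected (drift)
	Red    Light = "red"    // expected running but unit not active
	Gray   Light = "gray"   // not expected and not running
)

// Observation is the actual state gathered by one check (hypothetical shape).
type Observation struct {
	SystemdState  string // output of `systemctl --user is-active`, e.g. "active", "inactive", "failed"
	PortListening bool   // the app's port appears in `ss -tln`
	HealthOK      bool   // health probe returned 2xx (assumed true if the app has no endpoint)
}

// Reconcile compares the expected state from the registry with the observed one.
func Reconcile(expectedRunning bool, obs Observation) Light {
	active := obs.SystemdState == "active"
	healthy := active && obs.PortListening && obs.HealthOK
	switch {
	case expectedRunning && healthy:
		return Green
	case expectedRunning && active:
		return Yellow
	case expectedRunning:
		return Red
	case active:
		return Yellow // running although the registry does not expect it
	default:
		return Gray
	}
}

func main() {
	fmt.Println(Reconcile(true, Observation{SystemdState: "active", PortListening: true, HealthOK: true})) // green
	fmt.Println(Reconcile(true, Observation{SystemdState: "inactive"}))                                   // red
}
```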
Components
Backend apps/services_api/ (Go, tag: service, port 8485)
Endpoints:
- `GET /api/services`: flat list (app_id, pc_id, expected, actual, port, last_check_ts, last_healthy_ts, transitions_24h)
- `GET /api/services/:app/:pc`: detail + last N transitions + journalctl tail
- `POST /api/services/:app/:pc/check`: forces an immediate check
- `POST /api/services/:app/:pc/action` (action=start|stop|restart): behind a feature flag, OFF in v1
- `GET /api/pcs`: per-PC state (reachable, lag_ms, version_uname)
- `GET /api/ws/services`: WebSocket push of the delta after each check
Worker pool: a 10 s check cycle per PC, with PCs checked in parallel.
Local checker (is_local_only=true or PC = self): runs `systemctl --user is-active <unit>` + `ss -tln | grep :<port>` + `curl -m <timeout> <health_endpoint>`.
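A sketch of the three local probes in Go. Two details are worth handling explicitly: `systemctl is-active` exits non-zero for inactive units but still prints the state, and `grep :<port>` also matches longer ports (`:84850`) or peer-side ports, so this version parses the Local Address:Port column instead (assuming `ss`'s default column layout with State first). The health probe swaps `curl -m` for `net/http` with `Client.Timeout`. The unit name and health URL in `main` are placeholders.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// systemdState runs `systemctl --user is-active <unit>`. systemctl exits
// non-zero for inactive/failed units but still prints the state on stdout,
// so the output is kept when err is an *exec.ExitError.
func systemdState(ctx context.Context, unit string) string {
	out, err := exec.CommandContext(ctx, "systemctl", "--user", "is-active", unit).Output()
	var exitErr *exec.ExitError
	state := strings.TrimSpace(string(out))
	if state == "" || (err != nil && !errors.As(err, &exitErr)) {
		return "unknown" // systemctl missing, killed by ctx, etc.
	}
	return state
}

// portListening reports whether `ss -tln` output has a LISTEN socket on port,
// comparing the port of the local-address column (4th field) exactly.
func portListening(ssOutput string, port int) bool {
	want := strconv.Itoa(port)
	for _, line := range strings.Split(ssOutput, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 || f[0] != "LISTEN" {
			continue // header, blank line or other state
		}
		local := f[3] // e.g. "127.0.0.1:8485" or "[::]:22"
		if i := strings.LastIndex(local, ":"); i >= 0 && local[i+1:] == want {
			return true
		}
	}
	return false
}

// httpStatus performs the health probe; Client.Timeout plays the role of
// curl -m. It returns 0 when no response arrives.
func httpStatus(ctx context.Context, url string, timeout time.Duration) int {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0
	}
	resp, err := (&http.Client{Timeout: timeout}).Do(req)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

func main() {
	sample := "State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process\n" +
		"LISTEN 0      4096   127.0.0.1:8485     0.0.0.0:*\n"
	fmt.Println(portListening(sample, 8485), portListening(sample, 848)) // true false

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	fmt.Println(systemdState(ctx, "example_api.service"))                       // placeholder unit
	fmt.Println(httpStatus(ctx, "http://127.0.0.1:9000/health", 2*time.Second)) // placeholder URL
}
```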
Remote checker: runs the same commands through `ssh_exec_go_infra` and parses their output.
DB: `services_api.db`:
- `service_check`, append-only (ts, app_id, pc_id, systemd_state, port_listening, http_status, latency_ms)
- `service_transition` (ts, app_id, pc_id, from, to)
- `service_state_hourly` (hour_bucket, app_id, pc_id, healthy_ratio, transitions)
Frontend apps/services_monitor/ (C++ ImGui)
Follows the data_factory pattern. Panels:
- Overview Grid: PCs x apps. Each cell is a traffic light; clicking it opens Detail.
- PC Detail: apps expected on that PC, expected-vs-actual drift, restart action (disabled in v1).
- App Detail: per app, its state on each PC, transitions over the last 7 days, and a mini chart of the hourly healthy_ratio.
- Live (WS): stream of transitions.
- Alerts: apps with expected=running AND actual=inactive for more than 5 min (v1 only lists them; notifications are handled separately).
UI: data_table_cpp_viz, badge_cpp_core, empty_state_cpp_core.
Closed decisions (2026-05-17)
- Local is special: the local PC is NOT checked via SSH. A per-PC `pc_is_self` flag lets the checker choose the path: local exec vs. SSH exec.
- Persistence: transitions + hourly aggregate. The append-only `service_check` table has a 7-day TTL (nightly vacuum job).
- Auto-start: NOT in v1; alert only. Feature flag `services_monitor.auto_fix` is OFF.
Tasks (in order)
- Migration for `services_api.db`: tables `service_check`, `service_transition`, `service_state_hourly`
- Registry functions `port_listening_check_go_infra` and `http_health_probe_go_infra` (if they don't exist yet), built in parallel via fn-constructor
- `services_api` MVP: worker loop + `/api/services` + WS
- systemd unit + Restart=always + update issue 0105 with the 11th service
- C++ app `services_monitor`: scaffold via `fn run init_cpp_app services_monitor`
- Overview panel + WS client
- PC Detail + App Detail
- Alerts panel
DoD
- 10 services visible in Overview, each with a traffic light that matches ground truth.
- A simulated crash (`kill -9` of sqlite_api) is detected in <15 s.
- Recovery (auto-restart via Restart=always) is detected and reflected in transitions.
- App launchable on aurgi-pc + home-wsl (no SSH to self).
- Backend `services_api` running as `tag: service` (full dogfooding).